These are the star and fork statistics for the Bruce-Lee-LY/cuda_hgemm repository. As of 29 Apr 2024, this repository has 55 stars and 15 forks.
CUDA HGEMM: several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions. The calculation expression is as follows, where matrices A (M * K), B (K * N) and C (M * N) are all FP16. Through exploring various matrix tiling and optimization methods, the current performance for dimensions from 256 to 16384 is at least 95% of cuBLAS performance, and in many scenarios it exceeds cuBLAS. C...
repo | techs | stars | stars (weekly) | forks | forks (weekly) |
---|---|---|---|---|---|
hai046/JNIFrame | D, C++, Java | 0 | 0 | 1 | 0 |
ardanlabs/gotour | Go, CSS, HTML | 62 | 0 | 36 | 0 |
Questra-Digital/ts-micro-app | HCL, TypeScript, CSS | 0 | 0 | 30 | 0 |
AndrewGuenther/fck-nat | HCL, Shell, Makefile | 535 | +8 | 22 | +1 |
lakesoul-io/LakeSoul | Java, Scala, Rust | 1.7k | 0 | 364 | 0 |
BrowserBox/BrowserBox | JavaScript, CSS, HTML | 2.8k | +10 | 269 | +3 |
Significant-Gravitas/AutoGPT | JavaScript, Python, Jupyter Notebook | 150.4k | 0 | 33.9k | 0 |
baggiponte/pymi-polars | Just, CSS | 1 | 0 | 0 | 0 |
pcafrica/advanced_programming_2023-2024 | C++, Shell, CSS | 16 | 0 | 0 | 0 |
Tosainu/tosainu.github.com | MDX, Astro, TypeScript | 0 | 0 | 0 | 0 |