Bruce-Lee-LY/cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Languages: Cuda, C, C++, Shell, Python, CMake
Topics: gpu, cuda, cublas, nvidia, gemm, matrix-multiply, tensor-core, hgemm

Stars and forks statistics for the Bruce-Lee-LY/cuda_hgemm repository. As of 29 Apr 2024, the repository has 55 stars and 15 forks.

CUDA HGEMM

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions. The calculation expression is as follows, where matrices A (M * K), B (K * N) and C (M * N) are all FP16. Through exploring various matrix tiling and optimization methods, the current performance for dimensions from 256 to 16384 is no less than 95% of cuBLAS performance, and in many scenarios it exceeds cuBLAS. C...
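To make the WMMA approach concrete, here is a minimal, unoptimized sketch of an FP16 matrix multiply on Tensor Cores. It is not taken from the repository and uses none of its tiling or optimization methods. Several things are assumptions for illustration: the plain product C = A * B (the expression is truncated in the excerpt above), A and C stored row-major, B stored column-major, M, N and K all multiples of 16, and a GPU of compute capability 7.0 or newer. The names `wmma_hgemm_naive` and `launch_wmma_hgemm_naive` are hypothetical.

```cuda
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// WMMA fragment shape for FP16 inputs: each warp multiplies 16x16x16 tiles.
constexpr int WMMA_M = 16;
constexpr int WMMA_N = 16;
constexpr int WMMA_K = 16;

// Assumed layouts (not confirmed by the excerpt): A is row-major M x K,
// B is column-major K x N, C is row-major M x N.
// One warp (one 32-thread block) computes one 16x16 tile of C.
__global__ void wmma_hgemm_naive(const half *A, const half *B, half *C,
                                 int M, int N, int K) {
    const int tile_row = blockIdx.y * WMMA_M;  // first row of this C tile
    const int tile_col = blockIdx.x * WMMA_N;  // first column of this C tile
    if (tile_row >= M || tile_col >= N) return;

    // FP16 accumulator, since C is FP16 in the description above.
    wmma::fragment<wmma::accumulator, WMMA_M, WMMA_N, WMMA_K, half> c_frag;
    wmma::fill_fragment(c_frag, __float2half(0.0f));

    // Walk along K, accumulating one 16x16x16 product per step.
    for (int k = 0; k < K; k += WMMA_K) {
        wmma::fragment<wmma::matrix_a, WMMA_M, WMMA_N, WMMA_K, half,
                       wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, WMMA_M, WMMA_N, WMMA_K, half,
                       wmma::col_major> b_frag;

        // A tile starts at (tile_row, k); leading dimension is K.
        wmma::load_matrix_sync(a_frag, A + tile_row * K + k, K);
        // B tile starts at (k, tile_col); column-major, leading dimension K.
        wmma::load_matrix_sync(b_frag, B + tile_col * K + k, K);

        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }

    // Write the finished tile back to row-major C (leading dimension N).
    wmma::store_matrix_sync(C + tile_row * N + tile_col, c_frag, N,
                            wmma::mem_row_major);
}

// Host-side launcher: one warp per output tile. Pointers are device pointers.
void launch_wmma_hgemm_naive(const half *dA, const half *dB, half *dC,
                             int M, int N, int K) {
    dim3 block(32);                      // exactly one warp
    dim3 grid(N / WMMA_N, M / WMMA_M);   // one block per 16x16 tile of C
    wmma_hgemm_naive<<<grid, block>>>(dA, dB, dC, M, N, K);
}
```

A kernel like this reads every operand straight from global memory and gives each warp a single tile, so it would be expected to fall well short of cuBLAS. Block-level tiling and operand reuse are the kind of optimizations the description refers to. Compile with something like `nvcc -arch=sm_70` or newer.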