Bruce-Lee-LY/cuda_hgemm - stats on ReviewGithub

Languages: Cuda, C, C++, Shell, Python, CMake
Topics: gpu, cuda, cublas, nvidia, gemm, matrix-multiply, tensor-core, hgemm

These are the star and fork statistics for the Bruce-Lee-LY/cuda_hgemm repository. As of 29 Apr 2024, this repository has 55 stars and 15 forks.

CUDA HGEMM: several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions. The calculation expression is as follows, where matrices A (M * K), B (K * N) and C (M * N) are all in FP16 precision. By exploring various matrix-tiling and optimization methods, the current performance for dimensions from 256 to 16384 is at least 95% of cuBLAS performance, and in many scenarios it exceeds cuBLAS. C...
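The matrix-tiling idea mentioned in the description can be illustrated with a small CPU-side sketch. This is not the repository's kernel: it is plain C++ that uses `float` instead of FP16 (standard C++ before C++23 has no portable half type), and the tile sizes `BM`, `BN`, `BK` and the functions `gemm_naive`, `gemm_tiled` and `max_abs_diff` are hypothetical names chosen for illustration. Each BM x BN block of C is accumulated in a local buffer while the loop steps through K in chunks of BK, which loosely mirrors how a GPU thread block stages tiles of A and B; on Tensor Cores, the per-tile products would instead be issued as WMMA or MMA operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes. 16x16x16 is one of the fragment shapes WMMA
// supports for FP16, which is why it is used here; any positive values work.
constexpr std::size_t BM = 16, BN = 16, BK = 16;

// Reference GEMM on row-major data: C (M x N) = A (M x K) * B (K x N).
std::vector<float> gemm_naive(const std::vector<float>& A, const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    return C;
}

// Tiled GEMM: every BM x BN tile of C keeps its own accumulator (the analogue
// of per-warp register fragments) and walks K in BK-wide slices.
std::vector<float> gemm_tiled(const std::vector<float>& A, const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += BM)
        for (std::size_t j0 = 0; j0 < N; j0 += BN) {
            float acc[BM][BN] = {};  // zero-initialized tile accumulator
            const std::size_t iMax = std::min(i0 + BM, M);  // clamp edge tiles
            const std::size_t jMax = std::min(j0 + BN, N);
            for (std::size_t k0 = 0; k0 < K; k0 += BK) {
                const std::size_t kMax = std::min(k0 + BK, K);
                // Multiply the A tile (rows i0..iMax, cols k0..kMax) by the
                // B tile (rows k0..kMax, cols j0..jMax) into acc.
                for (std::size_t i = i0; i < iMax; ++i)
                    for (std::size_t k = k0; k < kMax; ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < jMax; ++j)
                            acc[i - i0][j - j0] += a * B[k * N + j];
                    }
            }
            // Write the finished tile back to C once, after the whole K loop.
            for (std::size_t i = i0; i < iMax; ++i)
                for (std::size_t j = j0; j < jMax; ++j)
                    C[i * N + j] = acc[i - i0][j - j0];
        }
    return C;
}

// Largest element-wise absolute difference between two equally sized results.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
        m = std::max(m, std::fabs(x[i] - y[i]));
    return m;
}
```

Comparing `gemm_tiled` against `gemm_naive` with `max_abs_diff` on shapes that are not multiples of the tile sizes is a quick way to check the edge handling done by the `std::min` bounds.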

repo	techs	stars	stars (weekly change)	forks	forks (weekly change)
hai046/JNIFrame	D, C++, Java	0	0	1	0
ardanlabs/gotour	Go, CSS, HTML	62	0	36	0
Questra-Digital/ts-micro-app	HCL, TypeScript, CSS	0	0	30	0
AndrewGuenther/fck-nat	HCL, Shell, Makefile	535	+8	22	+1
lakesoul-io/LakeSoul	Java, Scala, Rust	1.7k	0	364	0
BrowserBox/BrowserBox	JavaScript, CSS, HTML	2.8k	+10	269	+3
Significant-Gravitas/AutoGPT	JavaScript, Python, Jupyter Notebook	150.4k	0	33.9k	0
baggiponte/pymi-polars	Just, CSS	1	0	0	0
pcafrica/advanced_programming_2023-2024	C++, Shell, CSS	16	0	0	0
Tosainu/tosainu.github.com	MDX, Astro, TypeScript	0	0	0	0