siboehm/SGEMM_CUDA - stats on ReviewGithub

Cuda Shell Python CMake Makefile

These are the star and fork statistics for the siboehm/SGEMM_CUDA repository. As of 29 Apr 2024, the repository has 126 stars and 11 forks.

Fast CUDA SGEMM from Scratch

Step-by-step optimization of matrix multiplication, implemented in CUDA. For an explanation of each kernel, see siboehm.com/CUDA-MMM.

Overview

Running the kernels on an NVIDIA A6000 (Ampere), GFLOPs/s at matrix size 4096x4096:

Kernel	GFLOPs/s	Performance relative to cuBLAS
1: Naive	309.0	1.3%
2: GMEM Coalescing	1986.5	8.5%
3: SMEM Caching	2980.3	12.8%
4: 1D Blocktiling	8474.7	36.5%
5: 2D Blocktiling	15971.7	68.7%
7: Avoid Bank Conflicts (Linearize)	16213.4	69.7%
8: Avoid...
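The table reports throughput in GFLOPs/s rather than runtimes. A minimal Python sketch for converting between the two, assuming the common count of 2 * n**3 floating-point operations for a square SGEMM (the README excerpt does not say which FLOP count it uses). The helper names are hypothetical and are not taken from the repository.

```python
# Back-of-envelope helpers for reading a GFLOPs/s table.
# Assumption: a square n x n SGEMM is counted as 2 * n**3 FLOPs
# (one multiply + one add per inner-product term); the alpha/beta
# epilogue adds only O(n**2) work and is ignored here.


def sgemm_flops(n: int) -> int:
    """Approximate floating-point operations for an n x n x n matrix multiply."""
    return 2 * n ** 3


def gflops_per_second(n: int, seconds: float) -> float:
    """Throughput in GFLOPs/s for one SGEMM of size n that took `seconds`."""
    return sgemm_flops(n) / (seconds * 1e9)


def seconds_per_run(n: int, gflops: float) -> float:
    """Inverse of gflops_per_second: runtime implied by a measured throughput."""
    return sgemm_flops(n) / (gflops * 1e9)


def speedup(new_gflops: float, old_gflops: float) -> float:
    """How many times faster one kernel is than another at the same size."""
    return new_gflops / old_gflops


if __name__ == "__main__":
    n = 4096
    naive, blocktiling_2d = 309.0, 15971.7  # figures from the README table
    print(f"FLOPs per 4096^3 SGEMM: {sgemm_flops(n):,}")
    print(f"naive runtime:          {seconds_per_run(n, naive) * 1e3:.1f} ms")
    print(f"2D blocktiling runtime: {seconds_per_run(n, blocktiling_2d) * 1e3:.2f} ms")
    print(f"speedup:                {speedup(blocktiling_2d, naive):.1f}x")
```

Under this assumption, one 4096x4096 multiply takes roughly 445 ms with the naive kernel and about 8.6 ms with 2D blocktiling, a speedup of roughly 52x.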


repo	languages	stars	stars (weekly change)	forks	forks (weekly change)
h3mmy/bloopySphere	HCL, Shell, FreeMarker	27	+1	7	0
threefoldtech/info_grid	Shell, Makefile, JavaScript	1	0	0	0
akhtyamovpavel/BuildExamples-TP	Makefile, C++, C	9	0	8	0
bakueikozo/buildroot_am3352_aki	Makefile, Python, C	18	0	3	0
MatthewCroughan/nixcfg	Nix, Vim Script, CSS	167	0	7	0
wizwizdev/wizwizxui-timebot	PHP, CSS, Shell	656	0	130	0
GammaTauAI/reflexion-human-eval	Python, Jupyter Notebook, Shell	1.5k	+14	134	+1
stochasticai/xturing	Python	2.3k	0	184	0
guardian/typerighter	Scala, TypeScript, Less	270	+1	11	0
calyptia/charts	Smarty, Shell	5	0	3	0