These are the star and fork stats for the siboehm/SGEMM_CUDA repository. As of 29 Apr 2024, this repository has 126 stars and 11 forks.
Fast CUDA SGEMM from Scratch

Step-by-step optimization of matrix multiplication, implemented in CUDA. For an explanation of each kernel, see siboehm.com/CUDA-MMM.

Overview

Running the kernels on an NVIDIA A6000 (Ampere), GFLOPs at matrix size 4096x4096:

Kernel | GFLOPs/s | Performance relative to cuBLAS |
---|---|---|
1: Naive | 309.0 | 1.3% |
2: GMEM Coalescing | 1986.5 | 8.5% |
3: SMEM Caching | 2980.3 | 12.8% |
4: 1D Blocktiling | 8474.7 | 36.5% |
5: 2D Blocktiling | 15971.7 | 68.7% |
7: Avoid Bank Conflicts (Linearize) | 16213.4 | 69.7% |
8: Avoid...
repo | techs | stars | stars weekly | forks | forks weekly |
---|---|---|---|---|---|
h3mmy/bloopySphere | HCL, Shell, FreeMarker | 27 | +1 | 7 | 0 |
threefoldtech/info_grid | Shell, Makefile, JavaScript | 1 | 0 | 0 | 0 |
akhtyamovpavel/BuildExamples-TP | Makefile, C++, C | 9 | 0 | 8 | 0 |
bakueikozo/buildroot_am3352_aki | Makefile, Python, C | 18 | 0 | 3 | 0 |
MatthewCroughan/nixcfg | Nix, Vim Script, CSS | 167 | 0 | 7 | 0 |
wizwizdev/wizwizxui-timebot | PHP, CSS, Shell | 656 | 0 | 130 | 0 |
GammaTauAI/reflexion-human-eval | Python, Jupyter Notebook, Shell | 1.5k | +14 | 134 | +1 |
stochasticai/xturing | Python | 2.3k | 0 | 184 | 0 |
guardian/typerighter | Scala, TypeScript, Less | 270 | +1 | 11 | 0 |
calyptia/charts | Smarty, Shell | 5 | 0 | 3 | 0 |