These are the star and fork statistics for the /wangzyon/NVIDIA_SGEMM_PRACTICE repository. As of 28 Apr 2024, this repository has 85 stars and 20 forks.
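Overview: step-by-step optimization of matrix multiplication (SGEMM) performance on NVIDIA GPUs using CUDA programming.

kernel | description | GFLOPS | custom kernel / cuBLAS (%) |
---|---|---|---|
cuBLAS | official library function | 14448.69 | baseline |
kernel_1 | naive implementation | 2262.168 | 15.65657 |
kernel_2 | shared memory caching | 4216.536 | 29.18283 |
kernel_3 | 1D thread tile parallel optimization | 7809.629 | 54.05078 |
kernel_4 | 2D thread tile parallel optimization | 12251.3 | 84.79179 |
kernel_5 | register caching | 12177.95 | 84.28412 |
kernel_6 | FLOAT4 vectorized memory access | 13161.49 | 91.09125 |
kernel_7 | double-buffered prefetching | 13634.98 | 94.36832 |

Measured on an NVIDIA GeForce RTX 3090 with a matrix size of 5120.

Configuration: compiled with gcc 7.5.0 under Ubuntu 18.04.5 LTS; NVIDIA CUDA version: CUDA 10.2.

Directory: NVIDIA_SGEMM_PRACTICE ...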
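The percentage column can be reproduced from the GFLOPS figures listed above by dividing each kernel's throughput by the cuBLAS baseline. The short Python sketch below does that arithmetic. The helper names (`percent_of_cublas`, `sgemm_seconds`) are hypothetical. The time estimate assumes the common convention of counting 2·N³ floating-point operations for an N×N matrix multiply, which the source does not state.

```python
# GFLOPS figures copied from the benchmark listing above (RTX 3090, size 5120).
CUBLAS_GFLOPS = 14448.69
KERNEL_GFLOPS = {
    "kernel_1": 2262.168,
    "kernel_2": 4216.536,
    "kernel_3": 7809.629,
    "kernel_4": 12251.3,
    "kernel_5": 12177.95,
    "kernel_6": 13161.49,
    "kernel_7": 13634.98,
}


def percent_of_cublas(gflops, baseline=CUBLAS_GFLOPS):
    """Throughput of a custom kernel as a percentage of the cuBLAS baseline."""
    return 100.0 * gflops / baseline


def sgemm_seconds(n, gflops):
    """Estimated time for one n x n multiply.

    Assumption (not stated in the source): the operation count is 2 * n**3.
    """
    return 2.0 * n ** 3 / (gflops * 1e9)


if __name__ == "__main__":
    for name, gflops in KERNEL_GFLOPS.items():
        pct = percent_of_cublas(gflops)
        ms = sgemm_seconds(5120, gflops) * 1e3
        print(f"{name}: {pct:.2f}% of cuBLAS, roughly {ms:.1f} ms per multiply")
```

Under that operation-count assumption, the listed figures imply that each step after kernel_4 shaves only a millisecond or two off a single 5120×5120 multiply, which is consistent with the shrinking gaps in the percentage column.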
repo | techs | stars | weekly stars | forks | weekly forks |
---|---|---|---|---|---|
PrestaShop/docker | Dockerfile, Shell, Python | 241 | 0 | 174 | 0 |
fxxkscript/emacs.d | Emacs Lisp, Shell | 0 | 0 | 0 | 0 |
ducminh-phan/reformat-gherkin | Gherkin, Python | 18 | 0 | 12 | 0 |
HL7/vulcan-eproduct-info | GLSL, Liquid, Batchfile | 11 | 0 | 7 | 0 |
statsd/statsd | JavaScript, Shell, Other | 17.2k | 0 | 2k | 0 |
Stability-AI/stability-sdk | Jupyter Notebook, Python | 2.4k | 0 | 329 | 0 |
BartoszPiotrowski/lean-premise-selection | Lean, Shell, TypeScript | 12 | 0 | 0 | 0 |
disnake-ru/guide | Markdown, JavaScript, Python | 16 | 0 | 25 | 0 |
ossia/libossia | Max, C++, C | 190 | 0 | 27 | 0 |
treeform/fidget | Nim, TypeScript, HTML | 730 | 0 | 32 | 0 |