These are the star and fork statistics for the Cjkkkk/CUDA_gemm repository. As of 26 Apr 2024, this repository has 187 stars and 23 forks.
Introduction: A simple high performance CUDA GEMM, Block Sparse GEMM and Non-uniform Quantized GEMM implementation.

C = alpha * A * B + beta * C

Algorithm (located in src/cuda/):

- MatrixMulCUDA: one element of C is assigned one thread; global memory coalesce of B
- MatrixMulCUDA1: texture load
- MatrixMulCUDA2: one 4 * 4 grid of C is assigned one thread
- MatrixMulCUDA3: vectorized A B load
- MatrixMulCUDA4: vectorized C store
- MatrixMulCUDA5: block sparse version
- MatrixMulCUDA6: vectorized A B load coalesce
- MatrixMulCUDA7: warp...
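To make the simplest variant in the list concrete, here is a minimal sketch of a GEMM in which each thread computes one element of C and adjacent threads read adjacent elements of B. It is not the repository's code: the names `naive_gemm_kernel`, `naive_gemm`, and `check` are hypothetical, and the row-major layout, the 16 x 16 block size, and the `float` element type are all assumptions made for illustration.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: one thread per element of C (assumed row-major).
// A is M x K, B is K x N, C is M x N.
__global__ void naive_gemm_kernel(int M, int N, int K, float alpha,
                                  const float* A, const float* B,
                                  float beta, float* C) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= M || col >= N) return;  // guard threads outside the matrix

    float acc = 0.0f;
    for (int k = 0; k < K; ++k) {
        // Threads adjacent in x read adjacent columns of B's row k,
        // so the loads from B are coalesced.
        acc += A[row * K + k] * B[k * N + col];
    }
    C[row * N + col] = alpha * acc + beta * C[row * N + col];
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel, and copies the updated C back into the caller's vector.
void naive_gemm(int M, int N, int K, float alpha,
                const std::vector<float>& A, const std::vector<float>& B,
                float beta, std::vector<float>& C) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    assert(C.size() == static_cast<std::size_t>(M) * N);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    check(cudaMalloc(&dA, A.size() * sizeof(float)));
    check(cudaMalloc(&dB, B.size() * sizeof(float)));
    check(cudaMalloc(&dC, C.size() * sizeof(float)));
    check(cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice));
    check(cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice));
    check(cudaMemcpy(dC, C.data(), C.size() * sizeof(float), cudaMemcpyHostToDevice));

    dim3 block(16, 16);  // assumed block size, not tuned
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    naive_gemm_kernel<<<grid, block>>>(M, N, K, alpha, dA, dB, beta, dC);
    check(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(dA));
    check(cudaFree(dB));
    check(cudaFree(dC));
}
```

Calling `naive_gemm(M, N, K, alpha, A, B, beta, C)` with row-major host vectors overwrites `C` with `alpha * A * B + beta * C`. The later variants in the list (4 * 4 tiles per thread, vectorized loads and stores) refine this baseline by giving each thread more work and wider memory transactions; they are not shown here.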
repo | techs | stars | stars weekly | forks | forks weekly |
---|---|---|---|---|---|
FLAMEGPU/FLAMEGPU2 | Cuda, C++, Python | 72 | 0 | 14 | 0 |
resemble-ai/monotonic_align | Cython, Python | 53 | 0 | 5 | 0 |
mikumifa/QChatGPT-Docker-Installer | Dockerfile, Shell | 127 | 0 | 26 | 0 |
emacs-straight/persist | Emacs Lisp, Makefile | 1 | 0 | 0 | 0 |
m3g/packmol | Fortran, Tcl, Makefile | 168 | 0 | 47 | 0 |
haskell/ghcup-hs | Haskell, Shell, PowerShell | 210 | +2 | 55 | +1 |
cloudnloud/weekly-cloud-engineer-interview-program | HCL, Shell, Python | 30 | 0 | 40 | 0 |
dorneanu/gocial | HTML, Go, CSS | 42 | 0 | 3 | 0 |
PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook | Jupyter Notebook, Python, Shell | 105 | +1 | 32 | 0 |
tomondre/raspberry-kubernetes-cluster | HCL, Shell, Jinja | 77 | 0 | 2 | 0 |