Liu-xiandong/How_to_optimize_in_GPU

This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm, among others. According to the author, the performance of these kernels is at or near the theoretical limit.

Languages: Cuda, Shell, Makefile
Topics: hpc, reduce, high-performance-computing, gpu-acceleration, sgemm, elementwise, sgemv
Star and fork statistics for the Liu-xiandong/How_to_optimize_in_GPU repository: as of 05 May 2024, the repository has 426 stars and 78 forks.

How to optimize in GPU

This is a series of GPU optimization topics that explains in detail how to optimize programs on the GPU. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm, among others. All of the following performance data were measured on a V100 and tested with Nsight. If you have any questions, you can contact the author directly: [email protected]

1. elementwise

For the elementwise kernel, the optimization techniques that can be used are mainly vectorized...
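The README excerpt is cut off after "vectorized". Assuming it refers to vectorized memory access (an assumption, not confirmed by the text shown here), the idea can be sketched as an elementwise add in which each thread moves four floats with a single 128-bit `float4` load or store. The kernel name `add_vec4` and the host helper `run_add_vec4` are hypothetical names chosen for illustration; this is not the repository's own code, and it makes no claim about performance.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: c[i] = a[i] + b[i], four elements per thread.
// Assumes a, b, c are 16-byte aligned (true for pointers returned by cudaMalloc).
__global__ void add_vec4(const float* a, const float* b, float* c, int n) {
    int idx = (blockIdx.x * blockDim.x + threadIdx.x) * 4;
    if (idx + 3 < n) {
        // One 128-bit load per input instead of four 32-bit loads.
        float4 va = *reinterpret_cast<const float4*>(a + idx);
        float4 vb = *reinterpret_cast<const float4*>(b + idx);
        float4 vc = make_float4(va.x + vb.x, va.y + vb.y, va.z + vb.z, va.w + vb.w);
        *reinterpret_cast<float4*>(c + idx) = vc;
    } else {
        // Scalar tail for the last n % 4 elements (no-op when idx >= n).
        for (int i = idx; i < n; ++i) c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: runs the kernel on n elements and returns the
// number of mismatches against a CPU reference, or -1 on a CUDA error.
int run_add_vec4(int n) {
    std::vector<float> ha(n), hb(n), hc(n);
    for (int i = 0; i < n; ++i) { ha[i] = float(i); hb[i] = 2.0f * float(i); }

    size_t bytes = size_t(n) * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    if (cudaMalloc(&da, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&db, bytes) != cudaSuccess) { cudaFree(da); return -1; }
    if (cudaMalloc(&dc, bytes) != cudaSuccess) { cudaFree(da); cudaFree(db); return -1; }

    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int vec_threads = (n + 3) / 4;                    // one thread per 4 elements
    int blocks = (vec_threads + threads - 1) / threads;
    add_vec4<<<blocks, threads>>>(da, db, dc, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hc[i] != ha[i] + hb[i]) ++mismatches;
    return mismatches;
}

int main() {
    int n = (1 << 20) + 3;  // not a multiple of 4, so the scalar tail runs too
    int bad = run_add_vec4(n);
    if (bad < 0) { std::printf("CUDA error\n"); return 1; }
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Compile with `nvcc` (for example `nvcc add_vec4.cu -o add_vec4`, where the file name is arbitrary). Wider accesses reduce the number of memory instructions issued per element, which is the usual motivation for this pattern in bandwidth-bound elementwise kernels.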