openmlsys/openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

Cuda, CMake, machine-learning, gpu, cuda

Star and fork statistics for the openmlsys/openmlsys-cuda repository. As of 10 May 2024, this repository has 77 stars and 8 forks.

openmlsys-cuda

Examples for beginners who want to write their own high-performance AI operators. We introduce optimization tricks such as using shared memory and pipeline rearrangement to maximize throughput. We also provide an example that uses CUTLASS to implement a fused FC + ReLU operator.

Dependencies

Eigen: CPU linear algebra template library
OpenMP: Enables multi-threaded acceleration on the CPU
CUDA toolkit: Compiles GPU kernels and analyses GPU executions
Gflags: Commandline flags library released by...
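To give a feel for the shared-memory and FC + ReLU fusion ideas mentioned in the excerpt above, here is a minimal hand-written sketch. It is not the repository's code and it does not use CUTLASS. The names `fcReluTiled`, `fcRelu`, and `TILE`, the row-major layout, and the tile size of 16 are all assumptions made for illustration. Each thread block stages tiles of the inputs in shared memory, accumulates a partial dot product, and applies bias and ReLU in the same kernel so the intermediate result never goes back to global memory.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE = 16;  // hypothetical tile size, chosen only for illustration

// Computes Y = ReLU(X * W + b) for row-major X (M x K), W (K x N), b (N), Y (M x N).
__global__ void fcReluTiled(const float* X, const float* W, const float* b,
                            float* Y, int M, int K, int N) {
  __shared__ float xs[TILE][TILE];  // tile of X staged in shared memory
  __shared__ float ws[TILE][TILE];  // tile of W staged in shared memory

  const int row = blockIdx.y * TILE + threadIdx.y;
  const int col = blockIdx.x * TILE + threadIdx.x;
  float acc = 0.0f;

  for (int k0 = 0; k0 < K; k0 += TILE) {
    const int xc = k0 + threadIdx.x;  // column of X this thread loads
    const int wr = k0 + threadIdx.y;  // row of W this thread loads
    // Out-of-range elements are padded with zeros so they do not affect the sum.
    xs[threadIdx.y][threadIdx.x] = (row < M && xc < K) ? X[row * K + xc] : 0.0f;
    ws[threadIdx.y][threadIdx.x] = (wr < K && col < N) ? W[wr * N + col] : 0.0f;
    __syncthreads();  // both tiles must be fully loaded before use

    for (int k = 0; k < TILE; ++k) {
      acc += xs[threadIdx.y][k] * ws[k][threadIdx.x];
    }
    __syncthreads();  // finish reading before the next iteration overwrites the tiles
  }

  // Fused epilogue: bias add and ReLU happen here instead of in separate kernels.
  if (row < M && col < N) {
    Y[row * N + col] = fmaxf(acc + b[col], 0.0f);
  }
}

// Host wrapper (hypothetical helper). Error checking is omitted for brevity.
std::vector<float> fcRelu(const std::vector<float>& X, const std::vector<float>& W,
                          const std::vector<float>& b, int M, int K, int N) {
  assert(X.size() == static_cast<std::size_t>(M) * K);
  assert(W.size() == static_cast<std::size_t>(K) * N);
  assert(b.size() == static_cast<std::size_t>(N));

  float *dX = nullptr, *dW = nullptr, *dB = nullptr, *dY = nullptr;
  cudaMalloc(&dX, X.size() * sizeof(float));
  cudaMalloc(&dW, W.size() * sizeof(float));
  cudaMalloc(&dB, b.size() * sizeof(float));
  cudaMalloc(&dY, static_cast<std::size_t>(M) * N * sizeof(float));
  cudaMemcpy(dX, X.data(), X.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(dW, W.data(), W.size() * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(dB, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);

  const dim3 block(TILE, TILE);
  const dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
  fcReluTiled<<<grid, block>>>(dX, dW, dB, dY, M, K, N);

  std::vector<float> Y(static_cast<std::size_t>(M) * N);
  // This blocking copy runs after the kernel on the default stream.
  cudaMemcpy(Y.data(), dY, Y.size() * sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(dX);
  cudaFree(dW);
  cudaFree(dB);
  cudaFree(dY);
  return Y;
}
```

A caller would pass flat row-major buffers, for example `fcRelu(X, W, b, M, K, N)`, and compare the result against a CPU reference. A library such as CUTLASS expresses the same fusion as an epilogue on its GEMM templates and adds further optimizations that this sketch leaves out.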
repo                       techs                              stars  weekly  forks  weekly
webhdx/PicoBoot            Python, C, CMake                   1.3k   0       88     0
zhiqwang/yolov5-rt-stack   Python, C++, CMake                 674    0       151    0
weronikakaskosz/co_jemy    Dart, C++, CMake                   2      0       1      0
featureform/featureform    Jupyter Notebook, Go, Python       1.5k   0       75     0
MetalPetal/MetalPetal      Objective-C, Swift, Objective-C++  1.7k   0       218    0
Zilliqa/scilla             OCaml, Raku, Perl                  240    0       82     0
OpenImageIO/oiio           C++, POV-Ray SDL, Python           1.8k   0       538    0
uncomplicate/deep-diamond  Clojure, Cuda                      404    0       17     0
flet-dev/flet              Python, Dart, Go                   6.8k   0       263    0
ntoskrnl7/crtsys           C++, C, CMake                      162    +1      32     +1