These are the star and fork statistics for the /openmlsys/openmlsys-cuda repository. As of 10 May 2024, the repository has 77 stars and 8 forks.
openmlsys-cuda: examples for beginners who want to write their own high-performance AI operators. We introduced optimization tricks, such as using shared memory and pipeline rearrangement, to maximize throughput. We also provided an example of using CUTLASS to implement a fused FC + ReLU operator.

Dependencies:
- Eigen: CPU linear algebra template library
- OpenMP: enables multi-threaded acceleration on CPU
- CUDA toolkit: compiles GPU kernels and analyses GPU executions
- Gflags: command-line flags library released by...
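The description above names two ideas: tiling, where data that is reused many times is staged in fast memory (shared memory on a GPU), and fusing the ReLU into the FC operator so the activation does not need its own pass over memory. The following is a minimal CPU-side sketch in C++ of both ideas, assuming row-major `float` matrices. It is not code from openmlsys-cuda and does not use CUTLASS. Cache blocking stands in for shared-memory tiling here, and the names `fc_relu_fused`, `fc_then_relu`, and `kTile` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile size. On a CPU it is chosen so a few tiles fit in cache;
// the GPU analogue would size tiles to a thread block's shared-memory budget.
constexpr std::size_t kTile = 32;

// Fused FC + ReLU sketch: Y = ReLU(X * W + b)
//   X is M x K, W is K x N, b has N entries, Y is M x N; all row-major.
std::vector<float> fc_relu_fused(const std::vector<float>& X,
                                 const std::vector<float>& W,
                                 const std::vector<float>& b,
                                 std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> Y(M * N);
    for (std::size_t i0 = 0; i0 < M; i0 += kTile) {
        const std::size_t i1 = std::min(i0 + kTile, M);
        for (std::size_t j0 = 0; j0 < N; j0 += kTile) {
            const std::size_t j1 = std::min(j0 + kTile, N);
            // Start the output tile from the bias.
            for (std::size_t i = i0; i < i1; ++i)
                for (std::size_t j = j0; j < j1; ++j)
                    Y[i * N + j] = b[j];
            // Walk K one tile at a time so the current X and W tiles are
            // reused many times while they are still cached.
            for (std::size_t k0 = 0; k0 < K; k0 += kTile) {
                const std::size_t k1 = std::min(k0 + kTile, K);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float x = X[i * K + k];
                        for (std::size_t j = j0; j < j1; ++j)
                            Y[i * N + j] += x * W[k * N + j];
                    }
            }
            // Fused epilogue: apply ReLU to the finished tile right away,
            // instead of making a second pass over the whole of Y.
            for (std::size_t i = i0; i < i1; ++i)
                for (std::size_t j = j0; j < j1; ++j)
                    Y[i * N + j] = std::max(Y[i * N + j], 0.0f);
        }
    }
    return Y;
}

// Unfused reference: naive GEMM + bias, then a separate ReLU pass over Y.
std::vector<float> fc_then_relu(const std::vector<float>& X,
                                const std::vector<float>& W,
                                const std::vector<float>& b,
                                std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> Y(M * N);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = b[j];
            for (std::size_t k = 0; k < K; ++k)
                acc += X[i * K + k] * W[k * N + j];
            Y[i * N + j] = acc;
        }
    for (float& y : Y) y = std::max(y, 0.0f);  // second pass: the unfused ReLU
    return Y;
}
```

A natural check is to run both functions on the same inputs, including sizes that are not multiples of `kTile`, and confirm they agree up to floating-point rounding. On a GPU the same structure would map to each thread block loading X and W tiles into shared memory and applying ReLU to its accumulators before the final store, which is roughly the role an epilogue plays in CUTLASS.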
Repository | Languages | Stars | Stars (weekly change) | Forks | Forks (weekly change) |
---|---|---|---|---|---|
webhdx/PicoBoot | Python, C, CMake | 1.3k | 0 | 88 | 0 |
zhiqwang/yolov5-rt-stack | Python, C++, CMake | 674 | 0 | 151 | 0 |
weronikakaskosz/co_jemy | Dart, C++, CMake | 2 | 0 | 1 | 0 |
featureform/featureform | Jupyter Notebook, Go, Python | 1.5k | 0 | 75 | 0 |
MetalPetal/MetalPetal | Objective-C, Swift, Objective-C++ | 1.7k | 0 | 218 | 0 |
Zilliqa/scilla | OCaml, Raku, Perl | 240 | 0 | 82 | 0 |
OpenImageIO/oiio | C++, POV-Ray SDL, Python | 1.8k | 0 | 538 | 0 |
uncomplicate/deep-diamond | Clojure, Cuda | 404 | 0 | 17 | 0 |
flet-dev/flet | Python, Dart, Go | 6.8k | 0 | 263 | 0 |
ntoskrnl7/crtsys | C++, C, CMake | 162 | +1 | 32 | +1 |