These are the star and fork statistics for the /ahennequ/cuda-tensorcores-register-mapping repository. As of 07 May 2024, the repository has 15 stars and 1 fork.
Cuda tensorcores register mapping: Since the Volta architecture, NVIDIA's GPUs include tensor cores that can be used to accelerate matrix multiplication. Each warp can produce a 16x16 fragment of the output, stored in registers distributed across the warp's threads. However, the mapping of fragment elements to registers is unspecified. Because of this restriction, user code is limited to storing the fragment back to shared or global memory using the provided API function, or applying pointwise operations that do not need to know the...
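A minimal sketch of what that restriction leaves you with, assuming an sm_70 or newer GPU and the `nvcuda::wmma` API from `<mma.h>`. One warp multiplies two 16x16 half-precision tiles into a float accumulator, scales it with a pointwise loop over the fragment's `x[i]` elements (legal because every element gets the same treatment, so the layout never matters), and then writes it out with `store_matrix_sync`. The kernel `wmma_scaled_tile` and the host helper `wmma_scale_demo` are hypothetical names for illustration, not code from the repository.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

using namespace nvcuda;

// WMMA tile shape for half inputs with float accumulation.
constexpr int M = 16, N = 16, K = 16;

// One warp computes one 16x16 tile: D = alpha * (A * B).
// A is row-major (ld = K), B is column-major (ld = K), D is row-major (ld = N).
__global__ void wmma_scaled_tile(const half* a, const half* b, float* d, float alpha) {
    wmma::fragment<wmma::matrix_a, M, N, K, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, M, N, K, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, M, N, K, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, K);
    wmma::load_matrix_sync(b_frag, b, K);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    // Pointwise op: each thread scales the elements it holds without
    // knowing which (row, col) of the tile they correspond to.
    for (int i = 0; i < c_frag.num_elements; ++i) {
        c_frag.x[i] *= alpha;
    }

    // The store API is the portable way to put elements back in position.
    wmma::store_matrix_sync(d, c_frag, N, wmma::mem_row_major);
}

// Hypothetical host helper: runs the kernel on small integer-valued inputs
// (exactly representable in half and float) and compares against a CPU
// reference. Returns true when every output element matches.
bool wmma_scale_demo(float alpha) {
    std::vector<half> ha(M * K), hb(K * N);
    std::vector<float> ref(M * N, 0.0f), out(M * N);

    for (int r = 0; r < M; ++r)
        for (int k = 0; k < K; ++k)
            ha[r * K + k] = __float2half(float((r + k) % 3));
    for (int c = 0; c < N; ++c)          // column-major: (k, c) at c * K + k
        for (int k = 0; k < K; ++k)
            hb[c * K + k] = __float2half(float((k * c) % 4));

    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += float((r + k) % 3) * float((k * c) % 4);
            ref[r * N + c] = alpha * acc;
        }

    half *da = nullptr, *db = nullptr;
    float* dd = nullptr;
    if (cudaMalloc(&da, ha.size() * sizeof(half)) != cudaSuccess) return false;
    if (cudaMalloc(&db, hb.size() * sizeof(half)) != cudaSuccess) return false;
    if (cudaMalloc(&dd, out.size() * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(da, ha.data(), ha.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), hb.size() * sizeof(half), cudaMemcpyHostToDevice);

    wmma_scaled_tile<<<1, 32>>>(da, db, dd, alpha);  // exactly one full warp
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(out.data(), dd, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dd);

    for (int i = 0; ok && i < M * N; ++i)
        ok = std::fabs(out[i] - ref[i]) < 1e-3f;
    return ok;
}
```

Build with something like `nvcc -arch=sm_70`. The scaling loop works only because it is position-independent; anything that depends on an element's row or column, such as a per-row bias, is what the unspecified layout rules out without a round trip through memory.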
repo | techs | stars | stars (weekly change) | forks | forks (weekly change) |
---|---|---|---|---|---|
google/skia | C++, C, Assembly | 7.9k | 0 | 1.4k | 0 |
greyss-mai/Department806 | Cuda, C++, Jupyter Notebook | 0 | 0 | 34 | 0 |
kakaobrain/NeRF-Factory | Python, Cuda, C++ | 1.2k | 0 | 93 | 0 |
sczhou/CodeFormer | Python, Cuda, C++ | 10.3k | 0 | 2.3k | 0 |
IDEA-Research/detrex | Python, Cuda, Other | 1.5k | +7 | 170 | +2 |
nv-tlabs/GET3D | Python, Cuda, C++ | 3.9k | 0 | 336 | 0 |
Newbeeer/Poisson_flow | Python, Cuda, Other | 786 | 0 | 61 | 0 |
amov-lab/Prometheus | C++, Python, CMake | 2k | 0 | 373 | 0 |
Netflix/vmaf | Python, C, MATLAB | 3.9k | 0 | 698 | 0 |
facebookresearch/xformers | Python, C++, Cuda | 5.7k | 0 | 411 | 0 |