ahennequ/cuda-tensorcores-register-mapping

Cuda
Stars and forks stats for the ahennequ/cuda-tensorcores-register-mapping repository. As of 07 May, 2024, this repository has 15 stars and 1 fork.

Cuda tensorcores register mapping

Since the Volta architecture, NVIDIA GPUs have included tensor cores that can be used to accelerate matrix multiplication. Each warp can produce a 16x16 fragment of the output, stored in a distributed register cache. However, the layout of the registers is unspecified. Because of this restriction, user code is limited to storing the fragment back to shared or global memory using the provided API function, or to applying pointwise operations that do not need to know the...
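The restriction described above can be illustrated with a minimal sketch using the `nvcuda::wmma` API from `<mma.h>`. This is not code from the repository: the kernel name `wmma_scale_store`, the all-ones inputs, and the factor of 2 are assumptions chosen for illustration. It shows the two things the text says user code may do with an accumulator fragment: apply a pointwise operation over `frag.x[i]` without knowing which matrix element each register holds, and store the fragment back to memory with `store_matrix_sync`.

```cuda
// Build (assumption: Volta or newer GPU): nvcc -arch=sm_70 wmma_sketch.cu
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes a single 16x16 output tile C = 2 * (A * B).
__global__ void wmma_scale_store(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    // Pointwise operation: valid without knowing the register layout,
    // because every element is treated identically.
    for (int i = 0; i < c_frag.num_elements; ++i) {
        c_frag.x[i] *= 2.0f;
    }

    // The documented way to get the tile out: let the API undo the layout.
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float* c;
    cudaMallocManaged(&a, 16 * 16 * sizeof(half));
    cudaMallocManaged(&b, 16 * 16 * sizeof(half));
    cudaMallocManaged(&c, 16 * 16 * sizeof(float));

    // All-ones inputs (illustrative): each entry of A * B is 16, so C is 32.
    for (int i = 0; i < 16 * 16; ++i) {
        a[i] = __float2half(1.0f);
        b[i] = __float2half(1.0f);
        c[i] = 0.0f;
    }

    wmma_scale_store<<<1, 32>>>(a, b, c);  // exactly one warp
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    bool ok = true;
    for (int i = 0; i < 16 * 16; ++i) {
        if (c[i] != 32.0f) ok = false;
    }
    printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return ok ? 0 : 1;
}
```

Anything that depends on the row and column of an element, such as adding a per-row bias, cannot be written this way without knowing which registers hold which elements.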