ahennequ/cuda-tensorcores-register-mapping

Cuda

These are the star and fork stats for the ahennequ/cuda-tensorcores-register-mapping repository. As of 07 May 2024, this repository has 15 stars and 1 fork.

Cuda tensorcores register mapping

Since the Volta architecture, NVIDIA's GPUs include tensorcores that can be used to accelerate matrix multiplication. Each warp is able to produce a 16x16 fragment of the output, stored in a distributed register cache. However, the layout of registers is unspecified. Because of this restriction, user code is limited to storing the fragment back to shared or global memory, using the provided API function, or applying pointwise operations that do not need to know the...
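As an illustration of the two operations the description says are allowed without knowing the register layout, here is a minimal sketch using the standard `nvcuda::wmma` API from `<mma.h>`. It is not taken from the repository: the kernel name `wmma_tile`, the host helper `run_tile_demo`, and the all-ones test inputs are assumptions chosen for the example. One warp multiplies two 16x16 half-precision tiles, applies a pointwise scaling to every accumulator element, and writes the fragment back with the API's store function.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// Hypothetical kernel: a single warp computes C = 2 * (A * B) for one 16x16 tile.
__global__ void wmma_tile(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    // Pointwise operation: the same update is applied to every element this
    // thread holds, so it does not matter which (row, col) each x[i] maps to.
    for (int i = 0; i < c_frag.num_elements; ++i) {
        c_frag.x[i] *= 2.0f;
    }

    // Writing the fragment back through the API is the documented way to
    // obtain a known layout (row-major here).
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: runs the kernel on all-ones inputs and returns the
// largest deviation from the expected value 2 * 16 = 32.
float run_tile_demo() {
    const int n = 16 * 16;
    half* a = nullptr;
    half* b = nullptr;
    float* c = nullptr;
    cudaMallocManaged(&a, n * sizeof(half));
    cudaMallocManaged(&b, n * sizeof(half));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        a[i] = __float2half(1.0f);
        b[i] = __float2half(1.0f);
        c[i] = 0.0f;
    }

    wmma_tile<<<1, 32>>>(a, b, c);  // exactly one warp
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = fmaxf(max_err, fabsf(c[i] - 32.0f));
    }
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return max_err;
}
```

Calling `run_tile_demo()` from a `main` and compiling with `nvcc -arch=sm_70` (or a newer architecture) should return 0 on a GPU with tensor cores. Anything that needs the row and column of a specific `c_frag.x[i]`, such as a row-wise reduction, cannot be written this way, which is the limitation the description refers to.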


repo	techs	stars	stars (weekly)	forks	forks (weekly)
google/skia	C++, C, Assembly	7.9k	0	1.4k	0
greyss-mai/Department806	Cuda, C++, Jupyter Notebook	0	0	34	0
kakaobrain/NeRF-Factory	Python, Cuda, C++	1.2k	0	93	0
sczhou/CodeFormer	Python, Cuda, C++	10.3k	0	2.3k	0
IDEA-Research/detrex	Python, Cuda, Other	1.5k	+7	170	+2
nv-tlabs/GET3D	Python, Cuda, C++	3.9k	0	336	0
Newbeeer/Poisson_flow	Python, Cuda, Other	786	0	61	0
amov-lab/Prometheus	C++, Python, CMake	2k	0	373	0
Netflix/vmaf	Python, C, MATLAB	3.9k	0	698	0
facebookresearch/xformers	Python, C++, Cuda	5.7k	0	411	0