wangsiping97/FastGEMV

High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.

Languages: Cuda, C++, Makefile, Python
Topics: machine-learning, optimization, cuda, cuda-kernels
Stars and forks statistics for the wangsiping97/FastGEMV repository: as of 11 May 2024, this repository has 26 stars and 0 forks.

FastGEMV

This repository provides a collection of kernel functions that enable high-speed computation of GEMV (matrix-vector multiplication). We have implemented and benchmarked the following scenarios: matrix: fp16, vector: fp16; matrix: int8 (quantized with fp16 scale/zero point), vector: fp16; matrix: int4 (quantized with fp16 scale/zero point), vector: fp16. The matrix and vector sizes range from 512 to 16384. On P100 GPUs, we achieved a maximum speedup of 2.7x compared to the PyTorch baseline....
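To make the quantized scenarios concrete, here is a minimal pure-Python reference sketch of what a GEMV over a quantized matrix computes. It is not the repository's CUDA kernel: the function names are hypothetical, Python floats stand in for fp16, and the dequantization rule `w = scale * (q - zero_point)` with a single per-matrix scale and zero point is an assumption, since the excerpt does not state the exact scheme.

```python
def gemv(matrix, vector):
    """Reference GEMV: y[i] = sum over j of matrix[i][j] * vector[j]."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dequantize(q_matrix, scale, zero_point):
    """Map integer codes back to real weights.

    Assumed scheme (not confirmed by the source): w = scale * (q - zero_point),
    with one scale and one zero point for the whole matrix.
    """
    return [[scale * (q - zero_point) for q in row] for row in q_matrix]


def quantized_gemv(q_matrix, scale, zero_point, vector):
    """GEMV where the matrix is stored as integer codes (e.g. int8 or int4)."""
    return gemv(dequantize(q_matrix, scale, zero_point), vector)


if __name__ == "__main__":
    q = [[1, 2], [3, 4]]   # integer codes
    x = [2.0, 4.0]         # input vector
    print(quantized_gemv(q, scale=0.5, zero_point=1, vector=x))
```

A fast kernel would typically fuse the dequantization into the dot-product loop rather than materializing the full-precision matrix as this sketch does; the sketch is only useful as a correctness reference for small inputs.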