AlibabaResearch/flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Languages: Cuda, Python, C++, C, Shell, Makefile
This is the stars and forks stats for the AlibabaResearch/flash-llm repository. As of May 6, 2024, this repository has 73 stars and 9 forks.

Flash-LLM
Flash-LLM is a large language model (LLM) inference acceleration library for unstructured model pruning. It consists mainly of efficient GPU kernels built on Tensor-Core-accelerated unstructured sparse matrix multiplication, which effectively speeds up the matrix computations that dominate LLM inference. With Flash-LLM, pruned LLM models can be deployed on GPUs with lower memory consumption and executed more efficiently. Currently, the code has been evaluated...
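For orientation, the core operation being accelerated is an unstructured-sparse weight matrix multiplied by a dense activation matrix (SpMM). The sketch below is a minimal, naive CUDA reference using the CSR format, chosen here purely for illustration; it is not Flash-LLM's Tensor-Core kernel, sparse encoding, or API, and the kernel/variable names are hypothetical.

```cuda
// Naive reference SpMM: C[M x N] = A_sparse[M x K] * B[K x N] (row-major).
// A is stored in CSR; one thread computes one output element.
// Illustrative only -- Flash-LLM's actual kernels use Tensor Cores and its own format.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void csr_spmm(int M, int N,
                         const int* row_ptr, const int* col_idx,
                         const float* vals, const float* B, float* C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;

    float acc = 0.0f;
    for (int i = row_ptr[row]; i < row_ptr[row + 1]; ++i)
        acc += vals[i] * B[col_idx[i] * N + col];   // skip zero weights entirely
    C[row * N + col] = acc;
}

int main()
{
    // A (2 x 3) with unstructured sparsity, stored in CSR:
    //   [ 1 0 2 ]
    //   [ 0 3 0 ]
    const int M = 2, K = 3, N = 2;
    int   h_row_ptr[] = {0, 2, 3};
    int   h_col_idx[] = {0, 2, 1};
    float h_vals[]    = {1.f, 2.f, 3.f};
    float h_B[]       = {1.f, 2.f,    // B is K x N, row-major
                         3.f, 4.f,
                         5.f, 6.f};
    float h_C[M * N];

    int *d_row_ptr, *d_col_idx;
    float *d_vals, *d_B, *d_C;
    cudaMalloc(&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&d_col_idx, sizeof(h_col_idx));
    cudaMalloc(&d_vals,    sizeof(h_vals));
    cudaMalloc(&d_B,       sizeof(h_B));
    cudaMalloc(&d_C,       sizeof(h_C));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals,    h_vals,    sizeof(h_vals),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_B,       h_B,       sizeof(h_B),       cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    csr_spmm<<<grid, block>>>(M, N, d_row_ptr, d_col_idx, d_vals, d_B, d_C);
    cudaMemcpy(h_C, d_C, sizeof(h_C), cudaMemcpyDeviceToHost);

    for (int r = 0; r < M; ++r)         // expected output: 11 14 / 9 12
        printf("%6.1f %6.1f\n", h_C[r * N + 0], h_C[r * N + 1]);
    return 0;
}
```

The point of the sketch is only the data layout: storing nonzeros plus indices lets the pruned weight matrix occupy memory proportional to its nonzero count, which is why pruned models fit in less GPU memory; the library's contribution is doing this computation efficiently on Tensor Cores rather than with a naive per-element kernel.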