SqueezeAILab/SqueezeLLM

SqueezeLLM: Dense-and-Sparse Quantization

Languages: Python, Cuda, C++
Topics: natural-language-processing, text-generation, transformer, llama, quantization, model-compression, efficient-inference, post-training-quantization, large-language-models, llm, small-models, localllm
As of 3 May 2024, the SqueezeAILab/SqueezeLLM repository has 397 stars and 25 forks.

[Paper]

SqueezeLLM is a post-training quantization framework that introduces a new method, Dense-and-Sparse Quantization, to enable efficient LLM serving.

TL;DR: Deploying LLMs is difficult because of their large memory footprint. This can be addressed with reduced-precision quantization, but naive quantization hurts model performance. SqueezeLLM addresses this with Dense-and-Sparse Quantization, which splits each weight matrix into two components: a dense component that can be heavily quantized without affecting model performance, and a sparse component that preserves the sensitive and outlier values of the weight matrix.
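To make the decomposition concrete, here is a minimal NumPy sketch of the dense-and-sparse idea: a small fraction of large-magnitude outlier weights is kept in full precision as a sparse matrix, while the remaining dense component is quantized. Note this is an illustrative simplification, not the repository's implementation: the function name, the percentile-based outlier threshold, and the uniform quantizer are assumptions (SqueezeLLM itself uses sensitivity-based non-uniform quantization with k-means centroids).

```python
import numpy as np

def dense_sparse_split(W, outlier_pct=0.5, bits=4):
    """Illustrative sketch: split W into a quantized dense part plus a
    full-precision sparse part holding the top `outlier_pct`% of |W|."""
    # Threshold at the (100 - outlier_pct) percentile of absolute values.
    thresh = np.percentile(np.abs(W), 100 - outlier_pct)
    sparse = np.where(np.abs(W) >= thresh, W, 0.0)  # outliers, full precision
    dense = W - sparse                               # remainder to quantize

    # Simple uniform quantization of the dense part (stand-in for the
    # paper's non-uniform, sensitivity-weighted k-means quantizer).
    lo, hi = dense.min(), dense.max()
    scale = (hi - lo) / (2**bits - 1)
    codes = np.round((dense - lo) / scale)           # integer codebook indices
    dense_deq = codes * scale + lo                   # dequantized dense part
    return dense_deq, sparse

# Reconstruction is the quantized dense part plus the exact sparse outliers:
# W_hat = dense_deq + sparse
```

Because the outliers are removed before quantization, the dense part has a much narrower value range, so the same bit budget yields finer quantization steps and lower reconstruction error than quantizing the full matrix naively.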