mit-han-lab/llm-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Languages: Python, C++, Cuda, Other
This page shows stars and forks stats for the mit-han-lab/llm-awq repository. As of 10 May 2024, the repository has 893 stars and 66 forks.

Efficient and accurate low-bit weight quantization (INT3/4) for LLMs, supporting instruction-tuned models and multi-modal LMs ([Paper]).

The current release supports:

- AWQ search for accurate quantization.
- Pre-computed AWQ model zoo for LLMs (LLaMA-1 & 2, OPT, Vicuna, LLaVA); load to generate quantized weights.
- Memory-efficient 4-bit Linear in PyTorch (see the sketch after this list).
- Efficient CUDA kernel implementation for fast inference (support context...).
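The core AWQ idea — protecting salient weight channels by scaling them according to observed activation magnitudes before low-bit rounding, then storing the result in packed 4-bit form — can be sketched in plain PyTorch. The snippet below is illustrative only and is not this repository's implementation or API: the single-exponent scale search, the normalization, the helper names (`awq_quantize_weight`, `pack_int4`), and the packing layout are all simplifying assumptions.

```python
import torch

def awq_quantize_weight(weight: torch.Tensor, act_abs_mean: torch.Tensor, n_bits: int = 4):
    """Quantize `weight` (out_features x in_features) to `n_bits`, using per-input-channel
    scales derived from average activation magnitudes (`act_abs_mean`, shape: in_features)."""
    qmax = 2 ** (n_bits - 1) - 1
    best_err, best = float("inf"), None
    for alpha in torch.linspace(0.0, 1.0, steps=11):   # sweep the scaling exponent
        # Salient input channels (large activations) are scaled up before rounding,
        # which lowers their relative quantization error.
        s = act_abs_mean.clamp(min=1e-5).pow(alpha)
        s = s / (s.max() * s.min()).sqrt()              # keep scales centered around 1
        w_scaled = weight * s                           # scale columns (input channels)
        step = w_scaled.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
        w_q = (w_scaled / step).round().clamp(-qmax - 1, qmax)
        w_deq = w_q * step / s                          # fold the scales back out
        err = (weight - w_deq).pow(2).mean().item()
        if err < best_err:
            best_err, best = err, (w_q.to(torch.int8), step, s)
    return best

def pack_int4(w_q: torch.Tensor) -> torch.Tensor:
    """Pack two signed 4-bit values per byte (toy layout for a memory-efficient 4-bit linear)."""
    flat = (w_q.flatten() & 0xF).to(torch.uint8)        # assumes an even number of elements
    return flat[0::2] | (flat[1::2] << 4)

# Usage with random data standing in for calibration activations:
w = torch.randn(128, 256)
acts = torch.randn(1024, 256)
w_q, step, s = awq_quantize_weight(w, acts.abs().mean(dim=0))
packed = pack_int4(w_q)   # ~2x smaller than int8 storage, ~8x smaller than fp32
```

In the actual release, the quantized weights are paired with the dedicated CUDA kernels mentioned in the feature list above rather than being unpacked in Python.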
Other repositories listed on the stats page:

| Repo | Languages | Stars | Stars (weekly) | Forks | Forks (weekly) |
|---|---|---|---|---|---|
| facebookresearch/hiera | Python | 600 | 0 | 23 | 0 |
| KasperskyLab/triangle_check | Python | 341 | 0 | 25 | 0 |
| emarco177/ice_breaker | Python, HTML, CSS | 164 | 0 | 273 | 0 |
| vincentarelbundock/modelsummary | R, HTML, TeX | 794 | 0 | 69 | 0 |
| bytebeamio/rumqtt | Rust, Other | 1.2k | 0 | 191 | 0 |
| SFDigitalServices/sf-dahlia-web | SCSS, CoffeeScript, TypeScript | 29 | 0 | 17 | 0 |
| movie-web/movie-web | TypeScript, Other | 1.3k | 0 | 235 | 0 |
| noahspurrier/dotfiles | Vim Script, Shell, Python | 1 | 0 | 0 | 0 |
| htphongx4/vue3-ts-vite-boilerplate | Vue, TypeScript, MDX | 66 | 0 | 34 | 0 |
| parity-asia/hackathon-2023-summer | Rust, TypeScript, CSS | 14 | 0 | 75 | 0 |