qwopqwop200/GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python

Stars and forks stats for the qwopqwop200/GPTQ-for-LLaMa repository: as of 28 Apr 2024 it has 2,651 stars and 429 forks.

GPTQ-for-LLaMa

I am currently focusing on AutoGPTQ and recommend using AutoGPTQ instead of GPTQ-for-LLaMa.

4-bit quantization of LLaMA using GPTQ. GPTQ is a SOTA one-shot weight quantization method. It can be used universally, but it is not the fastest, and it only supports Linux; Triton only supports Linux, so if you are a Windows user, please use WSL2.

News or Update: AutoGPTQ-triton, a packaged version of GPTQ with Triton, has been integrated into AutoGPTQ.

Result (LLaMA-7B): Bits | group-size | memory (MiB) | Wikitext2 | checkpoint...
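Since the README points readers to AutoGPTQ, here is a minimal sketch of the recommended 4-bit workflow there, following AutoGPTQ's quickstart pattern. The model ID (huggyllama/llama-7b), the calibration sentence, and the output directory are assumptions for illustration, not values from this page.

```python
# Minimal sketch: 4-bit GPTQ quantization with AutoGPTQ, the library the
# README recommends over GPTQ-for-LLaMa. Model ID, calibration text, and
# output directory below are illustrative assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_id = "huggyllama/llama-7b"  # assumed LLaMA-7B checkpoint
quantized_model_dir = "llama-7b-4bit-128g"   # assumed output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_id, use_fast=True)

# GPTQ is one-shot: it calibrates on a small set of examples
# instead of retraining the model.
examples = [
    tokenizer("GPTQ is a one-shot weight quantization method.")
]

# 4-bit weights with group size 128, matching the Bits / group-size
# columns of the README's LLaMA-7B result table.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_id, quantize_config)
model.quantize(examples)                   # run the one-shot GPTQ pass
model.save_quantized(quantized_model_dir)  # write the 4-bit checkpoint

# Reload the quantized checkpoint for inference.
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
```

In practice the calibration set would be a few hundred samples, e.g. drawn from Wikitext2 (the perplexity benchmark in the result table); a single sentence is only enough to show the call sequence.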
Repo | Languages | Stars | Stars (weekly) | Forks | Forks (weekly)
guillaumekln/faster-whisper | Python | 5k | 0 | 357 | 0
aws/aws-eks-best-practices | Python, Go, Dockerfile | 1.6k | 0 | 391 | 0
Berkanktk/CyberSecurity | Python, Shell, C++ | 525 | 0 | 33 | 0
KohakuBlueleaf/LyCORIS | Python | 1.5k | +21 | 93 | +1
kevinjycui/DesmosBezierRenderer | Python, HTML | 466 | +5 | 94 | +1
IHP-GmbH/IHP-Open-PDK | Python, HTML, MATLAB | 201 | 0 | 15 | 0
innnky/emotional-vits | Jupyter Notebook, Python | 1k | 0 | 148 | 0
microsoft/visual-chatgpt | Python, HTML, Dockerfile | 34.2k | 0 | 3.3k | 0
Azure-Samples/azure-search-openai-demo | Python, TypeScript, Bicep | 3.8k | 0 | 2.2k | 0
butaixianran/Stable-Diffusion-Webui-Civitai-Helper | Python, JavaScript, CSS | 2k | +13 | 232 | +1