FMInference/FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Languages: Python, Shell. Topics: machine-learning, deep-learning, offloading, high-throughput, opt, gpt-3, large-language-models
Stars and forks stats for the FMInference/FlexGen repository: as of 8 May 2024, it has 8,582 stars and 490 forks.

FlexGen: High-throughput Generative Inference of Large Language Models with a Single GPU [paper]

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory. It achieves high-throughput generation through IO-efficient offloading, compression, and large effective batch sizes.

Motivation

In recent years, large language models (LLMs) have shown great performance across a wide range of tasks. Increasingly, LLMs have been applied not only to interactive applications...
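To make the offloading idea above concrete, here is a minimal, hypothetical PyTorch sketch of layer-wise weight offloading. This is not FlexGen's actual implementation (FlexGen additionally compresses weights and the KV cache and schedules IO to overlap with compute); the function and variable names are made up for illustration:

import torch

def offloaded_forward(layers, x, device="cuda"):
    # Stream CPU-resident layers through the GPU one at a time, so peak GPU
    # memory is bounded by one layer's weights plus the batch's activations.
    for layer in layers:
        layer.to(device)    # host-to-device copy of this layer's weights
        x = layer(x)        # run the (large) batch through the layer on GPU
        layer.to("cpu")     # evict the weights to make room for the next layer
    return x

# Example: eight large linear layers, with a big batch to amortize transfers.
layers = [torch.nn.Linear(4096, 4096) for _ in range(8)]
batch = torch.randn(256, 4096, device="cuda")
out = offloaded_forward(layers, batch)

Large effective batch sizes matter in this scheme because each layer's weights cross the CPU-GPU link once per forward pass, so a bigger batch amortizes that transfer cost. This is why FlexGen targets throughput-oriented scenarios rather than latency-sensitive interactive use.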