mit-han-lab/streaming-llm

Efficient Streaming Language Models with Attention Sinks

Python
These are the stars and forks stats for the mit-han-lab/streaming-llm repository. As of 27 Apr 2024, this repository has 4,561 stars and 234 forks.

Efficient Streaming Language Models with Attention Sinks [paper]

streamingllm_demo.mp4

TL;DR: We deploy LLMs for infinite-length inputs without sacrificing efficiency and performance.

Abstract: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes...
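The TL;DR above rests on a simple cache policy: keep the KV states of a few initial "attention sink" tokens plus a rolling window of the most recent tokens, and evict everything in between. Below is a minimal PyTorch sketch of that eviction step; the function name `evict_kv` and the default sizes (4 sink tokens, a 1020-token window) are illustrative assumptions, not the repository's actual API.

```python
import torch


def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             n_sink: int = 4, window: int = 1020):
    """Trim one layer's KV cache, shaped [batch, heads, seq_len, head_dim].

    Keeps the first `n_sink` tokens (the attention sinks) plus the most
    recent `window` tokens, so the cache never grows past n_sink + window.
    """
    seq_len = keys.shape[2]
    if seq_len <= n_sink + window:
        # Cache still fits: nothing to evict yet.
        return keys, values
    keep = torch.cat([
        torch.arange(n_sink, device=keys.device),                     # sink tokens
        torch.arange(seq_len - window, seq_len, device=keys.device),  # recent window
    ])
    return keys.index_select(2, keep), values.index_select(2, keep)


# Usage: a 2048-token cache shrinks to 4 + 1020 = 1024 entries.
k = torch.randn(1, 32, 2048, 128)
v = torch.randn(1, 32, 2048, 128)
k, v = evict_kv(k, v)
```

Note that this sketch covers only the eviction policy; the paper's method also assigns positions relative to slots inside the rolling cache rather than to the original text positions.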
| Repository | Languages | Stars | Weekly | Forks | Weekly |
|---|---|---|---|---|---|
| danielgross/localpilot | Python | 1.2k | 0 | 60 | 0 |
| Mayandev/where-is-douban250 | Python, JavaScript | 509 | 0 | 40 | 0 |
| ray-project/ray-llm | Python, Other | 736 | +38 | 47 | +2 |
| win3zz/CVE-2023-43261 | Python | 44 | 0 | 7 | 0 |
| ONLYOFFICE/DocSpace-buildtools | Rich Text Format, Shell, Python | 0 | 0 | 1 | 0 |
| DHEERAJHARODE/Hacktoberfest2023-Open-source- | Jupyter Notebook, C++, C | 340 | 0 | 2.3k | 0 |
| Source2ZE/CS2Fixes | C++, Python, C | 53 | 0 | 33 | 0 |
| NVIDIA/MatX | C++, Cuda, CMake | 989 | 0 | 57 | 0 |
| pct/TiApp | CoffeeScript, Ruby, Python | 9 | 0 | 0 | 0 |
| naveen3011/WebD_project_hacktober2023 | CSS, JavaScript, HTML | 25 | 0 | 105 | 0 |