huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

As of 29 Apr 2024, the huggingface/tokenizers repository has 7,655 stars and 649 forks.

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

Main features:
- Train new vocabularies and tokenize, using today's most used tokenizers.
- Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignments tracking. It's always possible to get...
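To illustrate the alignment-tracking idea mentioned above, here is a minimal stdlib-only sketch (a toy, not the library's Rust implementation): each token keeps its character span in the original text, so any token can always be traced back to the source string even after normalization.

```python
import re

def tokenize_with_alignments(text):
    """Toy sketch of alignment tracking: lowercase-normalize, split on
    whitespace, and record each token's character span in the ORIGINAL
    text. Assumes a length-preserving (ASCII) normalization."""
    normalized = text.lower()  # offsets stay valid for ASCII input
    return [(m.group(), (m.start(), m.end()))
            for m in re.finditer(r"\S+", normalized)]

text = "Fast State-of-the-Art Tokenizers"
for token, (start, end) in tokenize_with_alignments(text):
    # The span always recovers the corresponding slice of the original.
    print(token, "->", repr(text[start:end]))
```

The real library generalizes this: its normalizers maintain an offset mapping even for length-changing transformations, which is what makes "get the part of the original sentence for a given token" possible after arbitrary normalization.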