These are the star and fork stats for the /huggingface/tokenizers repository. As of 29 Apr 2024, this repository has 7,655 stars and 649 forks.
Provides an implementation of today's most widely used tokenizers, with a focus on performance and versatility. Main features:

- Train new vocabularies and tokenize, using today's most widely used tokenizers.
- Extremely fast (both training and tokenization), thanks to the Rust implementation: tokenizing a GB of text takes less than 20 seconds on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignment tracking. It's always possible to get...
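To make "alignment tracking" concrete, here is a toy pure-Python sketch. It is not the library's API and not how the Rust crate is implemented. The normalization steps (lowercasing, accent stripping, whitespace collapsing) and the function names `normalize_with_alignments` and `original_span` are hypothetical choices for illustration. The idea is to record, for every normalized character, the index of the original character that produced it, so that any span of the normalized text can be mapped back to the exact slice of the input.

```python
import unicodedata


def normalize_with_alignments(text: str) -> tuple[str, list[int]]:
    """Toy normalizer (hypothetical): lowercase, strip accents, collapse whitespace.

    Returns the normalized string and a list where alignments[j] is the index
    in `text` of the original character that produced normalized character j.
    """
    chars: list[str] = []
    alignments: list[int] = []
    prev_space = False
    for i, ch in enumerate(text):
        if ch.isspace():
            # Drop leading whitespace and collapse whitespace runs to one space.
            if chars and not prev_space:
                chars.append(" ")
                alignments.append(i)
                prev_space = True
            continue
        prev_space = False
        # lower() may yield several characters; NFD splits "é" into "e" + U+0301.
        for piece in unicodedata.normalize("NFD", ch.lower()):
            if unicodedata.combining(piece):
                continue  # drop the accent mark, keep the base letter
            chars.append(piece)
            alignments.append(i)
    # Drop a single trailing space left over from trailing whitespace.
    if chars and chars[-1] == " ":
        chars.pop()
        alignments.pop()
    return "".join(chars), alignments


def original_span(alignments: list[int], start: int, end: int) -> tuple[int, int]:
    """Map a normalized [start, end) span back to a [start, end) span of the original."""
    if not 0 <= start < end <= len(alignments):
        raise ValueError("expected 0 <= start < end <= len(alignments)")
    return alignments[start], alignments[end - 1] + 1


if __name__ == "__main__":
    text = "  H\u00e9llo   WORLD "
    normalized, alignments = normalize_with_alignments(text)
    print(normalized)  # hello world
    start, end = original_span(alignments, 6, 11)  # "world" in the normalized text
    print(text[start:end])  # WORLD
```

Keeping one original index per output character makes the mapping trivial at the cost of one integer per character. The sketch only illustrates the idea and does not describe how the library stores alignments internally.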
repo | technologies | stars | stars (weekly change) | forks | forks (weekly change) |
---|---|---|---|---|---|
launchbadge/sqlx | Rust, Other | 10.1k | 0 | 983 | 0 |
copy/v86 | Rust, JavaScript, C | 18.3k | 0 | 1.3k | 0 |
Wilfred/difftastic | Rust, Other | 15.5k | 0 | 252 | 0 |
killercup/cargo-edit | Rust | 2.9k | 0 | 148 | 0 |
scalacenter/scalafix | Scala, Java, Other | 765 | 0 | 180 | 0 |
andrewbanchich/forty-jekyll-theme | SCSS, JavaScript, HTML | 977 | +5 | 2k | -2 |
uswds/uswds | SCSS, JavaScript, Twig | 6.5k | +3 | 956 | +1 |
Esri/calcite-web | SCSS, JavaScript, Ruby | 105 | 0 | 58 | 0 |
cloudevents/spec | Python, ANTLR, Makefile | 4.4k | +11 | 560 | 0 |
longhorn/longhorn | Shell, Python, Mustache | 5k | +13 | 636 | 0 |