These are the star and fork statistics for the google/sentencepiece repository. As of 25 Apr 2024, this repository has 8307 stars and 1041 forks.
SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural network-based text generation systems where the vocabulary size is predetermined prior to neural model training. SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]) with the extension of direct training from raw sentences. SentencePiece allows us to build a purely end-to-end system that does not depend on language-specific pre/postprocessing. This...
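To make "subword units trained directly from raw sentences" concrete, here is a minimal toy sketch of byte-pair-encoding merges in plain Python. It is not SentencePiece's implementation or API: the function names (`to_symbols`, `merge_pair`, `learn_bpe`, `encode`, `decode`) are hypothetical, the example corpus is made up, and the sketch assumes the input has no leading spaces and does not itself contain the `▁` character. The one detail borrowed from SentencePiece is that whitespace is treated as an ordinary symbol (`▁`, U+2581), which is what lets detokenization work without language-specific rules.

```python
from collections import Counter

WS = "\u2581"  # "▁", the visible whitespace marker SentencePiece uses


def to_symbols(sentence):
    """Turn a raw sentence into single-character symbols, spaces included."""
    return list(WS + sentence.replace(" ", WS))


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(sentences, num_merges):
    """Learn up to `num_merges` merge rules directly from raw sentences."""
    corpus = [to_symbols(s) for s in sentences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties are broken deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [merge_pair(s, best) for s in corpus]
    return merges


def encode(sentence, merges):
    """Apply the learned merges, in order, to a new sentence."""
    symbols = to_symbols(sentence)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def decode(pieces):
    """Detokenize: join pieces, restore spaces, drop the leading marker."""
    return "".join(pieces).replace(WS, " ")[1:]


if __name__ == "__main__":
    rules = learn_bpe(["low lower lowest", "new newer newest"], num_merges=10)
    pieces = encode("lower newest", rules)
    print(pieces)
    print(decode(pieces))
```

Because the whitespace marker is part of the symbol stream, `decode(encode(text, rules))` reproduces the original text under the assumptions above, with no separate pre- or post-processing step.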
repo | techs | stars | stars (weekly) | forks | forks (weekly) |
---|---|---|---|---|---|
FiYHer/kernel_window_hide | C++ | 261 | 0 | 108 | 0 |
facebook/folly | C++, Python, CMake | 26.1k | +28 | 5.5k | +4 |
Light-City/CPlusPlusThings | C++, Starlark, C | 33.5k | 0 | 7.9k | 0 |
facebook/rocksdb | C++, Java, C | 26.1k | 0 | 5.9k | 0 |
ossrs/srs | C++, JavaScript, HTML | 22.8k | 0 | 5.1k | 0 |
esp8266/Arduino | C++, C, Python | 15.3k | 0 | 13.4k | 0 |
taichi-dev/taichi | C++, Python, C | 23.9k | 0 | 2.3k | 0 |
huihut/interview | C++, C, CMake | 30.6k | 0 | 7.6k | 0 |
qinguoyi/TinyWebServer | C++, C, HTML | 13k | +77 | 3.4k | +23 |
envoyproxy/envoy | C++, Starlark, Java | 22.8k | 0 | 4.5k | 0 |