Star and fork stats for the togethercomputer/RedPajama-Data repository. As of 20 Apr 2024, this repository has 3520 stars and 272 forks.
RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset

This repo contains a reproducible data recipe for the RedPajama data, with the following token counts:

Dataset | Token Count |
---|---|
Commoncrawl | 878 Billion |
C4 | 175 Billion |
GitHub | 59 Billion |
Books | 26 Billion |
ArXiv | 28 Billion |
Wikipedia | 24 Billion |
StackExchange | 20 Billion |
Total | 1.2 Trillion |

Data Preparation
In data_prep, we provide all pre-processing scripts and guidelines.

Tokenization
In tokenization, we provide an example of how to...
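
The per-dataset token counts listed above can be checked against the stated total with a short calculation. The sketch below is only an illustration: the variable names are hypothetical, and the numbers are copied from the description rather than computed from the dataset itself. The sum comes to 1210 billion, which rounds to the quoted 1.2 trillion.

```python
# Token counts in billions, copied from the repository description above.
# The dictionary and variable names are illustrative, not part of the repo.
token_counts_billion = {
    "Commoncrawl": 878,
    "C4": 175,
    "GitHub": 59,
    "Books": 26,
    "ArXiv": 28,
    "Wikipedia": 24,
    "StackExchange": 20,
}

# Sum the per-dataset counts and convert billions to trillions.
total_billion = sum(token_counts_billion.values())
total_trillion = total_billion / 1000

# Share of each source in the total, as a percentage.
shares = {
    name: 100 * count / total_billion
    for name, count in token_counts_billion.items()
}

print(f"Total: {total_billion} billion tokens (about {total_trillion:.1f} trillion)")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14}{share:5.1f}%")
```

By these figures, Commoncrawl alone accounts for nearly three quarters of the tokens.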
repo | techs | stars | stars (weekly) | forks | forks (weekly) |
---|---|---|---|---|---|
zilliztech/GPTCache | Python, Other | 5.3k | 0 | 362 | 0 |
wjz304/arpl-i18n | Shell, C, Other | 2.3k | 0 | 369 | 0 |
aleskxyz/reality-ezpz | Shell, Python | 778 | 0 | 109 | 0 |
PatrickAlphaC/foundry-smart-contract-lottery-f23 | Solidity, Makefile | 26 | 0 | 5 | 0 |
pashpashpash/vault-ai | JavaScript, Go, Less | 3.1k | 0 | 298 | 0 |
nvim-neotest/neotest-plenary | Lua, Shell | 21 | 0 | 5 | 0 |
kronosnet/knet-ci-test | M4, Shell, Makefile | 0 | 0 | 4 | 0 |
ricardoerikson/makefile-latex | Makefile, Shell | 0 | 0 | 0 | 0 |
kaqijiang/Auto-GPT-ZH | Python, Other | 2.3k | 0 | 401 | 0 |
haotian-liu/LLaVA | Python, Shell, JavaScript | 7.6k | 0 | 651 | 0 |