togethercomputer/RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python, Shell, Makefile

Stars and forks stats for /togethercomputer/RedPajama-Data

[Chart: daily fork count, 2023-03-08 to 2023-06-05. 0 forks through 2023-04-18, 62 forks on 2023-04-19, rising to 227 forks on 2023-06-05.]

227 forks in total, +165 in the last 90 days

[Chart: daily star count, 2023-03-08 to 2023-06-05. 0 stars through 2023-04-18, 1,052 stars on 2023-04-19, rising to 2,904 stars on 2023-06-05.]

2.9k stars in total, +1.9k in the last 90 days

These are the star and fork statistics for the togethercomputer/RedPajama-Data repository. As of 5 June 2023, the repository has 2,904 stars and 227 forks.

RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset

This repo contains a reproducible data recipe for the RedPajama data, with the following token counts:

Dataset         Token Count
Commoncrawl     878 Billion
C4              175 Billion
GitHub          59 Billion
Books           26 Billion
ArXiv           28 Billion
Wikipedia       24 Billion
StackExchange   20 Billion
Total           1.2 Trillion

Data Preparation

In data_prep, we provide all pre-processing scripts and guidelines.

Tokenization

In tokenization, we provide an example of how to...
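The per-dataset token counts quoted in the README excerpt can be checked against the stated total with a quick sum. The sketch below is only an illustration: the figures are copied from the excerpt, while the variable names (`token_counts_billions`, `shares`) are hypothetical and not part of the repository.

```python
# Token counts in billions, copied from the README excerpt.
# The dictionary and variable names are illustrative, not from the repo.
token_counts_billions = {
    "Commoncrawl": 878,
    "C4": 175,
    "GitHub": 59,
    "Books": 26,
    "ArXiv": 28,
    "Wikipedia": 24,
    "StackExchange": 20,
}

# Sum the per-dataset counts and compute each dataset's share of the total.
total_billions = sum(token_counts_billions.values())
shares = {
    name: count / total_billions
    for name, count in token_counts_billions.items()
}

print(f"Total: {total_billions} billion tokens "
      f"(~{total_billions / 1000:.1f} trillion)")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14}{share:6.1%}")
```

The counts sum to 1,210 billion, which rounds to the 1.2 trillion total given in the excerpt.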