allenai/mmc4

MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.

Python, Shell
Star and fork statistics for the allenai/mmc4 repository: as of 03 May 2024, it has 777 stars and 27 forks.

📷 📝 Multimodal C4 (mmc4) 📝 📷

An open, billion-scale corpus of images interleaved with text. arXiv paper with curation details out now!

Updates

released mmc4 version 1.1 🔥 which fixes #11 and #10

Corpus stats (v1.1)

Dataset                                         # images   # docs   # tokens
Multimodal-C4 (mmc4)                            571M       101.2M   43B
Multimodal-C4 fewer-faces (mmc4-ff)             375M       77.7M    33B
Multimodal-C4 core (mmc4-core)                  29.9M      7.3M     2.4B
Multimodal-C4 core fewer-faces (mmc4-core-ff)   22.4M      5.5M     1.8B

More details about these datasets and our processing steps can...
repo                                   techs                      stars   weekly   forks   weekly
DigitalSQR/resource-visualization      Batchfile, Shell, CSS      1       0        0       0
jstedfast/MimeKit                      C#, HTML, PowerShell       1.7k    0        347     0
nomic-ai/gpt4all-chat                  C++, QML, CMake            1.3k    0        146     0
cmsc389T-spring23/Lecture9             Dockerfile, Shell          0       0        1       0
SN-Abdullah-Al-Noman/Al-Noman          Dockerfile, Python         1       0        6       0
nomasystems/nuid                       Erlang, Python             6       0        0       0
kubernetes/kubectl                     Go, Python                 2.5k    0        845     0
z-x-yang/Segment-and-Track-Anything    Jupyter Notebook, Python   1.9k    0        229     0
TheCompAce/Auto-GPT-Powershell         PowerShell, Python, Other  88      0        24      0
Vision-CAIR/MiniGPT-4                  Python, Shell              22.7k   0        2.7k    0