Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Languages: HTML, Python, Shell, Makefile, Dockerfile, XSLT
Topics: nlp, pdf, machine-learning, natural-language-processing, information-retrieval, ocr, deep-learning, ml, docx, preprocessing, pdf-to-text, data-pipelines, donut, document-image-processing, document-parser, pdf-to-json, document-image-analysis, llm, document-parsing, langchain
These are the star and fork stats for the Unstructured-IO/unstructured repository. As of 25 Apr 2024, this repository has 2920 stars and 210 forks.

Open-Source Pre-Processing Tools for Unstructured Data

The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. Its modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and efficient...
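As a rough illustration of the ingestion step described above, the sketch below partitions a document into elements and flattens them into plain records. It assumes the unstructured package is installed and that its `partition` function from `unstructured.partition.auto` is used. `load_elements` and `to_records` are hypothetical helper names introduced here, not part of the library, and the stand-in objects only mimic the `category` and `text` attributes of real elements.

```python
from types import SimpleNamespace


def load_elements(path):
    """Partition a document into elements with the unstructured library.

    Assumes `unstructured` is installed, plus any extras the file type needs.
    """
    # Imported lazily so to_records() below works without the dependency.
    from unstructured.partition.auto import partition

    return partition(filename=path)


def to_records(elements):
    """Flatten elements into plain dicts (hypothetical helper, not library API)."""
    return [
        {
            # Fall back to the class name if an object has no `category`.
            "type": getattr(el, "category", type(el).__name__),
            "text": getattr(el, "text", ""),
        }
        for el in elements
    ]


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without the library installed.
    # With the library available, use: elements = load_elements("report.pdf")
    elements = [
        SimpleNamespace(category="Title", text="Quarterly Report"),
        SimpleNamespace(category="NarrativeText", text="Revenue grew this quarter."),
    ]
    for record in to_records(elements):
        print(record["type"], "|", record["text"])
```

Keeping the library import inside `load_elements` is a design choice for this sketch only: it lets the record-flattening step be tried on its own before installing the file-type extras.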
repo | techs | stars | weekly | forks | weekly
omenking/aws-bootcamp-cruddur-2023 | JavaScript, CSS, Python | 280 | 0 | 431 | 0
hongfz16/EVA3D | Python, Cuda, C++ | 492 | 0 | 33 | 0
Viditagarwal7479/Recognizance-23 | Jupyter Notebook, Python | 28 | 0 | 36 | 0
ajstarks/dubois-data-portraits | Makefile, DTrace, Other | 113 | 0 | 61 | 0
Akegarasu/lora-scripts | Python, PowerShell, Shell | 2.8k | 0 | 382 | 0
mikeizbicki/pagila-hw | PLpgSQL, Shell, Dockerfile | 0 | 0 | 73 | 0
Mikubill/sd-webui-controlnet | Python, Cuda, C++ | 13.3k | 0 | 1.7k | 0
lucidrains/toolformer-pytorch | Python | 1.7k | 0 | 101 | 0
acheong08/OpenAIAuth | Go, Python, Makefile | 424 | 0 | 122 | 0
scikit-optimize/scikit-optimize | Python, Shell, Makefile | 2.7k | +3 | 518 | 0