moj-analytical-services/splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Languages: JavaScript, Python, Roff, Jinja, Shell, CSS, Other
Topics: data-science, spark, record-linkage, entity-resolution, fuzzy-matching, deduplication, em-algorithm, data-matching, deduplicate-data, duckdb, uk-gov-data-science

These are the star and fork statistics for the moj-analytical-services/splink repository. As of 03 May 2024, this repository has 791 stars and 102 forks.

Fast, accurate and scalable probabilistic data linkage

Splink is a Python package for probabilistic record linkage (entity resolution) that allows you to deduplicate and link records from datasets that lack unique identifiers.

Key Features

⚡ Speed: Capable of linking a million records on a laptop in around a minute.
🎯 Accuracy: Support for term frequency adjustments and user-defined fuzzy matching logic.
🌐 Scalability: Execute linkage in Python (using DuckDB) or big-data backends like AWS Athena...
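
To make "probabilistic record linkage" concrete, here is a minimal standard-library sketch of the general idea: each field that agrees or disagrees between two records shifts the odds that they refer to the same entity. This is not Splink's API. The field names, the m/u probabilities, the prior, and the `match_probability` helper are all hypothetical values chosen for illustration, and the comparison is plain equality rather than fuzzy matching.

```python
import math

# Hypothetical parameters (assumptions, not values taken from Splink):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
PARAMS = {
    "first_name": {"m": 0.9, "u": 0.01},
    "surname": {"m": 0.9, "u": 0.005},
    "dob": {"m": 0.95, "u": 0.001},
}
PRIOR = 0.001  # assumed probability that a randomly chosen pair is a match


def match_probability(rec_a, rec_b, params=PARAMS, prior=PRIOR):
    """Combine per-field evidence into a match probability for one record pair."""
    # Start from the prior odds, expressed as a log2 "match weight".
    weight = math.log2(prior / (1 - prior))
    for field, p in params.items():
        if rec_a.get(field) == rec_b.get(field):
            # Agreement: evidence in favour of a match.
            weight += math.log2(p["m"] / p["u"])
        else:
            # Disagreement: evidence against a match.
            weight += math.log2((1 - p["m"]) / (1 - p["u"]))
    odds = 2 ** weight
    return odds / (1 + odds)


a = {"first_name": "john", "surname": "smith", "dob": "1990-01-01"}
b = {"first_name": "jon", "surname": "smith", "dob": "1990-01-01"}
c = {"first_name": "mary", "surname": "jones", "dob": "1985-06-30"}

p_same = match_probability(a, a)     # every field agrees
p_partial = match_probability(a, b)  # first name differs
p_diff = match_probability(a, c)     # every field differs
print(p_same, p_partial, p_diff)
```

In this sketch a misspelled first name lowers the score but does not rule out a match, which is the behaviour that lets records without a shared unique identifier still be linked.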
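| repo | techs | stars | weekly (stars) | forks | weekly (forks) |
|---|---|---|---|---|---|
| tokio-rs/prost | Rust, Other | 3k | +12 | 413 | +2 |
| DataDog/libdatadog | Rust, Shell, Ruby | 27 | 0 | 4 | 0 |
| Deltares/GEOLib | Scheme, Python, Jinja | 17 | 0 | 13 | 0 |
| JaccoVeldscholten/SlimmeMeterDashboard | SCSS, JavaScript, CSS | 11 | 0 | 0 | 0 |
| discord-extensions/essence | SCSS, CSS | 63 | 0 | 39 | 0 |
| DataDog/rum-events-format | JavaScript, TypeScript | 10 | 0 | 5 | 0 |
| taizilongxu/interview_python | Shell | 15.9k | 0 | 5.5k | 0 |
| nezavisimost/FuckRKN1 | Shell, Batchfile, Clojure | 629 | 0 | 68 | 0 |
| team-s2/ACTF-2022 | Solidity, C++, Python | 72 | 0 | 11 | 0 |
| onbjerg/forge-coverage-test | Solidity, Other | 36 | 0 | 3 | 0 |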