Fuzzy matching at scale

From 3.7 hours to 0.2 seconds. How to perform intelligent string matching in a way that can scale to even the biggest data sets.

Josh Taylor
Towards Data Science
8 min read · Jul 1, 2019


Same but different. Fuzzy matching of data is an essential first step for a huge range of data science workflows.

### Update December 2020: A faster, simpler way of fuzzy matching is now included at the end of this post, with the full code to implement it on any dataset ###
