
A Flexible Framework for Entity Resolution

Hoyoung Jang & Cheng Lin | TMLS2019


About the speakers

  • Hoyoung Jang, Lead Data Scientist at ThinkData Works
  • Cheng Lin, Honours Student at McGill University

About the talk

A critical component of data management and enrichment pipelines is connecting large datasets from various sources to form a holistic view, that is, making connections between entities across data sources. Often these entities (such as individuals, organizations, or addresses) lack a unique identifier that can serve as a key for detecting duplicates or merging datasets. ThinkData has developed a scalable entity resolution engine to solve these problems. After experimenting with both deep learning and traditional NLP techniques, the team has found an approach that balances accuracy and performance. Specifically, we have achieved near-parity in accuracy with Magellan (the leading entity resolution project in research) while delivering much better performance metrics and greater scalability. This talk will discuss the importance of entity resolution, our approach to solving real-world challenges, and the potential of using entity resolution and graph relationships in tandem.
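
To make the problem concrete, here is a minimal Python sketch of matching records that share no unique identifier. It is not ThinkData's engine or Magellan's method; it simply illustrates the general idea with a standard-library string similarity measure. The record fields (`id`, `name`, `city`), the helper names, the choice of `city` as a blocking key, and the `0.85` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(records, threshold=0.85):
    """Return pairs of record ids whose names look like the same entity.

    Blocking (an assumption of this sketch): only records in the same
    city are compared, which avoids scoring every possible pair.
    """
    blocks = {}
    for rec in records:
        blocks.setdefault(rec["city"].lower(), []).append(rec)

    matches = []
    for block in blocks.values():
        for left, right in combinations(block, 2):
            if similarity(left["name"], right["name"]) >= threshold:
                matches.append((left["id"], right["id"]))
    return matches


# Hypothetical records: no shared key, inconsistent spelling and casing.
records = [
    {"id": 1, "name": "Acme Widgets Inc.", "city": "Toronto"},
    {"id": 2, "name": "ACME Widgets, Inc", "city": "toronto"},
    {"id": 3, "name": "Northern Lumber Co.", "city": "Toronto"},
    {"id": 4, "name": "Acme Widgets Inc.", "city": "Montreal"},
]

matches = find_matches(records)
print(matches)
```

Records 1 and 2 are paired despite differing punctuation and casing, while record 4 is never compared with them because it falls in a different block. That trade-off is typical of blocking: it cuts the number of comparisons but can miss true matches whose blocking key differs.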

