How to Use n-gram Models to Detect Format Errors in Datasets

A code-first look at how language models can be extended and used for different purposes.

Dimitris Poulopoulos
Towards Data Science
6 min read · Mar 16, 2020

Image by Ronile from Pixabay

Format errors, in the best-case scenario, can break an automated data processing pipeline. In the worst case, they introduce logical errors in downstream analytical tasks that are difficult to debug…

Machine Learning Engineer. I talk about AI, MLOps, and Python programming. More about me: www.dimpo.me