How to Use n-gram Models to Detect Format Errors in Datasets
A code-first look at how language models can be extended and used for different purposes.
6 min read · Mar 16, 2020
Format errors, in the best-case scenario, can break an automated data processing pipeline. In the worst case, they introduce logical errors in downstream analytical tasks that are difficult to debug…