Text Classification with NLP: Tf-Idf vs Word2Vec vs BERT

Preprocessing, Model Design, Evaluation, Explainability for Bag-of-Words, Word Embedding, Language models

Mauro Di Pietro
Towards Data Science
Jul 18, 2020


Summary

In this article, using NLP and Python, I will explain 3 different strategies for multiclass text classification: the old-fashioned Bag-of-Words (with Tf-Idf), the famous Word Embedding (with Word2Vec), and the cutting-edge Language models (with BERT).
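To give a feel for the first strategy before the details, here is a minimal, standard-library-only sketch of Bag-of-Words with Tf-Idf weighting. It assumes a naive lowercase whitespace tokenizer and the textbook idf formula log(N / df); scikit-learn's `TfidfVectorizer`, for comparison, defaults to a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row. The classifier is a nearest-centroid rule using cosine similarity, chosen here only for brevity. It is not necessarily the model the article goes on to use, and the toy documents and labels are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace only.
    return text.lower().split()


def fit_idf(docs):
    """Return idf per term as log(N / df), where df = number of docs containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse tf-idf vector as a dict: raw term count times idf; terms unseen in training are dropped."""
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(vectors, labels):
    """Average the tf-idf vectors of each class (the nearest-centroid 'training' step)."""
    sums, sizes = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, {})
        for t, v in vec.items():
            acc[t] = acc.get(t, 0.0) + v
    return {label: {t: v / sizes[label] for t, v in acc.items()}
            for label, acc in sums.items()}


def predict(doc, idf, centroids):
    """Assign the class whose centroid is most cosine-similar to the document."""
    vec = tfidf_vector(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "stocks fell as markets slid",
    "the team won the match",
    "election results and the senate vote",
    "shares and markets rallied",
    "the striker scored in the match",
    "the senate passed the bill",
]
train_labels = ["business", "sports", "politics", "business", "sports", "politics"]

idf = fit_idf(train_docs)
centroids = fit_centroids([tfidf_vector(d, idf) for d in train_docs], train_labels)

print(predict("markets and shares", idf, centroids))  # business
print(predict("the striker scored", idf, centroids))  # sports
```

The key property Tf-Idf adds over raw counts shows up in `fit_idf`: a term such as "the", which appears in many documents, gets a small weight, while rarer, more class-specific terms such as "striker" dominate the similarity.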
