Preprocessing text in Python

A step towards building a sentiment classifier

Zolzaya Luvsandorj
Towards Data Science
12 min read · Sep 1, 2020


This post is the second of three sequential posts on the steps to build a sentiment classifier. Following our exploratory text analysis in the first post, it’s time to preprocess our text data. Simply put, preprocessing text data means applying a series of operations that convert the text into tabular numeric data. In this post, we will look at three ways, of varying complexity, to preprocess text into a tf-idf matrix as preparation for a model. If you are unsure what tf-idf is, this post explains it with a simple example.
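To make "converting text into tabular numeric data" concrete, here is a minimal sketch using only the standard library. The function name `tfidf_matrix`, the toy documents and the whitespace tokenisation are assumptions made for illustration. The weighting is the plain textbook form, term frequency times log(N / document frequency), which will not match the smoothed and normalised values that scikit-learn produces by default.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix): one row per document, one column per term.

    Hypothetical helper for illustration only. Uses the textbook weighting
    tf * log(N / df), not a library's smoothed or normalised variant.
    """
    # Assumption: lowercase and split on whitespace as a stand-in for real tokenisation
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears
    df = Counter(tok for toks in tokenised for tok in set(toks))

    matrix = []
    for toks in tokenised:
        counts = Counter(toks)
        row = [
            (counts[term] / len(toks)) * math.log(n_docs / df[term]) if toks else 0.0
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


docs = ["I loved the movie", "I hated the movie", "great movie"]
vocab, matrix = tfidf_matrix(docs)

# Print the table: a header of terms, then one numeric row per document
print(vocab)
for row in matrix:
    print([round(weight, 3) for weight in row])
```

Each document becomes a row of numbers and each term a column. A term that occurs in every document, such as "movie" here, gets a weight of zero under this formulation, since it does nothing to tell the documents apart.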
