Preprocessing Textual Data

Using Cleantext to clean text datasets

Published in Towards Data Science · 3 min read · Jul 17, 2021

If you’ve ever worked with text datasets, you know how much noise comes with text data. To clean it, we apply preprocessing steps that remove or normalize that noise. Preprocessing is an important step because a model can only learn from the data it is given, so passing it clean, consistent input helps it behave as required.

Several Python libraries help with preprocessing text datasets. One of them is Cleantext, an open-source Python module used to clean and preprocess text data into a normalized text representation.
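To make "normalized text representation" concrete, here is a minimal sketch that uses only the Python standard library. It is not Cleantext's implementation; the helper name normalize_text and the particular steps chosen (lowercasing, dropping digits and ASCII punctuation, collapsing whitespace) are assumptions made for illustration.

```python
import re
import string

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize_text(text: str) -> str:
    """Hypothetical helper, not part of Cleantext.

    Lowercases the text, replaces digits and ASCII punctuation with
    spaces, then collapses runs of whitespace into a single space.
    """
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # replace runs of digits with a space
    text = text.translate(_PUNCT_TO_SPACE)  # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text)       # collapse repeated whitespace
    return text.strip()


raw = "  Hello,   WORLD!!  Visit us in 2021...  "
print(normalize_text(raw))  # prints: hello world visit us in
```

A library such as Cleantext bundles steps of this kind behind a single call, so you do not have to maintain the regular expressions yourself.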

In this article, we will explore Cleantext and its different functionalities.

Let’s get started…

Installing required libraries

We will start by installing the Cleantext library with pip. The command below does that; the leading ! runs it as a shell command from a notebook cell.

!pip install cleantext

Importing required libraries

In this step, we will import the libraries required for cleaning and preprocessing the dataset. Cleantext uses NLTK as its backend, so we will import NLTK as well.
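A minimal sketch of this step, assuming the pip package installed above exposes a module named cleantext and that NLTK is importable as nltk. The helper missing_packages is hypothetical and only checks that both modules can be found before importing them, so the snippet reports a clear message instead of failing with an ImportError.

```python
import importlib.util


def missing_packages(names):
    """Hypothetical helper: return the module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed from the install step: NLTK and Cleantext.
required = ["nltk", "cleantext"]
missing = missing_packages(required)

if missing:
    print("Install these packages first:", ", ".join(missing))
else:
    import nltk       # backend used by Cleantext
    import cleantext  # the cleaning library itself
    print("nltk and cleantext imported successfully")
```

If the check reports a missing package, rerun the pip command from the previous step and restart the notebook kernel before importing again.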

Written by Himanshu Sharma