Preprocessing Textual Data

Using Cleantext to clean text datasets

Himanshu Sharma
Towards Data Science
Photo by Sincerely Media on Unsplash

If you have ever worked with textual datasets, you know how much noise comes with raw text. To clean it, we apply preprocessing steps that remove or normalize that noise. Preprocessing is an important step because it ensures the model receives clean, consistent input and can therefore work as required.

Several Python libraries help with preprocessing text datasets. One of them is Cleantext, an open-source Python module used to clean and preprocess text data into a normalized text representation.
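To make "normalized text representation" concrete, here is a minimal sketch that uses only the Python standard library. It is not Cleantext's implementation; the function name `normalize_text` and the choice of steps (lowercasing, dropping digits and ASCII punctuation, collapsing whitespace) are assumptions made purely for illustration.

```python
import re
import string


def normalize_text(text: str) -> str:
    """Hypothetical helper: lowercase, drop digits and ASCII punctuation, collapse whitespace."""
    text = text.lower()
    # Remove runs of digits.
    text = re.sub(r"\d+", "", text)
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse repeated whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("  Hello,   World!! It's 2021...  "))  # prints "hello world its"
```

A library such as Cleantext bundles steps like these behind one call, so you do not have to maintain them by hand.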

In this article, we will explore Cleantext and its different functionalities.

Let’s get started…

Installing required libraries

We will start by installing the Cleantext library with pip, using the command below. The leading ! runs the command from a notebook cell; drop it when running in a terminal.

!pip install cleantext

Importing required libraries

In this step, we will import the libraries needed for cleaning and preprocessing the dataset. Cleantext uses NLTK as its backend, so we will import NLTK as well.
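As a hedged sketch of this step, the snippet below first checks that both modules can be found and only then imports them. The helper `missing_modules` is a hypothetical name introduced here, and the module names `nltk` and `cleantext` are assumed to match the packages installed above.

```python
import importlib.util

# Module names assumed from the install step above.
REQUIRED = ("nltk", "cleantext")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Install first:", ", ".join(missing))
else:
    import nltk
    import cleantext
    print("nltk and cleantext imported")
```

Checking before importing gives a clear message instead of an ImportError traceback when a package has not been installed yet.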
