Word Clouds in Python: Comprehensive Example

Visualizing Text by Frequency of Words — Economist Style

Published in

Towards Data Science

2 min read · Aug 8, 2019

This is a simple exercise visualizing my latest issue of The Economist in a word cloud. I wasn’t in the mood to actually read the issue, so I thought this would be a fun way to digest the information quickly.

PDF to Text Conversion
Text Preprocessing
WordCloud

PDF to Text Conversion

The first step was to convert the original PDF document to text. I did this using pdfminer.six, the Python 3 fork of pdfminer.
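
A minimal sketch of what this conversion can look like, assuming a recent pdfminer.six release that ships `pdfminer.high_level.extract_text`. The function name `pdf_to_text` and the file name `economist.pdf` are placeholders for illustration, not the actual names used for this article.

```python
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of a PDF as one string (sketch, assumes pdfminer.six)."""
    path = Path(pdf_path)
    # Fail early with a clear message before touching the PDF parser.
    if not path.is_file():
        raise FileNotFoundError(f"No PDF found at {path}")

    # Imported here so the module still loads where pdfminer.six is missing.
    from pdfminer.high_level import extract_text

    return extract_text(str(path))


# Example usage (placeholder file name, assumed for illustration):
# text = pdf_to_text("economist.pdf")
# print(text[:1000])  # peek at the first 1,000 characters
```

Printing only a slice of the result keeps the console readable, since a full magazine issue produces a very long string.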

The printed text is a wall of words that takes a long time to scroll through. Too long to read for a busy data scientist like myself. That’s why we’re doing this word cloud, baby.

Text Preprocessing

Now we need to clean the data so that no punctuation or insignificant words (stop words such as “the” and “of”) end up in the image.
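
One way to sketch this cleaning step with only the standard library is shown here. The tiny `STOP_WORDS` set and the `clean_text` helper are assumptions made for illustration; a real run would likely use a fuller stop-word list, such as the one that ships with NLTK or with the wordcloud package.

```python
import re

# A deliberately small stop-word list, assumed for illustration only.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "that", "the", "this", "to", "was", "with",
}


def clean_text(raw_text, stop_words=STOP_WORDS):
    """Lowercase the text, strip punctuation and digits, drop stop words."""
    text = raw_text.lower()
    # Replace everything except ASCII letters and whitespace with a space.
    # Note: this also drops accented letters, which is fine for a sketch.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Drop stop words and single-letter leftovers such as a stray "s".
    return [t for t in tokens if t not in stop_words and len(t) > 1]


sample = "The Economist's view: trade, trade & tariffs in 2019!"
print(clean_text(sample))
```

Returning a list of tokens rather than one long string makes it easy to count word frequencies in the next step.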

WordCloud

Time to put this together! The .png image for the “mask” is the same Economist logo shown at the beginning of this article. You’ll see it looks familiar in the final image.
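
A hedged sketch of the final step, assuming the `wordcloud`, `numpy`, and `Pillow` packages are installed. The helper names, the `max_words` value, and the file names `economist_logo.png` and `economist_wordcloud.png` are placeholders, not the exact settings behind the image in this article.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each cleaned token appears."""
    return dict(Counter(tokens))


def build_word_cloud(tokens, mask_path, out_path="economist_wordcloud.png"):
    """Draw a word cloud shaped by a mask image and save it as a PNG."""
    # Third-party imports are kept local so the counting helper works alone.
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    # Pure white areas of the mask are left empty; words fill the rest.
    mask = np.array(Image.open(mask_path))

    cloud = WordCloud(background_color="white", mask=mask, max_words=200)
    cloud.generate_from_frequencies(word_frequencies(tokens))
    cloud.to_file(out_path)
    return cloud


# Example usage (placeholder file name, assumed for illustration):
# build_word_cloud(["trade", "china", "trade"], "economist_logo.png")
```

Building the cloud from explicit frequencies keeps the preprocessing under your control, so the library does not apply its own tokenizing or stop-word filtering on top.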

The Final Result:

Written by Andrew Hershy