Word Clouds in Python: Comprehensive Example

Visualizing Text by Frequency of Words — Economist Style

Andrew Hershy
Towards Data Science
2 min read · Aug 8, 2019


Image from Wikipedia

This is a simple exercise visualizing my latest issue of The Economist in a word cloud. I wasn’t in the mood to actually read the issue, so I thought this would be a fun way to digest the information quickly.

Table of Contents

  • PDF to Text Conversion
  • Text Preprocessing
  • Word Cloud

PDF to Text Conversion

The first step in this process was to convert the original PDF document to text. I did this using pdfminer.six (for Python 3).
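The conversion step might look roughly like the sketch below. It assumes a recent pdfminer.six release that ships the `pdfminer.high_level.extract_text` helper, which may not be the exact call used for this article. The function name `pdf_to_text` and the file name in the usage note are hypothetical.

```python
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return all text extracted from the PDF at pdf_path."""
    pdf_path = Path(pdf_path)
    # Fail early with a clear message if the file is missing
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No PDF found at {pdf_path}")

    # Third-party import, done here so the check above works without it:
    # pip install pdfminer.six
    from pdfminer.high_level import extract_text

    # extract_text reads every page and returns one long string
    return extract_text(str(pdf_path))
```

Calling `text = pdf_to_text("economist.pdf")` and then `print(text[:1000])` would show the first thousand characters, where "economist.pdf" is a placeholder for wherever the issue was saved.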

The printed text takes a long time to scroll through, far too long to read for a busy data scientist like myself. That’s why we’re doing this wordcloud, baby.

Text Preprocessing

Now we need to clean the data so that punctuation and insignificant words do not end up in the image.
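A minimal cleaning sketch, using only the standard library, could look like this. The short `STOP_WORDS` set and the function name `clean_text` are assumptions made for illustration; a real run would likely use a fuller stop-word list from a library such as NLTK.

```python
import re

# A tiny illustrative stop-word list (an assumption, not the list used in the article)
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it",
    "that", "for", "on", "with", "as", "this",
}


def clean_text(raw_text, stop_words=STOP_WORDS):
    """Lowercase the text, drop punctuation and digits, and remove stop words."""
    lowered = raw_text.lower()
    # Replace anything that is not an ASCII letter or whitespace with a space.
    # Note: this also strips accented letters and curly apostrophes.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    tokens = letters_only.split()
    # Drop stop words and single-letter leftovers such as the "t" from "wasn't"
    kept = [t for t in tokens if t not in stop_words and len(t) > 1]
    return " ".join(kept)
```

The result is a single space-separated string, which is a convenient input for counting word frequencies.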

Word Cloud

Time to put this together! The .png image for the “mask” is the same Economist logo shown at the beginning of this article. You’ll see it looks familiar in the final image.
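One way to sketch this step is with the `wordcloud` package, NumPy, and Pillow. The function names, the `max_words` setting, and the output file name below are assumptions for illustration, not the settings used for the image in this article.

```python
from collections import Counter
from pathlib import Path


def word_frequencies(cleaned_text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(cleaned_text.split())


def make_word_cloud(cleaned_text, mask_path, out_path="wordcloud.png"):
    """Draw a word cloud shaped by the mask image and save it as a PNG."""
    mask_path = Path(mask_path)
    if not mask_path.is_file():
        raise FileNotFoundError(f"No mask image found at {mask_path}")

    # Third-party imports: pip install wordcloud numpy pillow
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    # Pure white pixels in the mask are left empty; words fill everything else
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(background_color="white", mask=mask, max_words=200)
    # More frequent words are drawn larger
    cloud.generate_from_frequencies(word_frequencies(cleaned_text))
    cloud.to_file(str(out_path))
    return out_path
```

Because white areas of the mask stay blank, a logo with white lettering on a colored background should leave the lettering visible as gaps in the cloud.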

The Final Result:
