Word Clouds in Python: Comprehensive Example
Visualizing Text by Frequency of Words — Economist Style
This is a simple exercise visualizing my latest issue of The Economist in a word cloud. I wasn’t in the mood to actually read the issue, so I thought this would be a fun way to digest the information quickly.
Table of Contents
- PDF to Text Conversion
- Text Preprocessing
- Word Cloud
PDF to Text Conversion
The first step in this process was to convert the original PDF document to text. I did this using pdfminer.six (for Python 3).
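The original conversion code is not reproduced here, so the following is only a minimal sketch of how the step might look with the `extract_text` helper from pdfminer.six. The file name `economist.pdf` and the helper names `pdf_to_text` and `preview` are hypothetical, chosen for illustration.

```python
def pdf_to_text(pdf_path):
    """Return all text extracted from the PDF at pdf_path."""
    # Imported inside the function so the rest of the module still loads
    # when pdfminer.six is not installed (pip install pdfminer.six).
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)


def preview(text, n_chars=500):
    """Return the first n_chars characters, enough to sanity-check the extraction."""
    return text[:n_chars]


# Usage (hypothetical file name):
# text = pdf_to_text("economist.pdf")
# print(preview(text))
```

Printing only a preview keeps the console readable; the full string is what gets passed on to the cleaning step.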
Printing the extracted text produces a wall of words that takes a long time to scroll through. Too long to read for a busy data scientist like myself. That’s why we’re doing this word cloud, baby.
Text Preprocessing
Now we need to clean the data so that punctuation and insignificant words (stop words) don’t end up in the image.
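A rough sketch of that cleaning step, using only the standard library: lowercase the text, keep runs of letters (which drops punctuation and digits), and filter out stop words. The stop-word list below is a tiny illustrative one and an assumption on my part; a fuller list from a library such as NLTK or wordcloud would be the usual choice.

```python
import re

# Small illustrative stop-word list (an assumption; swap in a fuller list as needed).
STOP_WORDS = {
    "a", "an", "and", "as", "by", "for", "in", "is", "it",
    "of", "on", "that", "the", "to", "was", "with",
}


def clean_text(text, stop_words=STOP_WORDS):
    """Return a list of lowercase word tokens with punctuation and stop words removed."""
    # [^\W\d_]+ matches runs of letters only, so punctuation, digits and
    # underscores act as separators. Contractions such as "don't" are split.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Drop stop words and single-letter leftovers (e.g. the "t" from "don't").
    return [t for t in tokens if t not in stop_words and len(t) > 1]


# Usage:
# tokens = clean_text("The Economist: a weekly, in print!")
```

Returning a list of tokens rather than a single string makes it easy to count word frequencies in the next step.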
Word Cloud
Time to put this together! The .png image used as the “mask” is the same Economist logo shown at the beginning of this article. You’ll see it looks familiar in the final image.
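Here is a hedged sketch of how the pieces could fit together with the `wordcloud` package, assuming a list of cleaned tokens and a mask image on disk. The file names, the helper names `word_frequencies` and `make_word_cloud`, and the parameter values (`background_color`, `max_words`) are illustrative assumptions, not the settings used for the image in this article.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each token occurs; the counts drive the word sizes."""
    return dict(Counter(tokens))


def make_word_cloud(tokens, mask_path, out_path):
    """Render tokens into the shape given by the mask image and save a PNG."""
    # Third-party imports are kept inside the function so the counting helper
    # above works without them (pip install wordcloud numpy pillow).
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    # Pure white pixels in the mask are left empty; words fill the rest.
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(background_color="white", mask=mask, max_words=200)
    cloud.generate_from_frequencies(word_frequencies(tokens))
    cloud.to_file(out_path)
    return cloud


# Usage (hypothetical file names):
# make_word_cloud(tokens, "economist_logo.png", "economist_cloud.png")
```

Passing explicit frequencies means the cleaning done earlier is respected as-is, instead of relying on the library's own tokenizer and built-in stop-word list.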