A Beginner’s Guide to Plotting ‘FiveThirtyEight Like’ Visualizations

Here I’ll show you how I reproduced a visualization from FiveThirtyEight’s article Every Guest Jon Stewart Ever Had On ‘The Daily Show’.

Carlos Gutierrez
Towards Data Science

Source: FiveThirtyEight’s article “Who Got to Be On ‘The Daily Show’?”

You may have already made visualizations with Matplotlib and Seaborn, yet you may want to improve the aesthetics of your plots. You can get the dataset’s CSV file from FiveThirtyEight’s GitHub.

So, let’s get started…

First, jump into your preferred coding environment. I prefer using the Jupyter Notebook (in case you don’t know about it or want help downloading it, here’s a tutorial I found). Google Colaboratory is another good choice.

Once you have your environment ready, load the appropriate libraries, read the file, and display the first five rows.
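As a minimal sketch of this step, the snippet below reads a CSV with Pandas and shows the first five rows. To keep it runnable without a download, it reads a few made-up placeholder rows from memory; only the three column names mentioned in this article are included, and the real file has more.

```python
import io

import pandas as pd

# Made-up placeholder rows, NOT real data. Only the three column names
# mentioned in this article are included here.
sample_csv = io.StringIO(
    "YEAR,Group,Raw_Guest_List\n"
    "1999,Acting,Guest A\n"
    "1999,Comedy,Guest B\n"
    "1999,Media,Guest C\n"
    "2000,Politician,Guest D\n"
    "2000,Science,Guest E\n"
    "2000,Musician,Guest F\n"
)

# With the real download, pass its path or URL to read_csv instead.
df = pd.read_csv(sample_csv)
print(df.head())  # head() returns the first five rows by default
```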

First five rows of dataset

Let’s now rename the ‘YEAR’ and ‘Raw_Guest_List’ columns to reduce our typing, since we’ll be using them soon.
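A rename along these lines could look like the sketch below. The new names ‘Year’ and ‘Guest’ are my own assumption, since any shorter names would do, and the two rows are placeholders.

```python
import pandas as pd

# Placeholder rows standing in for the real dataset.
df = pd.DataFrame(
    {
        "YEAR": [1999, 2000],
        "Group": ["Acting", "Media"],
        "Raw_Guest_List": ["Guest A", "Guest B"],
    }
)

# 'Year' and 'Guest' are assumed target names, chosen only for brevity.
df = df.rename(columns={"YEAR": "Year", "Raw_Guest_List": "Guest"})
print(df.columns.tolist())
```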

Now we need to condense all guests’ occupations in the “Group” column into three categories. These will become the three lines in our graph. To do this we define a function that maps each value in the “Group” column to a category and stores the result in a new “Occupation” column.
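Here is one way such a function might look. The group labels it checks for and the three category names are assumptions on my part, so compare them against the unique values in your own “Group” column before relying on them. Anything unmatched falls into ‘Other’.

```python
import pandas as pd


def get_occupation(group):
    """Map a raw 'Group' value to one of three categories, else 'Other'."""
    # The labels below are assumed; adjust them to match your data.
    if group in ("Acting", "Comedy", "Musician"):
        return "Acting, Comedy & Music"
    if group in ("Media", "media"):
        return "Media"
    if group in ("Government", "Politician", "Political Aide"):
        return "Government and Politics"
    return "Other"


# Placeholder values, including a missing one, to show the fallback.
df = pd.DataFrame({"Group": ["Acting", "media", "Politician", "Science", None]})

# apply() calls the function once per value in the column.
df["Occupation"] = df["Group"].apply(get_occupation)
print(df)
```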

See added ‘Occupation’ column combining occupations into three categories

Next, create a table with the percentage of guests by occupation for each year. For this we’ll use Pandas’ crosstab function, which simplifies the calculation.
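A small sketch of that crosstab call, run on a handful of placeholder rows rather than the real data (the column names ‘Year’ and ‘Occupation’ are assumed):

```python
import pandas as pd

# Placeholder rows: four guests in 1999 and two in 2000.
df = pd.DataFrame(
    {
        "Year": [1999, 1999, 1999, 1999, 2000, 2000],
        "Occupation": [
            "Acting, Comedy & Music",
            "Acting, Comedy & Music",
            "Media",
            "Other",
            "Media",
            "Government and Politics",
        ],
    }
)

# Rows are years, columns are occupations. normalize="index" divides each
# count by its row total, and * 100 turns the proportions into percentages.
table = pd.crosstab(df["Year"], df["Occupation"], normalize="index") * 100
print(table)
```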

Table with percentage of guests by occupation each year

By default, the crosstab function above would count the number of guests per category each year. Yet notice the included “normalize” argument, which turns those counts into proportions. Setting normalize to “index” applies the normalization across each row rather than each column, so each year’s proportions sum to 1. We then multiply by 100 (*100) to turn these proportions into percentages.

Next, to clean up our table, we can drop the “Other” column, which we will not use in our graph.
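Dropping the column could be sketched like this, using a tiny placeholder table in place of the real crosstab output:

```python
import pandas as pd

# Placeholder percentages, not real results.
table = pd.DataFrame(
    {
        "Acting, Comedy & Music": [50.0, 0.0],
        "Government and Politics": [0.0, 50.0],
        "Media": [25.0, 50.0],
        "Other": [25.0, 0.0],
    },
    index=pd.Index([1999, 2000], name="Year"),
)

# drop() returns a new table without the named column.
table = table.drop(columns="Other")
print(table)
```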

Same table without ‘Other’ column

To make things easier when plotting, we’ll make a list of all the years in our table.
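Since the years are the table’s index, one simple way to get them as a list is shown below (again with a placeholder table):

```python
import pandas as pd

# Placeholder table indexed by year, as crosstab would produce.
table = pd.DataFrame(
    {"Media": [25.0, 50.0, 40.0]},
    index=pd.Index([1999, 2000, 2001], name="Year"),
)

# The index holds the years; tolist() converts it to a plain Python list.
years = table.index.tolist()
print(years)
```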

It’s time to do what we really wanted to do all along!

Here’s a gist of code that will provide us with our rough draft, with line color and line weight customized to match the graph from the article.
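A rough draft along these lines might look like the sketch below. It leans on Matplotlib’s built-in ‘fivethirtyeight’ style sheet. The percentages are placeholders, and the hex colors and line width are my own guesses rather than values taken from the article.

```python
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("fivethirtyeight")  # style sheet that ships with Matplotlib

# Placeholder percentages, not real results.
table = pd.DataFrame(
    {
        "Acting, Comedy & Music": [60.0, 55.0, 50.0],
        "Government and Politics": [5.0, 10.0, 15.0],
        "Media": [20.0, 25.0, 30.0],
    },
    index=pd.Index([1999, 2000, 2001], name="Year"),
)
years = table.index.tolist()

# Assumed colors, one per line; tweak them until they match the original.
colors = {
    "Acting, Comedy & Music": "#0F95D7",
    "Government and Politics": "#FF2700",
    "Media": "#810F7C",
}

fig, ax = plt.subplots(figsize=(9, 6))
for name, color in colors.items():
    ax.plot(years, table[name], color=color, linewidth=2.5)
```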

Here’s our initial plot

Increase the length of grid lines along the y-axis and add a horizontal line at the baseline of the plot.
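One possible way to do this, and not necessarily how the original gist did it, is to widen the x-axis limits so the horizontal grid lines extend past the data, then draw the baseline with axhline. The numbers are placeholders.

```python
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")  # this style turns the grid on

years = [1999, 2000, 2001]
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(years, [60, 55, 50], linewidth=2.5)  # placeholder values

# Widening the x range makes the horizontal grid lines run beyond the data.
ax.set_xlim(left=years[0] - 1, right=years[-1] + 1)

# A darker horizontal line at y=0 serves as the baseline.
ax.axhline(y=0, color="black", linewidth=1.3, alpha=0.7)
```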

Grid line along y-axis increased and horizontal line at baseline added

Adjust the x and y tick labels displayed, again customizing font color and size.
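A sketch of this step: fix the tick positions, supply custom label text, and style the labels with tick_params. The tick positions, label strings, color, and size here are assumptions to adjust to taste.

```python
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")

years = [1999, 2000, 2001]
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(years, [60, 55, 50], linewidth=2.5)  # placeholder values

# Fewer y ticks means fewer grid lines; only the top label carries a % sign.
ax.set_yticks([0, 25, 50, 75, 100])
ax.set_yticklabels(["0", "25", "50", "75", "100%"])

# Shortened year labels on the x-axis.
ax.set_xticks(years)
ax.set_xticklabels(["1999", "'00", "'01"])

# Font size and color of the tick labels on both axes.
ax.tick_params(axis="both", which="major", labelsize=14, labelcolor="#7a7a7a")
```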

Reduced number of grid lines and changed x & y axis labels

Set the graph title and subtitle (not using the traditional plt.title() since it limits text placement). Notice that the x and y arguments below determine the placement of the text according to the x and y coordinates.
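The idea can be sketched with ax.text, which places text at data coordinates. The coordinates below are placeholders that suit this toy plot only, and the subtitle wording is an assumption.

```python
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")

years = [1999, 2000, 2001]
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(years, [60, 55, 50], linewidth=2.5)  # placeholder values
ax.set_ylim(0, 100)

# x and y are in data coordinates, so y > 100 puts the text above the plot.
ax.text(
    x=1998.7,
    y=112,
    s="Who Got to Be On 'The Daily Show'?",
    fontsize=20,
    weight="bold",
    alpha=0.75,
)
# Assumed subtitle wording.
ax.text(x=1998.7, y=105, s="Occupation of guests, by year", fontsize=15, alpha=0.85)
```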

Title & subtitle aligned with ‘100%’ label on plot

Add text labels for each line in the plot with customized font size and color.
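Here is a minimal sketch of labeling each line directly, with each label reusing its line’s color. The values, colors, and label positions are all placeholders, and the positions will need adjusting for the real data.

```python
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")

years = [1999, 2000, 2001]
# name -> (placeholder values, assumed color, assumed label position)
series = {
    "Acting, Comedy & Music": ([60, 55, 50], "#0F95D7", (1999.2, 63)),
    "Government and Politics": ([5, 10, 15], "#FF2700", (2000.0, 3)),
    "Media": ([20, 25, 30], "#810F7C", (2000.2, 31)),
}

fig, ax = plt.subplots(figsize=(9, 6))
for name, (values, color, (x, y)) in series.items():
    ax.plot(years, values, color=color, linewidth=2.5)
    # The label sits near its line and shares the line's color.
    ax.text(x=x, y=y, s=name, color=color, weight="bold", fontsize=12)
```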

Adding labels here takes a lot of playing with x & y coordinates

And our last touch, a custom signature box at the bottom of our graph. Here you have to get the right spacing in the text (see the text within the s argument) to get it to fit the width of the graph.
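A signature box can be sketched as one more ax.text call with a background color. The name, the source wording, the number of padding spaces, and the coordinates are all placeholders here; the padding in particular has to be tuned by eye for each figure size.

```python
import matplotlib.pyplot as plt

plt.style.use("fivethirtyeight")

years = [1999, 2000, 2001]
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(years, [60, 55, 50], linewidth=2.5)  # placeholder values
ax.set_ylim(0, 100)

# Spaces inside the string stretch the colored box across the graph.
signature = "  ©Your Name" + " " * 60 + "Source: FiveThirtyEight  "

# y < 0 places the box below the plot area (data coordinates).
ax.text(
    x=1998.7,
    y=-12,
    s=signature,
    fontsize=12,
    color="#f0f0f0",
    backgroundcolor="grey",
)
```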

Voila!… Our finalized ‘FiveThirtyEight like’ visualization

That’s it.

Doesn’t it feel good to produce nice and clean informative plots like this one on your own? I have to admit it feels good to me :)

You’re now ready to take advantage of the data offered by FiveThirtyEight. Go! It’s time for you to look at some interesting articles and try to reproduce other plots on your own.
