
Data Visualization Hack - Lessons from FiveThirtyEight Graphs

Create Professional Looking Graphs Using Only the Matplotlib Package

PYTHON. DATA VISUALIZATION. CHARTS

Photo by Clay Banks on Unsplash

INTRODUCTION

FiveThirtyEight (sometimes written as 538) is a website that analyzes data on poll topics such as politics, economics, and sports. It takes its name from the number of electors in the United States Electoral College.

One of the things that contributes to its popularity is how effectively its visualizations relay poll results.

We can learn a thing or two from them to make our own visualizations more professional-looking and captivating.

We will refer to the graphs created by FiveThirtyEight as FTE graphs for the rest of the article.

In this article, we will transform this plain-looking graph:

Image by Author: Original Matplotlib graph generated by a one-liner code

to one which looks like an FTE graph:

Image by Author: Final/Target graph

So, without further ado, let us jump into our graph makeover.

DATASET – MOVIE BUDGET

I am a big movie fan and so it is not surprising that I found the dataset regarding movie budgets interesting.

Suppose, for example, that we want to examine how movie budgets for different genres change across the years. For this, let’s use the dataset we got from Kaggle.

Examining the budget across the years might reveal the changing preferences and complexity of producing a movie in a particular genre.

STEP-BY-STEP GRAPHICAL MAKEOVER

PRELIMINARIES

To begin, let us import the necessary packages to accomplish this.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style
%matplotlib inline

IMPORTING THE DATASET

df = pd.read_csv('datamovies.csv', encoding='latin-1')
df.head()

DATA PREPROCESSING

# Data preprocessing
trial = df.groupby(['year', 'genre'])['budget'].sum().reset_index()
trial.head()
trial = trial.pivot(index="year", columns="genre").dropna(axis = 1, how="any").reset_index()
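
To make the reshaping above easier to follow, here is a minimal sketch on a made-up table. The `toy` data and its values are assumptions for illustration only; the column names mirror the ones used with the Kaggle file.

```python
import pandas as pd

# Hypothetical toy data standing in for the Kaggle file (values are made up).
toy = pd.DataFrame({
    "year":   [2000, 2000, 2000, 2001, 2001, 2001],
    "genre":  ["Action", "Action", "Comedy", "Action", "Comedy", "Horror"],
    "budget": [10, 20, 5, 40, 15, 7],
})

# One row per (year, genre) pair, with the budgets summed.
totals = toy.groupby(["year", "genre"])["budget"].sum().reset_index()

# One row per year and one column per genre, nested under "budget".
wide = (
    totals.pivot(index="year", columns="genre")
    .dropna(axis=1, how="any")  # drops Horror, which has no value for 2000
    .reset_index()
)

print(wide)
```

Because the genre columns end up nested under 'budget', passing y = 'budget' to the plot call draws one line per remaining genre.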

DEFAULT GRAPH

The default graph produced by matplotlib can easily be generated using the following code:

initial_graph = trial.plot(x ='year', y = 'budget')
Image by Author

With this as our base graph, let us improve the elements, one by one, to produce our FTE-looking graph above.

Note: While we make the adjustments incrementally, ideally the entire code sits in one cell so that every adjustment applies to the same plot object.

STEP 1 – CHANGE THE STYLE

Matplotlib comes with built-in styles that set defaults for the layout of the graphs it produces.

To view the available styles:

style.available
#To start, using the set style allows us to easily elevate the level of our visualization
style.use('fivethirtyeight')

After applying the ‘fivethirtyeight’ style (which, by the way, you need to call only once), we can produce the same graph using our original code, adding the figsize argument:

fte_graph = trial.plot(x ='year', 
           y = 'budget', 
           figsize=(12,8))
Image by Author

Using the built-in style instantly provides improvements from the base graph of matplotlib. Some of these are as follows:

  • background-color
  • grid lines
  • font
  • line thickness
  • removed top and right spines
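
To see where some of these improvements come from, the style's settings can be read back from matplotlib's rcParams. The sketch below is only an inspection aid, and it uses a temporary style context so it does not change the global style; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Apply the style temporarily and read back a few of the settings it changes.
with plt.style.context("fivethirtyeight"):
    background = to_hex(plt.rcParams["axes.facecolor"])
    grid_on = plt.rcParams["axes.grid"]
    line_width = plt.rcParams["lines.linewidth"]
    font_size = plt.rcParams["font.size"]

print(background, grid_on, line_width, font_size)
```

The background color reported here is the same light grey, '#f0f0f0', that the later steps reuse for the signature text and the inline label backgrounds.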

We can examine an ideal FTE graph through an example:

Photo by FiveThirtyEight - Typical FTE graph

To bring us closer to the style of an FTE graph, we still need the following components:

  • Title
  • Subtitle
  • Signature Bar
  • Replacing the Legend with Inline Labels

Other minor adjustments for the graph may include the following:

  • Font Size for Tick Labels
  • Major Tick Labels of the Y-axis
  • Removing the X-axis Label
  • Placing a Bold Line at y=0
  • Choosing An Appropriate Color Scheme

STEP 2 – STORE THE GRAPH IN A VARIABLE

Storing the graph in a variable makes it easier to access its attributes and to call methods on the object.

fte_graph = trial.plot(x ='year', 
           y = 'budget', 
           figsize=(12,8))
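
As a quick check of what that variable holds, the sketch below plots a made-up table (the `toy` values are assumptions) and inspects the returned object. For a line plot like this one, pandas hands back a matplotlib Axes, which is why methods such as tick_params, axhline, and text are available in the steps that follow.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.axes
import pandas as pd

# Made-up values, only to have something to plot.
toy = pd.DataFrame({"year": [2000, 2001, 2002], "budget": [1, 3, 2]})
ax = toy.plot(x="year", y="budget")

print(type(ax).__name__)
print(isinstance(ax, matplotlib.axes.Axes))

# The parent figure is reachable from the Axes when needed, e.g. for saving.
fig = ax.get_figure()
```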

NOTE: GO FOR THE MINOR ADJUSTMENTS FIRST

It is important to go with the minor adjustments first before the Title, Subtitle, Signature bar, and Legend. As will be detailed later, these elements require a few manual adjustments and it will be best that the minor adjustments are in place to minimize our work.

STEP 3 – X-TICKS AND Y-TICKS FORMAT

Before we begin with the manual adjustments, it may be helpful to review the anatomy of a matplotlib figure, as we will be using these terms when making adjustments.

The photo was taken from matplotlib.org

Modify the major tick size and label size through the following code:

#Accessing the tick parameters and modifying the size
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
# Customizing the tick labels of the y-axis
fte_graph.set_yticklabels(labels = [-1000000000, 0,  '$1 Billion  ', '$2 Billion  ', '$3 Billion  ', '$4 Billion  ']);

We want our labels to have some space from the tick marks, so we add trailing whitespace to the labels. In the code above, I used two spaces to achieve the desired gap.
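
One caveat with set_yticklabels is that it assigns label text by position, so the labels can drift out of sync with the tick values if the ticks ever change. A possible alternative, sketched here under the assumption that a formatter-based approach suits your graph, derives each label from the tick value itself. The `billions` helper is hypothetical and not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def billions(value, pos=None):
    """Hypothetical helper: turn a tick value in dollars into a padded label."""
    if value == 0:
        return "0  "
    # The two trailing spaces keep the gap between label and tick mark.
    return f"${value / 1e9:g} Billion  "


fig, ax = plt.subplots()
ax.plot([1986, 2016], [0, 4e9])  # made-up line, only to create an axis
ax.yaxis.set_major_formatter(FuncFormatter(billions))

print(repr(billions(1e9)), repr(billions(0)))
```

With the article's graph, the same formatter could be attached through fte_graph.yaxis.set_major_formatter.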

STEP 4 – BOLDING THE LINE AT Y=0

# Generate a bolded horizontal line at y = 0
fte_graph.axhline(y = 0, color = 'black', linewidth = 2, alpha = .7)
Image by the Author

STEP 5 – ADDING AN EXTRA VERTICAL LINE

Add an extra vertical line by tweaking the range of the x-axis. Since the major ticks of the x-axis are spaced 5 years apart, setting the x-limits to 1984 and 2016 provides some much-needed space in our graph, particularly so that our tick labels do not seem too compressed.

fte_graph.set_xlim(left = 1984, right = 2016)

STEP 6 – REMOVE X-AXIS LABEL

This step can be done by calling the set_visible method of the xaxis.label object with False.

#Remove the label of the x-axis
fte_graph.xaxis.label.set_visible(False)

STEP 7 – SIGNATURE BAR

The signature bar has the following characteristics:

  • Positioned at the bottom of the graph
  • Has a dark grey background
  • Carries the author’s name
  • Includes the source of the data
  • Uses a text color that matches the background color of the ‘fivethirtyeight’ style

Adding a signature bar requires a bit of effort on our part. However, once we’re done, it will set our visuals apart from the rest and certainly make them look more professional.

The backgroundcolor parameter of the graph's text method highlights the text it is given, which is what we need to produce the signature "bar". However, the highlight only extends as far as the text itself. To make the bar span the width of the graph, we therefore pad the text with whitespace.

So to summarize the important steps for the signature bar, we need the following:

  • the text method with whitespace padding, tuned through trial and error, to align the signature with the tick labels we created earlier;
  • the placement of the signature bar, ideally below your lowest y-value, which is found through trial and error as well;
  • the vertical alignment of the title, tick labels, and signature bar, which is key to FTE graphs. As such, the x-values are chosen to align the signature bar with the tick labels.
# The signature bar
fte_graph.text(x = 1979.8, y = -1000000000,
    s = ' ©Francis Adrian Viernes                                                          Source: Data Collected by TMDB and GroupLens Accessed via Kaggle ',
               fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey');
Image by the Author
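
Counting spaces by hand in the signature string gets tedious. As a rough sketch, the padding can be generated from a target character count instead. The `signature_text` helper and its `width` default are assumptions made for illustration; since the font is proportional, the width still has to be tuned by eye.

```python
def signature_text(author, source, width=150):
    """Hypothetical helper: pad the gap so the whole string is `width` characters."""
    left = f" ©{author}"
    right = f"Source: {source} "
    # Keep at least one space between the two parts if the text is too long.
    gap = max(1, width - len(left) - len(right))
    return left + " " * gap + right


bar = signature_text(
    "Francis Adrian Viernes",
    "Data Collected by TMDB and GroupLens Accessed via Kaggle",
)
print(len(bar))
```

The resulting string can then be passed as the s argument of the same fte_graph.text call, and only `width` needs adjusting until the bar lines up.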

STEP 8 – TITLE AND SUBTITLE

As with the signature bar, the title and subtitle are ‘text’ elements. Their placement has to be found through trial and error on the x- and y-coordinates of the graph. The title and subtitle sit on top of the graph, so ideally the y-coordinates of these texts exceed the highest y-value in your dataset.

Because we are concerned about the vertical alignment of our title and subtitle with our signature bar, we use an x-value close to that of the signature bar (but not the exact value, to account for the margins/padding of text objects).

#Code to find the maximum y-value
trial.to_numpy().max()

Since the maximum value of our y is around $4.79B, we can try to check if placing the title at 5.2 Billion looks good. If it does, then we find a value below it to place the subtitle.

#Title and Subtitle Placement
fte_graph.text(x = 1980, y = 5200000000, s = "Movie Budget Across Genres Through the Years",
               fontsize = 26, weight = 'bold', alpha = .75);
fte_graph.text(x = 1980, y = 4900000000,
               s = 'A Peek Into Which Movie Genre Has Become More Expensive To Produce Within The Last 30 Years',
              fontsize = 18, alpha = .85);
Image by the Author

Seems to look fine. So let’s keep it.
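
The starting guesses for the title and subtitle can also be derived from the data rather than typed in. The sketch below uses a made-up flat table and assumed offsets of roughly 9% and 2% above the peak; neither comes from the original code, so treat them as a first guess to refine by eye.

```python
import pandas as pd

# Hypothetical wide table: a year column plus one budget column per genre.
toy = pd.DataFrame({
    "year": [2014, 2015, 2016],
    "Action": [3.1e9, 4.0e9, 4.79e9],
    "Comedy": [1.2e9, 1.0e9, 0.9e9],
})

# Take the maximum over the budget columns only, leaving the year column out.
y_max = toy.drop(columns="year").to_numpy().max()

# Assumed offsets: about 9% above the peak for the title, 2% for the subtitle.
title_y = y_max * 1.09
subtitle_y = y_max * 1.02

print(f"{y_max:.3g} {title_y:.3g} {subtitle_y:.3g}")
```

Note that trial.to_numpy().max() also scans the year column; it returns the right answer here only because the budgets are far larger than any year value.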

STEP 9 – CHOOSE AN APPROPRIATE COLOR SCHEME

Technically, this can be done way earlier into the steps and will not alter any of the adjustments that we have already done.

Borrowing from the ideas of the base article we are following, and from this source, choosing colors that are suitable for color-blind readers not only enhances accessibility but is also good graphic-design practice. The RGB codes have been prepared by our original article source and, luckily for us, there are exactly as many colors as we have genres above.

# Colorblind-friendly colors
colors = [[0,0,0], [230/255,159/255,0], [86/255,180/255,233/255], [0,158/255,115/255],
          [240/255,228/255,66/255], [0,114/255,178/255], [213/255,94/255,0], [204/255,121/255,167/255]]
fte_graph = trial.plot(x ='year', 
           y = 'budget', 
           figsize=(12,8), 
           color= colors,
           legend= False) #instead of choosing a colormap, we handpick our colors
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-1000000000, 0,  '$1 Billion  ', '$2 Billion  ', '$3 Billion  ', '$4 Billion  ']);
fte_graph.axhline(y = 0, color = 'black', linewidth = 2, alpha = .7)
fte_graph.set_xlim(left = 1984, right = 2016)
fte_graph.xaxis.label.set_visible(False)
#Signature Bar
fte_graph.text(x = 1979.8, y = -1000000000,
    s = ' ©Francis Adrian Viernes                                                          Source: Data Collected by TMDB and GroupLens Accessed via Kaggle ',
               fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey');
#Title and Subtitle Placement
fte_graph.text(x = 1980, y = 5200000000, s = "Movie Budget Across Genres Through the Years",
               fontsize = 26, weight = 'bold', alpha = .75);
fte_graph.text(x = 1980, y = 4900000000,
               s = 'A Peek Into Which Movie Genre Has Become More Expensive To Produce Within The Last 30 Years',
              fontsize = 18, alpha = .85);
Image by the Author
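
The division by 255 in the color list is there because matplotlib expects RGB channels as floats between 0 and 1. As a small sketch, the same eight colors can be written as 0-255 triples and converted in one go, or expressed as hex strings, which matplotlib also accepts. The names `palette_rgb` and `palette_hex` are illustrative.

```python
# The same eight colors written as 0-255 RGB triples.
palette_rgb = [
    (0, 0, 0), (230, 159, 0), (86, 180, 233), (0, 158, 115),
    (240, 228, 66), (0, 114, 178), (213, 94, 0), (204, 121, 167),
]

# Matplotlib expects floats in [0, 1], hence the division by 255.
colors = [[channel / 255 for channel in rgb] for rgb in palette_rgb]

# Equivalent hex strings, an alternative way to pass the same colors.
palette_hex = ["#{:02x}{:02x}{:02x}".format(*rgb) for rgb in palette_rgb]

print(palette_hex)
```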

STEP 10 – REPLACING THE LEGEND WITH INLINE LABELS

As with the signature bar, title, and subtitle, this portion involves trial and error for the placement of the text.

Instead of a legend, it is now common, and an accepted way of saving real estate in graphs, to place text that directly identifies the lines or bars in a chart.

To begin, let’s remove the legend by passing legend=False in the main plotting code.

fte_graph = trial.plot(x ='year', 
           y = 'budget', 
           figsize=(12,8), 
           color= colors,
           legend= False) #instead of choosing a colormap, we handpick our colors
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-1000000000, 0,  '$1 Billion  ', '$2 Billion  ', '$3 Billion  ', '$4 Billion  ']);
fte_graph.axhline(y = 0, color = 'black', linewidth = 2, alpha = .7)
fte_graph.set_xlim(left = 1984, right = 2016)
fte_graph.xaxis.label.set_visible(False)
#Signature Bar
fte_graph.text(x = 1979.8, y = -1000000000,
    s = ' ©Francis Adrian Viernes                                                          Source: Data Collected by TMDB and GroupLens Accessed via Kaggle ',
               fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey');
#Title and Subtitle Placement
fte_graph.text(x = 1980, y = 5200000000, s = "Movie Budget Across Genres Through the Years",
               fontsize = 26, weight = 'bold', alpha = .75);
fte_graph.text(x = 1980, y = 4900000000,
               s = 'A Peek Into Which Movie Genre Has Become More Expensive To Produce Within The Last 30 Years',
              fontsize = 18, alpha = .85);
Image by the Author

Finally, we manually place texts that identify the lines with their specific genre.

#Colored Labels Identifying the Line
fte_graph.text(x = 2010, y = 3500000000, s = 'Action', color = colors[0], weight = 'bold',fontsize=16, rotation = 42,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2015, y = 600000000, s = 'Adventure', color = colors[1], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2014, y = 1200000000, s = 'Animation', color = colors[2], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2016, y = 500000000, s = 'Biography', color = colors[3], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2004, y = 1400000000, s = 'Comedy', color = colors[4], weight = 'bold',fontsize=16, rotation = -58,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2016, y = 100000000, s = 'Crime', color = colors[5], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2002.8, y = 800000000, s = 'Drama', color = colors[6], weight = 'bold',fontsize=16, rotation = 65,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2016, y = -200000000, s = 'Horror', color = colors[7], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');

As one can see, the additional argument we used here is the ‘rotation’ parameter, which tilts each label to follow its line.
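
Since the eight calls differ only in genre, position, and rotation, they can also be driven from a small table. The sketch below is one possible refactoring rather than the original code; it draws on an empty stand-in Axes named `ax`, and the positions are copied from the calls above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

colors = [[0, 0, 0], [230/255, 159/255, 0], [86/255, 180/255, 233/255],
          [0, 158/255, 115/255], [240/255, 228/255, 66/255],
          [0, 114/255, 178/255], [213/255, 94/255, 0], [204/255, 121/255, 167/255]]

# (genre, x, y, rotation), in the same order as the colors.
labels = [
    ("Action", 2010, 3.5e9, 42),
    ("Adventure", 2015, 6e8, 52),
    ("Animation", 2014, 1.2e9, 52),
    ("Biography", 2016, 5e8, 52),
    ("Comedy", 2004, 1.4e9, -58),
    ("Crime", 2016, 1e8, 52),
    ("Drama", 2002.8, 8e8, 65),
    ("Horror", 2016, -2e8, 52),
]

fig, ax = plt.subplots(figsize=(12, 8))  # stand-in for fte_graph
for (genre, x, y, rotation), color in zip(labels, colors):
    ax.text(x=x, y=y, s=genre, color=color, weight="bold", fontsize=16,
            rotation=rotation, backgroundcolor="#f0f0f0")
```

Keeping the positions in one list makes the trial-and-error adjustments easier to track, since each label is a single row to edit.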

After all these steps, we finally have the following graph:

Image by the Author

Here’s the full code for your reference:

# Colorblind-friendly colors
colors = [[0,0,0], [230/255,159/255,0], [86/255,180/255,233/255], [0,158/255,115/255],
          [240/255,228/255,66/255], [0,114/255,178/255], [213/255,94/255,0], [204/255,121/255,167/255]]
fte_graph = trial.plot(x ='year', 
           y = 'budget', 
           figsize=(12,8), 
           color= colors,
           legend= False) #instead of choosing a colormap, we handpick our colors
fte_graph.tick_params(axis = 'both', which = 'major', labelsize = 18)
fte_graph.set_yticklabels(labels = [-1000000000, 0,  '$1 Billion  ', '$2 Billion  ', '$3 Billion  ', '$4 Billion  ']);
fte_graph.axhline(y = 0, color = 'black', linewidth = 2, alpha = .7)
fte_graph.set_xlim(left = 1984, right = 2016)
fte_graph.xaxis.label.set_visible(False)
#Signature Bar
fte_graph.text(x = 1979.8, y = -1000000000,
    s = ' ©Francis Adrian Viernes                                                          Source: Data Collected by TMDB and GroupLens Accessed via Kaggle ',
               fontsize = 14, color = '#f0f0f0', backgroundcolor = 'grey');
#Title and Subtitle Placement
fte_graph.text(x = 1980, y = 5200000000, s = "Movie Budget Across Genres Through the Years",
               fontsize = 26, weight = 'bold', alpha = .75);
fte_graph.text(x = 1980, y = 4900000000,
               s = 'A Peek Into Which Movie Genre Has Become More Expensive To Produce Within The Last 30 Years',
              fontsize = 18, alpha = .85);
#Colored Labels Identifying the Line
fte_graph.text(x = 2010, y = 3500000000, s = 'Action', color = colors[0], weight = 'bold',fontsize=16, rotation = 42,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2015, y = 600000000, s = 'Adventure', color = colors[1], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2014, y = 1200000000, s = 'Animation', color = colors[2], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2016, y = 500000000, s = 'Biography', color = colors[3], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2004, y = 1400000000, s = 'Comedy', color = colors[4], weight = 'bold',fontsize=16, rotation = -58,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2016, y = 100000000, s = 'Crime', color = colors[5], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2002.8, y = 800000000, s = 'Drama', color = colors[6], weight = 'bold',fontsize=16, rotation = 65,
              backgroundcolor = '#f0f0f0');
fte_graph.text(x = 2016, y = -200000000, s = 'Horror', color = colors[7], weight = 'bold',fontsize=16, rotation = 52,
              backgroundcolor = '#f0f0f0');

FINAL REMARKS

Making a professional-looking graph, while requiring a bit of effort on the code side, ultimately pays off in captivating our audience.

Following the steps above, and with the ultimate goal of minimizing our work, one should first deal with the minor adjustments and the elements that are unlikely to change with further modifications. These include:

  • the background or style
  • tick marks
  • color schemes, and
  • other minor adjustments such as a bold horizontal line at certain values.

Text elements require a bit of precision and coordination with the alignment and should ideally be done last.

Lastly, do not be afraid to mix up some elements and try them with other graphs and datasets. You can even get creative and insert images.

Let me know what you think. The full code is on my GitHub page.

REFERENCES

Making 538 Plots

Colorblind-friendly Color Schemes

