The world’s leading publication for data science, AI, and ML professionals.

Create Bump Charts With Matplotlib

Explore changes in rank over time using only Matplotlib

Photo by blueberry Maki on Unsplash

"There is nothing so stable as change." (Bob Dylan)

When I was a teenager, I loved checking the most popular songs on Billboard, even though my taste often differed from what Billboard presented to me. My favorite songs usually failed to reach the top positions. Despite that, it was a great source of good new songs for me. I also liked to check which songs had been popular in the past. I found out, for instance, that in the week I was born, in August 1987, _I Still Haven’t Found What I’m Looking For_ was the number one song on the Billboard Hot 100!

Humans are always comparing, evaluating, and ranking every aspect of life. What are the best football teams in 2023? Who was the best tennis player in 2022? What was the most used programming language on GitHub last year? We want to know what’s trending right now. But, as with everything in life, rankings change all the time.

In this lesson, you will learn how to show changes in rank with basic Matplotlib, with no need for additional libraries. As an example, you will use data published in GitHub’s Octoverse 2022 report, which analyzed the most popular programming languages on GitHub.

1. What is a bump chart?

A bump chart is similar to a line plot, but it focuses on how ranks change over time. Imagine, for example, that each line in the figure below represents a singer’s popularity rank. Each line, identified by its color, represents one singer; the x-axis represents the year, and the y-axis represents the rank.

Image created by the Author using Matplotlib
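To make the idea concrete before turning to the real data, here is a minimal sketch of a bump chart with two made-up singers. The names and ranks are hypothetical, and the `matplotlib.use("Agg")` line is an assumption added only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: popularity rank of two made-up singers over three years
years = [2020, 2021, 2022]
ranks = {
    "Singer A": [1, 2, 1],
    "Singer B": [2, 1, 2],
}

fig, ax = plt.subplots()
for name, rank in ranks.items():
    # One line per singer: x = year, y = rank
    ax.plot(years, rank, "o-", label=name)

ax.invert_yaxis()      # rank 1 at the top
ax.set_yticks([1, 2])  # only whole-number ranks make sense
ax.set_xticks(years)   # one tick per year
ax.legend()
```

The lines cross whenever the two singers swap places, which is exactly the "bump" the chart is named after.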

2. Our competitors: the programming languages

According to Octoverse, in 2022, programmers used around 500 languages to develop software on GitHub. JavaScript was the most used language, followed by Python, the language we will use to build our bump chart.

The report also revealed that the HashiCorp Configuration Language (HCL) was the fastest-growing language on GitHub, reflecting the expansion of cloud infrastructure. Rust and TypeScript were second and third in growth, respectively.

Several rankings, built from different data and methods, estimate the position of each language. One example is Stack Overflow’s 2020 Developer Survey, which presents similar but not identical results. This post uses the Octoverse data as an example.

To make the example easy to reproduce, the data is written directly in code and stored in a list of dictionaries, as shown below.

years_list = list(range(2014,2023,2))

list_programming = [

{
    'Name' : ["JavaScript" for i in range(5)],
    'Year' : years_list,
    'Rank' : [1,1,1,1,1]
},

{
    'Name' : ["Python" for i in range(5)],
    'Year' : years_list,
    'Rank' : [4,3,3,2,2]
},

{
    'Name' : ["Java" for i in range(5)],
    'Year' : years_list,
    'Rank' : [2,2,2,3,3]
},

{
    'Name' : ["TypeScript" for i in range(5)],
    'Year' : years_list,
    'Rank' : [10,10,7,4,4]
},

{
    'Name' : ["C#" for i in range(5)],
    'Year' : years_list,
    'Rank' : [8,6,6,5,5]
},

{
    'Name' : ["C++" for i in range(5)],
    'Year' : years_list,
    'Rank' : [6,5,5,7,6]
},

{
    'Name' : ["PHP" for i in range(5)],
    'Year' : years_list,
    'Rank' : [3,4,4,6,7]
},

{
    'Name' : ["Shell" for i in range(5)],
    'Year' : years_list,
    'Rank' : [9,9,9,9,8]
},

{
    'Name' : ["C" for i in range(5)],
    'Year' : years_list,
    'Rank' : [7,8,7,7,9]
},

{
    'Name' : ["Ruby" for i in range(5)],
    'Year' : years_list,
    'Rank' : [5,7,10,10,10]
}

]
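Because every dictionary repeats the same years and the same name five times, the list can also be generated from a single mapping of language to ranks. The sketch below is one possible alternative, not the author’s code; the name `ranks_by_language` is a hypothetical helper, and the rank values are copied unchanged from the list above.

```python
years_list = list(range(2014, 2023, 2))  # 2014, 2016, 2018, 2020, 2022

# Hypothetical helper: language name -> ranks, one per year in years_list
ranks_by_language = {
    "JavaScript": [1, 1, 1, 1, 1],
    "Python":     [4, 3, 3, 2, 2],
    "Java":       [2, 2, 2, 3, 3],
    "TypeScript": [10, 10, 7, 4, 4],
    "C#":         [8, 6, 6, 5, 5],
    "C++":        [6, 5, 5, 7, 6],
    "PHP":        [3, 4, 4, 6, 7],
    "Shell":      [9, 9, 9, 9, 8],
    "C":          [7, 8, 7, 7, 9],
    "Ruby":       [5, 7, 10, 10, 10],
}

# Build the same list-of-dictionaries structure used in the rest of the post
list_programming = [
    {"Name": [name] * len(years_list), "Year": years_list, "Rank": ranks}
    for name, ranks in ranks_by_language.items()
]

# Sanity check: every language has exactly one rank per year
for element in list_programming:
    assert len(element["Rank"]) == len(years_list)
```

Since Python 3.7, dictionaries keep insertion order, so the languages come out in the same order as in the hand-written list.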

3. Matplotlib subplots method

There are several ways to create a plot with Matplotlib, but for flexibility it is recommended to use plt.subplots(). This function returns two objects: one of the class Figure and one of the class Axes. The Figure object is the container of your plot, while the Axes object is the plot itself.

Image created by the Author

The code below loads the necessary libraries and creates the two objects just mentioned.

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
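If you want to see what plt.subplots() actually hands back, a quick check like the one below confirms the two classes and that the Axes belongs to the Figure. The Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

fig, ax = plt.subplots()

print(isinstance(fig, Figure))  # True: the container
print(isinstance(ax, Axes))     # True: the plot itself
print(ax.figure is fig)         # True: the Axes lives inside this Figure
```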

4. Setting plot size

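In Matplotlib, you can change the default size of your plots using plt.rcParams["figure.figsize"]. We will set it to 12 inches wide and 6 inches high. This setting only applies to figures created after it, so run it before calling plt.subplots(); alternatively, pass figsize=(12, 6) directly to plt.subplots().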

plt.rcParams["figure.figsize"] = (12,6)
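The order matters because rcParams are read when a figure is created. This sketch shows both options; the variable names are illustrative, and the Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# Option 1: change the global default, then create the figure
plt.rcParams["figure.figsize"] = (12, 6)
fig_after, ax_after = plt.subplots()
print(fig_after.get_size_inches())  # width and height in inches

# Option 2: set the size for a single figure without relying on global settings
fig2, ax2 = plt.subplots(figsize=(12, 6))
print(fig2.get_size_inches())
```

Option 2 keeps the size next to the plot it belongs to, which avoids surprises when several figures are created in the same session.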

5. Calling the plot method for each programming language

For each dictionary in our list, we call the Axes plot() method, with the years on the x-axis and the ranks on the y-axis. The format string "o-" sets both the marker and the line style: "o" draws a circle at each data point, and "-" connects the points with a solid line. Setting markerfacecolor to "white" fills each circle with white.

The result is almost what we want, but further adjustments are needed.

for element in list_programming:
  ax.plot(element["Year"], 
          element["Rank"], 
          "o-",                       # format of marker / format of line
          markerfacecolor="white")
Image created by the Author using Matplotlib
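To make the format string less cryptic, here is a sketch showing that "o-" is shorthand for the explicit marker and linestyle keyword arguments. It reuses Python’s ranks from the data above; the Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

years = [2014, 2016, 2018, 2020, 2022]
python_rank = [4, 3, 3, 2, 2]  # Python's ranks from the data above

fig, ax = plt.subplots()
# Shorthand: "o" = circle marker, "-" = solid line
(line_short,) = ax.plot(years, python_rank, "o-", markerfacecolor="white")
# Equivalent explicit keyword arguments
(line_long,) = ax.plot(years, python_rank, marker="o", linestyle="-",
                       markerfacecolor="white")

print(line_short.get_marker(), line_short.get_linestyle())
print(line_long.get_marker(), line_long.get_linestyle())
print(to_rgba(line_short.get_markerfacecolor()))  # white as an RGBA tuple
```

The keyword form is more verbose but easier to read when you come back to the code later.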

6. Inverting the y-axis and setting axis ticks

It would be nice to have the number one language at the top of the chart. Besides that, we would like all the rank numbers to be shown on the y-axis.

We can do this with plt.gca().invert_yaxis(). plt.gca() returns the current Axes, which here is ax, so ax.invert_yaxis() is equivalent. Additionally, we can set the y ticks by passing a NumPy array of values to plt.yticks(). Here, np.arange(1, 11, 1) creates an array with the integers 1 through 10.

plt.gca().invert_yaxis()
plt.yticks(np.arange(1, 11, 1))
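Since we already hold the Axes object, the same two steps can be written with Axes methods instead of the pyplot state functions. This is a sketch of that equivalent style, plotting only Python’s line; the Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot([2014, 2016, 2018, 2020, 2022], [4, 3, 3, 2, 2], "o-")

# Same effect as plt.yticks(...) and plt.gca().invert_yaxis(),
# but called directly on the Axes we already have
ax.set_yticks(np.arange(1, 11, 1))  # ticks at ranks 1, 2, ..., 10
ax.invert_yaxis()                   # rank 1 at the top

print(ax.yaxis_inverted())  # True
```

Calling methods on ax avoids depending on which Axes happens to be "current", which matters once a figure has more than one subplot.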

7. Labelling the lines

We need to identify which programming language each line corresponds to. To achieve that, we can use the Axes annotate() method. The first parameter it receives is the text we would like to annotate. Inside the loop over list_programming, element["Name"][0] gives each language’s name.

The xy parameter is the point we wish to annotate; in our case, it is the end of each line. The xytext parameter is the point where the text is placed. Note that xytext is almost the same as xy, just slightly further to the right on the x-axis. Finally, va sets the vertical alignment of the text; "center" centers the label vertically on the point.

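for element in list_programming:
  ax.annotate(element["Name"][0],                   # text: the language name
              xy=(2022, element["Rank"][4]),        # point being labeled: last data point
              xytext=(2022.2, element["Rank"][4]),  # text position: slightly to the right
              va="center")                          # center the text vertically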
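One alternative worth knowing: with textcoords="offset points", xytext is read as an offset in typographic points from xy rather than as a data coordinate, so the gap between line and label stays the same no matter how wide the x range is. The sketch below uses that option (a different choice from the post’s 2022.2 data offset) and takes the last year and rank with [-1] instead of hard-coded values; the two-language data subset and the Agg backend line are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

years = [2014, 2016, 2018, 2020, 2022]
ranks = {"JavaScript": [1, 1, 1, 1, 1], "Python": [4, 3, 3, 2, 2]}  # subset of the data

fig, ax = plt.subplots()
for name, rank in ranks.items():
    ax.plot(years, rank, "o-", markerfacecolor="white")
    # Anchor the label at the last point of the line, but place the text
    # 8 points to the right of it instead of 0.2 "years" to the right
    ax.annotate(name,
                xy=(years[-1], rank[-1]),
                xytext=(8, 0),
                textcoords="offset points",
                va="center")
ax.invert_yaxis()
```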

8. Changing linewidth in Matplotlib

The lines showing each language’s path are relatively thin. We can make them thicker with the linewidth parameter of the plot() method.
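Here is a small sketch comparing the default line width with linewidth=3, the value used in the final code. The width is measured in points; the two-point line and the Agg backend line are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(thin,) = ax.plot([2014, 2022], [4, 2], "o-")                # default width
(thick,) = ax.plot([2014, 2022], [4, 2], "o-", linewidth=3)  # 3 points wide

print(plt.rcParams["lines.linewidth"])  # the default used by `thin`
print(thick.get_linewidth())
```

Matplotlib also accepts the short alias lw for linewidth.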

9. Removing the plot frame

To make the plot cleaner, we can remove its frame. Each Axes object has four spines, and each spine is one side of the plot frame. We can loop over ax.spines.values() and hide each spine with set_visible(False). Check out all of these adjustments below.

for element in list_programming:
  ax.plot(element["Year"], 
          element["Rank"], 
          "o-", # format of marker / format of line
          markerfacecolor="white",
          linewidth=3)
  ax.annotate(element["Name"][0], 
              xy=(2022, element["Rank"][4]), 
              xytext=(2022.2,element["Rank"][4]), 
              va="center")

plt.gca().invert_yaxis()
plt.yticks(np.arange(1, 11, 1))

for spine in ax.spines.values():
    spine.set_visible(False)
Image created by the Author using Matplotlib
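For convenience, here is a sketch that gathers every step into one self-contained script. It differs slightly from the snippets above, as choices of mine rather than the author’s: it passes figsize to plt.subplots(), calls Axes methods instead of plt.gca() and plt.yticks(), reads the last year and rank with [-1] instead of hard-coded values, and adds ax.set_xticks(years_list) so each year in the data gets one tick. The Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

years_list = list(range(2014, 2023, 2))

# Ranks copied from list_programming above, keyed by language name
ranks_by_language = {
    "JavaScript": [1, 1, 1, 1, 1],
    "Python":     [4, 3, 3, 2, 2],
    "Java":       [2, 2, 2, 3, 3],
    "TypeScript": [10, 10, 7, 4, 4],
    "C#":         [8, 6, 6, 5, 5],
    "C++":        [6, 5, 5, 7, 6],
    "PHP":        [3, 4, 4, 6, 7],
    "Shell":      [9, 9, 9, 9, 8],
    "C":          [7, 8, 7, 7, 9],
    "Ruby":       [5, 7, 10, 10, 10],
}

# Size set per figure, so no global rcParams change is needed
fig, ax = plt.subplots(figsize=(12, 6))

for name, ranks in ranks_by_language.items():
    ax.plot(years_list, ranks, "o-", markerfacecolor="white", linewidth=3)
    # Label each line just to the right of its last point
    ax.annotate(name,
                xy=(years_list[-1], ranks[-1]),
                xytext=(years_list[-1] + 0.2, ranks[-1]),
                va="center")

ax.set_yticks(np.arange(1, 11, 1))  # one tick per rank
ax.set_xticks(years_list)           # one tick per year in the data
ax.invert_yaxis()                   # rank 1 at the top

for spine in ax.spines.values():    # remove the frame
    spine.set_visible(False)

# In a script, call plt.show() or fig.savefig(...) here to see the result
```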

A lot better, isn’t it?

This bump chart does not require any additional library, and Matplotlib lets you customize it in many more ways! In a separate post, I share further recommendations for creating compelling visualizations with Matplotlib.

10. The top programming languages in 2022

Now we have a clear picture of how the popularity of programming languages on GitHub evolved from 2014 to 2022.

JavaScript has held the top position since 2014. According to Berkeley Boot Camps, JavaScript’s popularity is explained by the fact that most web browsers use it. Python was the fourth most used language in 2014 and has grown in popularity since then; in 2022 it was the second most used language on GitHub. Finally, Java has lost some ground but remains the third most used language.
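The movements described above can be checked with a little arithmetic. This sketch uses the rank lists from the data above and subtracts each language’s 2022 rank from its 2014 rank, so a positive number means the language climbed (a smaller rank number is better). The dictionary name is a hypothetical helper.

```python
# Ranks per language for 2014, 2016, 2018, 2020, 2022 (copied from the data above)
ranks_by_language = {
    "JavaScript": [1, 1, 1, 1, 1],
    "Python":     [4, 3, 3, 2, 2],
    "Java":       [2, 2, 2, 3, 3],
    "TypeScript": [10, 10, 7, 4, 4],
    "C#":         [8, 6, 6, 5, 5],
    "C++":        [6, 5, 5, 7, 6],
    "PHP":        [3, 4, 4, 6, 7],
    "Shell":      [9, 9, 9, 9, 8],
    "C":          [7, 8, 7, 7, 9],
    "Ruby":       [5, 7, 10, 10, 10],
}

# First rank minus last rank: positive = climbed, negative = fell
rank_change = {name: ranks[0] - ranks[-1] for name, ranks in ranks_by_language.items()}

# Print from biggest climber to biggest faller
for name, change in sorted(rank_change.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {change:+d}")
```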

Conclusion

In this post, you learned how to show changes in rank with a basic Matplotlib graph. No additional libraries are needed; all you need is to understand Matplotlib’s objects and how to customize them to show your data.


If you enjoyed this post, follow me to learn more about data visualization!

