Data for Change

Python Bar Chart Race Animation: COVID-19 Cases

Tazki Anida Asrul
Towards Data Science
5 min read · Sep 18, 2020


Bar Chart Race GIF

COVID-19 has battered many countries for more than eight months. Millions of cases have been confirmed, and the number keeps rising every day.

Thanks to Google for making a COVID-19 dataset publicly available for free. It lets us do our own analysis of the pandemic, for example by creating a bar chart race animation. The data are stored on Google Cloud Platform and can be accessed here.

Collect the Data

First of all, find the BigQuery table named covid19_open_data in the bigquery-public-data project.

COVID-19 Dataset. Image by author

There are 45 columns in total, which present information on COVID-19 cases for each country (and region) day by day. You can select just the data you want to download, depending on the kind of analysis you plan to do. For example, I ran the SQL below to retrieve the date, country, subregion, and case counts.

SELECT
date,
country_code,
country_name,
subregion1_name,
COALESCE(new_confirmed,0) AS new_confirmed,
COALESCE(new_deceased,0) AS new_deceased,
COALESCE(cumulative_confirmed,0) AS cumulative_confirmed,
COALESCE(cumulative_deceased,0) AS cumulative_deceased,
COALESCE(cumulative_tested,0) AS cumulative_tested
FROM
`bigquery-public-data.covid19_open_data.covid19_open_data`
Query Results. Image by author

You can save the results as a .csv file and open it later with Python.

Prepare the Data

Now, let’s transform the data in Python. The first thing to do is open your Jupyter Notebook, then import the necessary packages to transform and visualize the data. After that, open the .csv file and save it as a DataFrame object.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.animation as animation
import numpy as np
import matplotlib.colors as mc
import colorsys
from IPython.display import HTML
from datetime import datetime
from random import randint

# Load the CSV file exported from BigQuery into a DataFrame
df = pd.read_csv('covid19-20200917.csv')

To make use of the data, we first need to understand its characteristics. You can explore it by checking the data type of each column and looking at its values.

Image by author

From the results above, we know that there are NULL values in the country_code and subregion1_name columns.
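The inspection step described above can be sketched roughly as follows. The small `sample` frame and its values are made up for illustration and only stand in for the exported CSV; the column names are taken from the query.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the exported CSV (made-up values).
sample = pd.DataFrame({
    "date": ["2020-08-31", "2020-08-31", "2020-08-31"],
    "country_code": ["AA", None, "CC"],
    "country_name": ["Country A", "Country B", "Country C"],
    "subregion1_name": [None, None, "Region 1"],
    "cumulative_confirmed": [100, 50, 200],
})

# Data type of each column
print(sample.dtypes)

# Number of missing (NULL) values per column
null_counts = sample.isnull().sum()
print(null_counts)
```

On the real data, the same two calls on `df` would show which columns contain missing values.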

In my case, the aim of the analysis is to create a bar chart of the total confirmed cases in each country, from the beginning of 2020 until the end of August. We can rely on three main columns from the dataset: date, country_name, and cumulative_confirmed. There are no NULL values in those columns, so we can continue to the next step.

Using pandas, sum cumulative_confirmed for each country per day. Then, check the results. For example, I used 2020-08-31 as the date parameter.

# Total cumulative confirmed cases per country per day
df = pd.DataFrame(df.groupby(['country_name','date'])['cumulative_confirmed'].sum()).reset_index()
current_date = '2020-08-31'
# Top 15 countries on the chosen date
dff = df[df['date'].eq(current_date)].sort_values(by='cumulative_confirmed', ascending=False).head(15)
The 15 countries with the most confirmed cases on 2020-08-31. Image by author
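To see what the aggregation and filtering do, here is a minimal sketch on a made-up frame. The `raw` data and the country names "A", "B", and "C" are hypothetical; only the column names come from the query.

```python
import pandas as pd

# Made-up rows: countries A and B each appear twice on the same date.
raw = pd.DataFrame({
    "country_name": ["A", "A", "B", "B", "C"],
    "date": ["2020-08-31"] * 5,
    "cumulative_confirmed": [10, 5, 7, 1, 20],
})

# One row per (country, date) with the summed case count
totals = raw.groupby(["country_name", "date"], as_index=False)["cumulative_confirmed"].sum()

# Keep one date and take the two largest totals
top = (totals[totals["date"].eq("2020-08-31")]
       .sort_values(by="cumulative_confirmed", ascending=False)
       .head(2))
print(top)
```

One thing that may be worth checking on the real export: if it contains both a country-level row and subregion rows for the same country and date, summing them would count the same cases more than once.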

Visualize the Data

Then, we can start to visualize it by using Matplotlib.

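import seaborn as sns  # needed for sns.set below

# Apply the style before creating the figure so it takes effect
sns.set(style="darkgrid")
fig, ax = plt.subplots(figsize=(15, 8))
ax.barh(dff['country_name'], dff['cumulative_confirmed'])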
Image by author

To create a bar chart race animation, we need to assign each country a fixed color. That way, no matter where the country sits in the ranking, it can still be tracked by its color.

The main code is in .ipynb format. This specific function is separated into a .py file due to aesthetic needs.

The main purpose of this function is to give each country a randomly chosen color. After defining how the colors are transformed, we can continue to draw the bar chart.
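The color function itself is not reproduced in this text, so the following is only a sketch of how such a step could look, not the author's code. The names `random_hex_color`, `transform_color`, and `assign_colors`, as well as the fixed seed, are assumptions made for illustration.

```python
import colorsys
from random import Random

import matplotlib.colors as mc


def random_hex_color(rng):
    """Return a random color as a '#rrggbb' hex string."""
    return "#{:06x}".format(rng.randint(0, 0xFFFFFF))


def transform_color(color, amount=0.5):
    """Lighten (amount < 1) or darken (amount > 1) a Matplotlib color."""
    r, g, b = mc.to_rgb(color)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    new_l = max(0.0, min(1.0, 1 - amount * (1 - l)))  # clamp lightness to [0, 1]
    return colorsys.hls_to_rgb(h, new_l, s)


def assign_colors(country_names, seed=42):
    """Map each distinct country name to one random color (hypothetical helper)."""
    rng = Random(seed)  # seeded so the mapping is the same on every run
    return {name: random_hex_color(rng) for name in sorted(set(country_names))}


colors = assign_colors(["India", "Brazil", "India"])
lighter = transform_color("#336699", amount=0.5)
print(colors, lighter)
```

Because the mapping is built once, before any frame is drawn, a country keeps the same color for the whole animation.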

The main code is in .ipynb format. This specific function is separated into a .py file due to aesthetic needs.

In the method above, we sort the data in our data frame so that only the 15 countries with the most confirmed cases for each date are shown in the bar chart. Then, we set the color of each bar and its attributes. For each date, we extract the month name and display it prominently on the right side of the chart.
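Since the drawing function is not shown inline, here is a simplified sketch of the steps just described, under stated assumptions: the toy `df`, the `colors` mapping, and the text placement are made up, and the styling is deliberately minimal. It is not the author's draw_barchart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data with the same three columns used in the article
df = pd.DataFrame({
    "country_name": ["A", "B", "C", "A", "B", "C"],
    "date": ["2020-08-30"] * 3 + ["2020-08-31"] * 3,
    "cumulative_confirmed": [10, 30, 20, 15, 35, 40],
})
colors = {"A": "#1f77b4", "B": "#ff7f0e", "C": "#2ca02c"}  # one fixed color per country

fig, ax = plt.subplots(figsize=(15, 8))


def draw_barchart(current_date, top_n=15):
    # Sort ascending and keep the last top_n rows so the largest bar is drawn on top
    dff = (df[df["date"].eq(current_date)]
           .sort_values(by="cumulative_confirmed", ascending=True)
           .tail(top_n))
    ax.clear()
    ax.barh(dff["country_name"], dff["cumulative_confirmed"],
            color=[colors[name] for name in dff["country_name"]])
    # Write the case count next to each bar
    for i, value in enumerate(dff["cumulative_confirmed"]):
        ax.text(value, i, f" {value:,.0f}", ha="left", va="center")
    # Show the month name on the right side of the chart
    month = pd.to_datetime(current_date).strftime("%B")
    ax.text(1, 0.4, month, transform=ax.transAxes, size=46, ha="right", weight=800)


draw_barchart("2020-08-31")
```

Calling `ax.clear()` at the start of every call is what lets the same function be reused for each frame of an animation.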

Bar Chart Design

The chart above is the result of the draw_barchart method if we used 2020-08-31 as a date parameter. You can adjust the code to decorate the bar chart based on your taste.

After finishing the bar chart design, all we need to do is to define the dates that will be in the range of bar chart animation.

dates = pd.Series(pd.to_datetime(df['date'].unique()))
# Keep only the dates before 1 September 2020; the others become NaT and are dropped
dates = dates.where(dates < '2020-09-01')
dates.dropna(inplace=True)
dates = dates.astype(str)

In my case, I kept only the dates before September 2020. The range of dates looks like this:

Range of dates. Image by author
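The same date filter can also be written with a boolean mask. The sketch below uses a made-up `df` and additionally sorts the dates, on the assumption that the frames should play in chronological order even if the unique dates do not come out sorted.

```python
import pandas as pd

# Made-up, unsorted dates with a duplicate and two September entries
df = pd.DataFrame({
    "date": ["2020-09-01", "2020-08-30", "2020-08-31", "2020-08-30", "2020-09-02"],
})

all_dates = pd.Series(pd.to_datetime(df["date"].unique()))
cutoff = pd.Timestamp("2020-09-01")

# Keep dates strictly before the cutoff, sort them, and convert back to strings
dates = (all_dates[all_dates < cutoff]
         .sort_values()
         .dt.strftime("%Y-%m-%d")
         .tolist())
print(dates)  # ['2020-08-30', '2020-08-31']
```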

Finally, we reach the magic part, animation! We can create an animation by utilizing matplotlib.animation.

import matplotlib.animation as animation
from IPython.display import HTML
fig, ax = plt.subplots(figsize=(15, 8))
animator = animation.FuncAnimation(fig, draw_barchart, frames=dates)
HTML(animator.to_jshtml())

And, voila! Here is the result.

Bar Chart Race
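For readers who want to try the animation step in isolation, here is a minimal, self-contained sketch. The toy `data`, the `draw_frame` function, and the `interval` and `repeat` settings are assumptions for illustration; `draw_frame` merely stands in for the draw_barchart method.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: two countries over three days
data = pd.DataFrame({
    "country_name": ["A", "B"] * 3,
    "date": ["2020-08-01", "2020-08-01", "2020-08-02",
             "2020-08-02", "2020-08-03", "2020-08-03"],
    "cumulative_confirmed": [1, 2, 4, 3, 5, 9],
})
dates = sorted(data["date"].unique().tolist())

fig, ax = plt.subplots(figsize=(6, 3))


def draw_frame(current_date):
    # FuncAnimation passes one element of `frames` to this function per frame
    frame = data[data["date"].eq(current_date)].sort_values("cumulative_confirmed")
    ax.clear()
    ax.barh(frame["country_name"], frame["cumulative_confirmed"])
    ax.set_title(current_date)


# interval is the delay between frames in milliseconds
animator = animation.FuncAnimation(fig, draw_frame, frames=dates,
                                   interval=300, repeat=False)
html = animator.to_jshtml()  # HTML/JavaScript player as a string
```

In a notebook, wrapping `html` in `IPython.display.HTML` displays the player inline.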

To save the animation in GIF format, we can use this simple line.

animator.save('covid_til_august.gif', writer='pillow')

Conclusion

There are many ways to play with data, since plenty of tools are available to support us. BigQuery helps us get and query the data, Pandas transforms and aggregates it, and Matplotlib visualizes it. Still, there is a lot of room to explore.

Now you can create your own bar chart race and do your own analysis. You can see the full code in the Jupyter Notebook file here. Happy experimenting, and stay healthy!

