Studying The Pandemic With a Single Visualization Using Plotly Dash

Using interactive visualizations to better understand how variables such as income and education shaped a country’s experience of the pandemic.

Thiago Rodrigues
Towards Data Science

The Final Product of This Article

Introduction

The coronavirus pandemic has been a very common topic for data science posts on platforms like Medium ever since it began. Visualizing data on important topics like this one as graphs is usually a great way to get a better grasp of the situation.

However, when studying a topic, especially one as complicated as this, a single figure is usually not enough, and you might want to plot the same figure across different target variables as well. Instead of producing a separate figure for each, in this post we’ll go over how to use Plotly Dash to create an interactive dashboard containing all the information you’ll want to show. You can follow along through Google Colab.

The Data

The datasets we’ll use to build our dashboard on the pandemic, and on how a country’s income and education level might have affected how it dealt with it, are the COVID-19 Worldometer Daily Snapshots and The World Bank’s 2020 Human Capital Index, both of which are available on Kaggle.

To keep things simple, we’ll be using the columns on the total cases, deaths, tests, and recovered cases from the first dataset and the income group and expected years of education columns from the second one.

The Human Capital Index Dataset (Only Select Columns)
The COVID-19 Worldometer Daily Snapshots Dataset (Only Select Columns)

Preparing The Data

To get our final dataset, which we’ll be using to make our interactive dashboard, we’ll first need to work out some issues.

The COVID-19 dataset contains two snapshots, each taken on a different date. As we only want to work with the most recent data, we’ll drop the older snapshot. We’ll also convert the column names to snake case, which should make working with the data easier.
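As a rough illustration of this step, here is a minimal pandas sketch. The column names ("Date", "Country", "Total Cases", "Total Deaths") and every value in the toy data frame are made up for the example and may not match the Kaggle file.

```python
import pandas as pd

# Toy stand-in for the COVID-19 snapshots; column names and values are hypothetical.
covid = pd.DataFrame(
    {
        "Date": ["2020-08-01", "2020-08-01", "2020-09-01", "2020-09-01"],
        "Country": ["USA", "Brazil", "USA", "Brazil"],
        "Total Cases": [10, 5, 20, 8],
        "Total Deaths": [1, 1, 2, 1],
    }
)

# Keep only the rows from the most recent snapshot date.
covid["Date"] = pd.to_datetime(covid["Date"])
latest = covid[covid["Date"] == covid["Date"].max()]
latest = latest.drop(columns="Date").reset_index(drop=True)

# Convert column names to snake case: lowercase, with runs of
# non-word characters (spaces, slashes, ...) replaced by underscores.
latest.columns = (
    latest.columns.str.strip().str.lower().str.replace(r"\W+", "_", regex=True)
)

print(latest)
```

Filtering on the maximum date, rather than hard-coding it, keeps the snippet working if a newer snapshot is added later.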

The same country can appear under different names in the two datasets. To fix that, we’ll use the Pycountry library to map them all to the same format. Then, we can finally merge both datasets using the Pandas merge() method.

The Final Data Frame

Installing What We Need

To start making our dashboard, we’ll need the Dash libraries as well as Plotly, since that’s what we’ll be using to plot the graphs.

Plotly can be installed with the following line:

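pip install plotly

The Dash libraries can be installed as follows (since Dash 2.0, the three component packages ship with dash itself, so on recent versions the first line alone is enough):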

pip install dash
pip install dash-html-components
pip install dash-core-components
pip install dash-table

Building The Dashboard

When creating a Dash app, the first thing you’ll have to worry about is setting up your application’s layout, which is done in HTML. Luckily, Dash provides us with functions that act as HTML tags, allowing us to create Divs, Headers, and other components through the dash_html_components library.

You can even use CSS through the style argument of the methods. To use it, you pass it a dictionary with the keys corresponding to the CSS properties in camelCase and the dictionary’s values corresponding to the values you want to give to each property.

Aside from the HTML, you can also add graphs and various types of input components from the dash_core_components library. The graph components will be our plots. If you don’t plan on making the dashboard interactive, you can simply assign a plot made in Plotly to them via the figure argument. But in our case, we’ll instead define their ids so we can use them interactively.

Defining The Layout for Our Dashboard

With our layout out of the way, we can get down to making the plots and having them receive data from the input components, which are, in this case, the Dropdown and Slider components from the core components library.

We connect our graphs to our input components via callback functions. When defining one, we list its outputs and inputs through the Output and Input objects from the dash.dependencies module, each of which takes the unique id we gave a component when building the layout, along with the name of the component property to read or update.

Dash then calls the function defined right below the callback decorator with the current input values and assigns what it returns to the output, which is, in this case, the figure of one of our graphs.

When defining our function to plot the graphs, we can simply plot them as we normally would, replacing the variables we want to be interactive with the input variables the function takes in.

Building Our Interactive Plots Using Callbacks

And then, finally, we write the code to run our dashboard and get the final product of all the code we wrote so far.

if __name__ == '__main__':
    # Disable the auto-reloader, which can misbehave in notebooks such as Colab
    app.run_server(use_reloader=False)

Conclusion

Looking at our final dashboard, which I have hosted on Heroku for ease of viewing, we can see that COVID-19 case counts are higher in higher-income countries. One plausible explanation is that these countries are more economically active and tend to have more people in constant circulation, and a higher education level doesn’t seem to outweigh that.

And, more importantly, we have learned how to make visualizations that can more effectively reveal those insights by gathering all the relevant information into a single place.
