
How to Produce an Animated Bar Plot in Plotly using Python

Wrangle your raw dataset to produce an Animated Bar Plot

Image Courtesy of Author, Stephen Fordham

Plotting Antibiotic prescribing rates in US counties

This tutorial details how to transform raw data into an animated bar plot using the Plotly library in Python. The dataset used in this tutorial is titled:

‘Potentially Avoidable Antibiotic Prescribing observed and risk-adjusted rates for child Medicaid enrollees by provider county beginning in 2010’.

The dataset can be found via data.world. Specifically, we will build an animated bar plot of antibiotic prescribing rates per 100 visits in counties across the US from 2010 to 2016.

Preparing the Data

To begin, we import the pandas library and Plotly Express. When exploring a dataset it can also be useful to see every row and column, which we can achieve by setting both the display.max_rows and display.max_columns options to None. Bear in mind, however, that with this setting applied, printing a large dataframe produces very long output and takes longer to display. Finally, we read in the dataset, which I have saved as ‘potentially-avoidable-antibiotic-prescribing-rates.csv’.

Entries for each year

We first determine the number of counties in the dataset, which lets us work out how many years of data exist for each county. Here there are 63 unique counties and 441 rows in total; since 441 / 63 = 7, each county has 7 years’ worth of data, consistent with one row per year from 2010 to 2016.
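A minimal sketch of this check is shown below. It runs on a small hypothetical toy frame (3 counties, 7 years) instead of the real CSV; the column name "Provider County" comes from the article, while "Year" is an assumed name for the year column.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV: 3 counties x 7 years.
years = list(range(2010, 2017))
counties = ["County A", "County B", "County C"]
df = pd.DataFrame({
    "Provider County": [c for y in years for c in counties],
    "Year": [y for y in years for _ in counties],  # "Year" is an assumed column name
})

n_counties = df["Provider County"].nunique()  # number of distinct counties
n_rows = len(df)                              # total number of rows
print(n_counties, n_rows, n_rows // n_counties)  # 3 21 7

# Stricter check: every county should contribute the same number of rows
rows_per_county = df.groupby("Provider County").size()
print(rows_per_county.unique())
```

Dividing the row count by the county count only gives years per county if every county has the same number of rows, which is why the groupby check is worth running as well. On the article's dataset, the same counts are 63 counties and 441 rows.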

Figure 2: Unorganised Data Format

Getting the Data into the right shape for an Animated Bar plot

To produce an animated bar plot, we need to reshape the data into a different format from its current one: the rows for each county, sorted by year, stacked together in a single dataframe. The dataframe snippet in Figure 3 below illustrates the required format.

A few simple, concise data-wrangling steps transform the format in Figure 2 above into that of Figure 3 below.

Figure 3: Correctly formatted Data. Counties in year order in a single dataframe

Data Wrangling

To reorganise the data, we begin by creating a list of keys, one per county (0 through 62, i.e. range(63)), using a list comprehension. We then instantiate an empty dictionary, all_dataframes, and iterate through the keys list and the list of unique counties together. On each iteration we create a new dataframe containing only that county’s rows, sorted with the earliest year first, and store it in the all_dataframes dictionary under that iteration’s key.

We then iterate through all the dataframes using their keys and append each one to a list, termed master_df.

All these dataframes can finally be concatenated into a single dataframe using the pd.concat function from pandas. The output dataframe from this function call is assigned the name df_all.

The concise code to achieve these steps is shown below.
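The following is a reconstruction of the steps described above, not the author's original code. It runs on a hypothetical shuffled toy frame; "Year" is an assumed name for the year column, and the county names are invented.

```python
import pandas as pd

# Hypothetical toy data standing in for the raw CSV, shuffled to mimic an unorganised layout.
years = list(range(2010, 2017))
counties = ["County A", "County B", "County C"]
df = pd.DataFrame({
    "Provider County": [c for y in years for c in counties],
    "Year": [y for y in years for _ in counties],  # assumed column name
    "Observed Prescribing Rate per 100 Visits": [10.0 + 0.5 * i for i in range(21)],
}).sample(frac=1, random_state=0)

unique_counties = df["Provider County"].unique()

# One integer key per county: 0 .. len(unique_counties) - 1
keys = [i for i in range(len(unique_counties))]

# Build one year-sorted DataFrame per county and store it under its key
all_dataframes = {}
for key, county in zip(keys, unique_counties):
    county_df = df[df["Provider County"] == county].sort_values("Year")
    all_dataframes[key] = county_df

# Collect the per-county frames into a list, in key order
master_df = []
for key in keys:
    master_df.append(all_dataframes[key])

# Stack them into one DataFrame: each county's rows together, earliest year first
df_all = pd.concat(master_df)
print(df_all.head(7))
```

If only the final row order matters, df.sort_values(["Provider County", "Year"]) should produce the same grouping in a single call, with the difference that counties come out in alphabetical order rather than in order of first appearance.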

Plotting a sub-selection of Counties

We can either plot the data for all 63 counties, or choose a sub-selection for plotting. Here we will sample 10 unique counties to demonstrate.

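To do this, we select the column with the counties, drop the repeated county names, and call the sample method to draw 10 of them. The replace parameter should be left at False (its default): with replace set to True, or when sampling the raw column in which every county appears seven times, the same county can be drawn more than once. We then call the values attribute, which returns a NumPy array, and convert it to a list with the tolist() method, giving the list of 10 unique selected counties.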
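A sketch of drawing 10 distinct counties is given below, using a hypothetical stand-in for df_all with 15 invented counties. The random_state argument is an addition here, used only to make the draw repeatable.

```python
import pandas as pd

# Hypothetical stand-in for df_all: 15 invented counties x 7 years, grouped by county
years = range(2010, 2017)
df_all = pd.DataFrame(
    [(f"County {i}", y) for i in range(15) for y in years],
    columns=["Provider County", "Year"],  # "Year" is an assumed column name
)

# Drop repeated names first, then draw 10 counties without replacement
selected_counties = (
    df_all["Provider County"]
    .drop_duplicates()
    .sample(n=10, replace=False, random_state=42)
    .values.tolist()
)
print(selected_counties)
```

Removing duplicates before sampling is what guarantees uniqueness; replace=False alone would not, because each county name occupies several rows of the raw column.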

We use these 10 counties to filter the dataframe with the pandas isin method, which returns the plot_df dataframe containing only their rows. The final step is to find the minimum and maximum values of the column we intend to plot on the y-axis, in our case ‘Observed Prescribing Rate per 100 Visits’, which we will pass to the range_y parameter when creating the figure.
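A minimal sketch of the filter and the y-axis limits follows. It uses a hypothetical toy df_all with made-up rates and a two-county selection in place of the real 10.

```python
import pandas as pd

# Hypothetical stand-in for df_all: 3 counties x 7 years with made-up rates
years = list(range(2010, 2017))
counties = ["County A", "County B", "County C"]
df_all = pd.DataFrame({
    "Provider County": [c for c in counties for _ in years],
    "Year": years * len(counties),  # "Year" is an assumed column name
    "Observed Prescribing Rate per 100 Visits": [float(v) for v in range(20, 41)],
})
selected_counties = ["County A", "County C"]  # hypothetical sample

# Keep only the rows whose county appears in the selected list
plot_df = df_all[df_all["Provider County"].isin(selected_counties)]

# Minimum and maximum of the plotted column, later passed to range_y
col = "Observed Prescribing Rate per 100 Visits"
y_min = plot_df[col].min()
y_max = plot_df[col].max()
print(y_min, y_max)  # 20.0 40.0
```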

Animated bar plot

With plot_df holding the correctly formatted data for our 10 counties, we can now plot. We pass the dataframe and the columns for the x- and y-axes, and the Provider County column to the color parameter. We set animation_frame to the year column and animation_group to the Provider County column. We then pass the range determined above to range_y, so that every frame is drawn on the same, sensible y-axis scale, and give the figure an informative title.

Now we have our animated plot for our 10 counties!

