
Plotting Antibiotic prescribing rates in US counties
This tutorial details how to transform raw data into an animated bar plot using the Plotly library in Python. The dataset used in this tutorial is titled:
‘Potentially Avoidable Antibiotic Prescribing observed and risk-adjusted rates for child Medicaid enrollees by provider county beginning in 2010’.
The dataset can be found via data.world. Specifically, we will aim to plot an animated bar plot of antibiotic prescribing rates per 100 visits in counties across the US from 2010 to 2016.
Preparing the Data
To begin, we import the pandas library and plotly express. When dealing with a dataset, it can occasionally be useful to see all the rows and columns. To achieve this, we can set both the display max rows and display max columns options to None. Bear in mind, however, that larger datasets will take longer to display with this setting applied. Finally, we can read in the dataset, which I have saved as ‘potentially-avoidable-antibiotic-prescribing-rates.csv’.
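A minimal sketch of this setup step follows. So that it runs without the real file, it reads a tiny in-memory CSV; the column name 'Year' and all the values are assumptions made for illustration, not taken from the dataset. The plotly express import is left out here only because this snippet does not plot anything.

```python
import io

import pandas as pd

# Show every row and column when a dataframe is printed.
# Large dataframes will be slower to display with these settings.
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

# Stand-in for the real file so the snippet runs anywhere.
# The 'Year' column name and all values are invented for illustration.
sample_csv = io.StringIO(
    "Year,Provider County,Observed Prescribing Rate per 100 Visits\n"
    "2010,County A,12.5\n"
    "2011,County A,11.0\n"
    "2010,County B,20.1\n"
)

# With the real dataset, pass the file name instead:
# df = pd.read_csv("potentially-avoidable-antibiotic-prescribing-rates.csv")
df = pd.read_csv(sample_csv)
print(df.head())
```

To return to the pandas defaults later, pd.reset_option("display.max_rows") and pd.reset_option("display.max_columns") can be used.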
Entries for each year
We can first determine the number of counties in the dataset. This will then tell us how many years of information exist for each county. Here, there are 63 unique counties and the dataframe has 441 rows in total, meaning each county has 7 years' worth of data available (441 / 63 = 7).
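The check above can be sketched as follows. The dataframe here is a synthetic stand-in with 63 hypothetical county names and an assumed 'Year' column, built only so the snippet runs on its own; with the real data, the same calls would be applied to the dataframe read from the CSV.

```python
import pandas as pd

# Synthetic stand-in: 63 hypothetical counties x 7 years (2010 to 2016).
counties = [f"County {i}" for i in range(63)]
df = pd.DataFrame(
    [(county, year) for county in counties for year in range(2010, 2017)],
    columns=["Provider County", "Year"],
)

n_counties = df["Provider County"].nunique()  # number of distinct counties
n_rows = len(df)                              # total number of rows
years_per_county = n_rows // n_counties       # rows available per county

# Dividing the totals assumes every county has the same number of rows;
# counting rows per county confirms that directly.
rows_per_county = df.groupby("Provider County").size()

print(n_counties, n_rows, years_per_county)
```

The groupby count is an extra safeguard: if any county were missing a year, the simple division would hide it, whereas rows_per_county would show the gap.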

Getting the Data into the right shape for an Animated Bar plot
To plot an animated bar plot, we need to rearrange the data into a different format from the one it is currently in. We need a single dataframe in which each county's rows sit together, sorted by year. To illustrate this, the dataframe image snippet below shows the format required.
Simple and concise data-wrangling can alter the format from Figure 2 above to Figure 3 below.

Data Wrangling
To reorganise the data, we begin by creating a keys list of 63 integers (0 through 62, one per county) using a list comprehension. We then instantiate an empty dictionary and iterate through the keys list and the unique counties list together. On each loop we create a new dataframe for one county, sorted with the earliest year first, and add it to the all_dataframes dictionary under that loop's unique key.
We can then iterate through all the dataframes using the unique key and append each one to a list of dataframes, termed master_df.
All these dataframes can finally be concatenated into a single dataframe using the pd.concat function from pandas. The output dataframe from this function call is assigned the name df_all.

The concise code to achieve these steps is shown below.
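A sketch of the wrangling steps described above, run on a small unsorted stand-in dataframe. The names keys, all_dataframes, master_df and df_all follow the text; the 'Year' column name, the county labels and the values are assumptions for illustration.

```python
import pandas as pd

# Small, deliberately unsorted stand-in for the real data.
df = pd.DataFrame({
    "Provider County": ["B", "A", "B", "A", "C", "C"],
    "Year": [2011, 2011, 2010, 2010, 2011, 2010],
    "Observed Prescribing Rate per 100 Visits": [20.0, 11.0, 21.5, 12.5, 8.0, 9.0],
})

unique_counties = df["Provider County"].unique()
keys = [i for i in range(len(unique_counties))]  # one key per county

# Build one dataframe per county, sorted with the earliest year first.
all_dataframes = {}
for key, county in zip(keys, unique_counties):
    county_df = df[df["Provider County"] == county].sort_values("Year")
    all_dataframes[key] = county_df

# Collect the per-county dataframes into a list, then stack them.
master_df = []
for key in keys:
    master_df.append(all_dataframes[key])

df_all = pd.concat(master_df, ignore_index=True)
print(df_all)
```

A shorter route that should give the same grouping is df.sort_values(["Provider County", "Year"]), with the difference that counties then appear in alphabetical order rather than in order of first appearance.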
Plotting a sub-selection of Counties
We can either plot the data for all 63 counties, or choose a sub-selection for plotting. Here we will sample 10 unique counties to demonstrate.
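To do this, we select the column with the counties, reduce it to its unique values, and call the sample method with the parameter replace set to False, so that no county can be drawn twice (sampling with replacement, or from the full column where each county appears once per year, could return duplicates). We then call the values attribute, which returns an array, before converting it to a list using the tolist() method. We can now see the list of 10 unique selected counties.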
We use these 10 counties to filter the dataframe using the pandas isin method. This returns a plot_df dataframe. The final step is to find the minimum and maximum values of the column we intend to plot on the y-axis, in our case ‘Observed Prescribing Rate per 100 Visits’, which we will use for the range_y parameter on our figure object.
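The selection and filtering can be sketched as below. To guarantee 10 distinct counties, this sketch samples from the de-duplicated county names with replace=False, since drawing with replacement could return the same county more than once. The data is a synthetic stand-in, and the 'Year' column name and the random_state seed are assumptions added for reproducibility.

```python
import pandas as pd

rate_col = "Observed Prescribing Rate per 100 Visits"

# Synthetic stand-in for df_all: 63 hypothetical counties x 7 years.
counties = [f"County {i}" for i in range(63)]
df_all = pd.DataFrame(
    [
        (county, year, float(i + year - 2010))
        for i, county in enumerate(counties)
        for year in range(2010, 2017)
    ],
    columns=["Provider County", "Year", rate_col],
)

# Sample 10 county names; de-duplicating first and using replace=False
# means no county can appear twice in the selection.
selected_counties = (
    df_all["Provider County"]
    .drop_duplicates()
    .sample(n=10, replace=False, random_state=0)
    .values
    .tolist()
)

# Keep only the rows belonging to the selected counties.
plot_df = df_all[df_all["Provider County"].isin(selected_counties)]

# Limits for the y-axis of the plot.
y_min = plot_df[rate_col].min()
y_max = plot_df[rate_col].max()
print(selected_counties, y_min, y_max)
```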

Animated bar plot
The correctly formatted data, filtered to our 10 counties, can now be plotted. We first pass the dataframe and the columns for the x- and y-axes, and pass the Provider County column to the color parameter. We set animation_frame to the year column and animation_group to the Provider County column. We then use the minimum and maximum determined above to set sensible limits for the y-axis via range_y, and give the figure an informative title.
Now we have our animated plot for our 10 counties!