100% stacked charts in Python

Plotting 100% stacked bar and column charts using Matplotlib

KSV Muralidhar
Towards Data Science


Image by author

In this article, we’ll discuss how to plot 100% stacked bar and column charts in Python using Matplotlib. We’ll use the Netflix Movies and TV Shows dataset from the Tableau Public sample datasets (go to https://public.tableau.com/en-us/s/resources and open Sample Data). The dataset lists the TV shows and movies available on Netflix as of 2019. Tableau sourced it from Kaggle, where an updated version is also available. For this article, however, we’ll use the older version from the Tableau sample datasets. Below is a snapshot of the dataset (transposed for better visibility).

Snapshot of the transposed dataset (Image by author)

We’ll plot stacked bar and column charts showing the proportion of each type of show released in each release_year from 2016 to 2019. In the code below, we’ll select the rows with release_year between 2016 and 2019 and preprocess the data.
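The filtering step could look roughly like the sketch below. The sample rows, the `raw` name, and the cleaning steps (dropping missing values, casting the year to an integer) are assumptions made for illustration; only the column names release_year and type come from the text.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data comes from the Tableau sample file.
raw = pd.DataFrame({
    "release_year": [2015, 2016, 2016, 2017, 2018, 2019, 2019, 2020, None],
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show",
             "Movie", "TV Show", "Movie", "TV Show"],
})

# Assumed preprocessing: drop rows missing either field and store the year as an integer.
df = raw.dropna(subset=["release_year", "type"]).copy()
df["release_year"] = df["release_year"].astype(int)

# Keep only 2016 to 2019; Series.between is inclusive on both ends by default.
df = df[df["release_year"].between(2016, 2019)].reset_index(drop=True)
print(df)
```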

Image by author

We’ll create a cross tabulation that shows the proportion of each type of show for each year. We’ll put release_year in the index and the type of show in the columns. Passing normalize='index' normalizes each row, so the proportions for each year sum to 1 (normalize=True would instead divide every cell by the grand total).
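A minimal sketch of a row-normalized cross tabulation with pd.crosstab follows. The rows are made-up sample data and the `cross_tab_prop` name is an assumption; release_year and type are the columns named in the text.

```python
import pandas as pd

# Hypothetical sample rows standing in for the filtered Netflix data.
rows = (
    [(2016, "Movie")] * 3 + [(2016, "TV Show")] * 1
    + [(2017, "Movie")] * 2 + [(2017, "TV Show")] * 2
    + [(2018, "Movie")] * 1 + [(2018, "TV Show")] * 3
    + [(2019, "Movie")] * 1 + [(2019, "TV Show")] * 4
)
df = pd.DataFrame(rows, columns=["release_year", "type"])

# normalize="index" divides each row by its own total, so every year sums to 1.
cross_tab_prop = pd.crosstab(
    index=df["release_year"],
    columns=df["type"],
    normalize="index",
)
print(cross_tab_prop)
```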

Year-wise proportion of type of shows (Image by author)

We’ll create a cross tabulation similar to the one above, but this time we’ll keep the raw counts instead of converting them into proportions. We’ll use this cross tabulation to display the counts in the data labels later in this article.
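The count version simply omits the normalize argument. The sketch below uses made-up sample rows, and the names `cross_tab` and `row_share` are assumptions. It also checks that dividing each row by its total gives the same proportions.

```python
import pandas as pd

# Hypothetical sample rows standing in for the filtered Netflix data.
rows = (
    [(2016, "Movie")] * 3 + [(2016, "TV Show")] * 1
    + [(2017, "Movie")] * 2 + [(2017, "TV Show")] * 2
    + [(2018, "Movie")] * 1 + [(2018, "TV Show")] * 3
    + [(2019, "Movie")] * 1 + [(2019, "TV Show")] * 4
)
df = pd.DataFrame(rows, columns=["release_year", "type"])

# Without normalize, crosstab returns the number of rows in each cell.
cross_tab = pd.crosstab(index=df["release_year"], columns=df["type"])

# Dividing each row by its total recovers the row-wise proportions.
row_share = cross_tab.div(cross_tab.sum(axis=1), axis=0)
print(cross_tab)
print(row_share)
```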

Year-wise count of type of shows (Image by author)

100% stacked column chart

Now, we’ll create a stacked column plot showing the proportion of the type of shows each year. We’ll use the cross tabulation having the proportions (that we created earlier) to plot the chart.
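As a rough illustration, the stacked column chart can be drawn with the pandas plotting wrapper around Matplotlib. The `counts` and `props` tables below are hypothetical stand-ins for the two cross tabulations, and the figure size, colormap, and legend placement are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the count and proportion cross tabulations.
counts = pd.DataFrame(
    {"Movie": [3, 2, 1, 1], "TV Show": [1, 2, 3, 4]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)
props = counts.div(counts.sum(axis=1), axis=0)

# kind="bar" draws vertical columns; stacked=True stacks the types per year.
ax = props.plot(kind="bar", stacked=True, colormap="tab10", figsize=(8, 5))
ax.set_xlabel("Release year")
ax.set_ylabel("Proportion")
ax.legend(loc="lower left", ncol=2)
# plt.show()  # uncomment when running as a script
```

Because every row of the proportion table sums to 1, each column reaches the same height, which is what makes the chart a 100% stacked chart.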

Image by author

Now we’ll add the proportion data labels to our plot. We’ll create a nested loop in which the outer loop iterates through the index of the cross tabulation holding the proportions, and the inner loop iterates through the values in each row. We’ll look at the code and its output before diving deeper.

Image by author

In the above code, the first loop enumerates and iterates through the index of the cross tabulation having proportions. We’ll use the values of the enumeration (0, 1, 2, 3) as the x-position of the data labels. Let’s look at what the first loop returns.
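To make the outer loop concrete, here is a small sketch that prints what enumerate yields for the index. The `props` table is a hypothetical stand-in for the proportion cross tabulation.

```python
import pandas as pd

# Hypothetical stand-in for the proportion cross tabulation.
props = pd.DataFrame(
    {"Movie": [0.75, 0.5, 0.25, 0.2], "TV Show": [0.25, 0.5, 0.75, 0.8]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)

# n is the position of the bar on the x-axis; x is the index label (the year).
positions = list(enumerate(props.index))
for n, x in positions:
    print(n, x)
# 0 2016
# 1 2017
# 2 2018
# 3 2019
```

The positional value n is used for the x-coordinate because a pandas bar plot places its categories at 0, 1, 2, ... rather than at the year values themselves.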

Image by author

In the second loop, we’ll iterate through the values in each index of the cross tabulation having proportions. Let’s look at what the second loop returns.
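The nested loop can be sketched as follows, again with a hypothetical `props` table. The `seen` list is only there to collect the visited values for inspection.

```python
import pandas as pd

# Hypothetical stand-in for the proportion cross tabulation.
props = pd.DataFrame(
    {"Movie": [0.75, 0.5, 0.25, 0.2], "TV Show": [0.25, 0.5, 0.75, 0.8]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)

seen = []
for n, x in enumerate(props.index):
    # props.loc[x] is the row for year x; iterating it yields one value per type.
    for proportion in props.loc[x]:
        seen.append((n, x, proportion))
        print(n, x, proportion)
```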

Image by author

We can see the inner loop iterating through the proportions in each row one by one. The variable proportion is the proportion from the cross tabulation that we created earlier. We’ve also used it as the y-position of the data label. That would be fine if every bar started at zero, but in a stacked chart each segment starts where the previous one ends, so the raw proportion is not the right y-position.

In the above plot, the data labels are misaligned. We can correct the x-position by subtracting a value (0.17 in this case) from it so that each label moves to the left. For the y-position, we’ll add a y_loc variable to the inner loop that holds the cumulative sum of the proportions in a row. This places each label at the top of its respective bar segment. We’ll see the code and the output below.
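A minimal sketch of the corrected labels follows, assuming a hypothetical `props` table in place of the proportion cross tabulation. The 0.17 offset is the value mentioned in the text; font settings are arbitrary.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the proportion cross tabulation.
props = pd.DataFrame(
    {"Movie": [0.75, 0.5, 0.25, 0.2], "TV Show": [0.25, 0.5, 0.75, 0.8]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)

ax = props.plot(kind="bar", stacked=True, figsize=(8, 5))

for n, x in enumerate(props.index):
    # cumsum() gives the running total of the row, i.e. the top edge of each segment.
    for proportion, y_loc in zip(props.loc[x], props.loc[x].cumsum()):
        ax.text(
            x=n - 0.17,  # shift left so the label sits over the bar
            y=y_loc,     # top of this segment
            s=f"{proportion * 100:.1f}%",
            color="black",
            fontsize=10,
            fontweight="bold",
        )
# plt.show()  # uncomment when running as a script
```

Passing ha="center" to the text call is an alternative to the hand-tuned horizontal offset.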

Image by author

We can center the data labels within each bar segment by modifying the y-position passed to the plt.text() function: the midpoint of a segment is its top (y_loc) minus half of its proportion. The code below shows this.
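The centering can be sketched like this, with a hypothetical `props` table. The y-position is written as the bottom of the segment plus half its height, which is the same midpoint expressed another way.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the proportion cross tabulation.
props = pd.DataFrame(
    {"Movie": [0.75, 0.5, 0.25, 0.2], "TV Show": [0.25, 0.5, 0.75, 0.8]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)

ax = props.plot(kind="bar", stacked=True, figsize=(8, 5))

for n, x in enumerate(props.index):
    for proportion, y_loc in zip(props.loc[x], props.loc[x].cumsum()):
        # (y_loc - proportion) is the bottom of the segment; add half its height.
        y_mid = (y_loc - proportion) + proportion / 2
        ax.text(
            x=n - 0.17,
            y=y_mid,
            s=f"{proportion * 100:.1f}%",
            color="black",
            fontsize=10,
            fontweight="bold",
        )
# plt.show()  # uncomment when running as a script
```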

Image by author

We can also show the count for each segment by including the counts in the inner loop and modifying the string argument (s) of the plt.text() function, as shown below.
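One way to combine counts and percentages in a label is to zip the matching rows of both tables. In this sketch `counts` and `props` are hypothetical stand-ins for the two cross tabulations, and the two-line label layout is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the count and proportion cross tabulations.
counts = pd.DataFrame(
    {"Movie": [3, 2, 1, 1], "TV Show": [1, 2, 3, 4]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)
props = counts.div(counts.sum(axis=1), axis=0)

ax = props.plot(kind="bar", stacked=True, figsize=(8, 5))

for n, x in enumerate(props.index):
    # Walk the proportion, the count, and the running total of the same row together.
    for proportion, count, y_loc in zip(
        props.loc[x], counts.loc[x], props.loc[x].cumsum()
    ):
        ax.text(
            x=n - 0.17,
            y=(y_loc - proportion) + proportion / 2,
            s=f"{count}\n({proportion * 100:.1f}%)",  # count on top, share below
            color="black",
            fontsize=10,
            fontweight="bold",
        )
# plt.show()  # uncomment when running as a script
```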

Image by author

100% stacked bar chart

We can create a 100% stacked bar chart by slightly modifying the code we created earlier. We change the kind of the plot from ‘bar’ to ‘barh’, swap the x and y axis labels, and swap the x and y positions of the data labels in the plt.text() function. Everything else stays the same. We’ll look at the code below.
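The horizontal version could be sketched as below. `counts` and `props` are hypothetical stand-ins for the cross tabulations, and the 0.11 vertical offset is an assumed value chosen by eye, not one given in the text.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the count and proportion cross tabulations.
counts = pd.DataFrame(
    {"Movie": [3, 2, 1, 1], "TV Show": [1, 2, 3, 4]},
    index=pd.Index([2016, 2017, 2018, 2019], name="release_year"),
)
props = counts.div(counts.sum(axis=1), axis=0)

# kind="barh" draws horizontal bars; the stacking now runs along the x-axis.
ax = props.plot(kind="barh", stacked=True, figsize=(8, 5))
ax.set_xlabel("Proportion")
ax.set_ylabel("Release year")

for n, x in enumerate(props.index):
    for proportion, count, x_loc in zip(
        props.loc[x], counts.loc[x], props.loc[x].cumsum()
    ):
        ax.text(
            x=(x_loc - proportion) + proportion / 2,  # middle of the segment
            y=n - 0.11,                               # assumed offset to center vertically
            s=f"{count} ({proportion * 100:.1f}%)",
            color="black",
            fontsize=10,
            fontweight="bold",
        )
# plt.show()  # uncomment when running as a script
```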

Image by author

This brings the article to an end. We’ve discussed how to create 100% stacked bar and column charts, add data labels to them, and properly align and format those labels. Because the data is shown as proportions (relative frequencies) rather than raw counts, which can be misleading when group sizes differ, 100% stacked charts give a more reliable view of the data, especially when comparing across groups.

Know more about my work at https://ksvmuralidhar.in/
