
In this article, we’ll discuss how to plot 100% stacked bar and column charts in Python using Matplotlib. We’ll use the Netflix Movies and TV Shows dataset, downloaded from the Tableau Public sample datasets (go to https://public.tableau.com/en-us/s/resources and open Sample Data). The dataset lists the TV shows and movies available on Netflix as of 2019. Tableau sourced it from Kaggle, where an updated version can also be found; for this article, however, we’ll use the older version from the Tableau sample datasets. Below is a snapshot of the dataset (transposed for better visibility).

In this article, we’ll plot stacked bar and column charts showing the proportion of each type of show released in each release_year from 2016 to 2019. In the code below, we’ll select the records (rows) with release_year between 2016 and 2019 and preprocess the data.
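Because the original code isn't reproduced here, the filtering step might look roughly like the sketch below. The small DataFrame is a hypothetical stand-in for the Netflix data, assuming the columns are named type and release_year as in the text, and the dropna/strip calls are only an example of what the preprocessing could involve, not the article's exact steps.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})

# Keep rows whose release_year falls in 2016-2019 (between() is inclusive on both ends).
df_filtered = df[df["release_year"].between(2016, 2019)].copy()

# Example preprocessing (an assumption, not the article's exact steps):
# drop rows without a show type and trim stray whitespace in the labels.
df_filtered = df_filtered.dropna(subset=["type"])
df_filtered["type"] = df_filtered["type"].str.strip()

print(df_filtered["release_year"].value_counts().sort_index())
```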

We’ll create a cross tabulation that shows the proportion of each type of show in each year, with release_year in the index and type in the columns. Passing normalize='index' to pd.crosstab() divides each count by its row total, which gives us index-wise (row-wise) proportions; normalize=True would instead divide every count by the grand total of the whole table.
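Here is a minimal sketch of the proportion table, built on the same hypothetical sample data. It also computes normalize=True for comparison, to show that it normalizes over the whole table rather than per year.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]

# Row-wise ("index") normalization: the proportions in each year sum to 1.
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"],
                             normalize="index")
print(cross_tab_prop)

# For comparison: normalize=True divides every cell by the grand total instead,
# so it is the whole table, not each row, that sums to 1.
overall_prop = pd.crosstab(index=df_filtered["release_year"],
                           columns=df_filtered["type"],
                           normalize=True)
print(overall_prop.to_numpy().sum())
```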

We’ll create a cross tabulation similar to the one above, but this time we’ll keep the raw counts instead of converting them into proportions. This cross tabulation is used to display the data labels on the plot, which we’ll see later in this article.
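The count table is the same pd.crosstab() call without the normalize argument. This sketch again uses the hypothetical sample data.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]

# Without normalize, crosstab returns plain frequencies (counts).
cross_tab = pd.crosstab(index=df_filtered["release_year"],
                        columns=df_filtered["type"])
print(cross_tab)
```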

100% stacked column chart
Now, we’ll create a stacked column plot showing the proportion of each type of show in each year, using the cross tabulation of proportions we created earlier.
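A minimal version of the plotting step is sketched below. It rebuilds the hypothetical sample data so that it runs on its own; the figure size, axis labels, legend placement, and output file name are our own choices rather than the article's.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")

# stacked=True draws each show type on top of the previous one,
# so every column reaches a total height of 1.0 (100%).
ax = cross_tab_prop.plot(kind="bar", stacked=True, figsize=(10, 6), rot=0)
plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
plt.xlabel("Release Year")
plt.ylabel("Proportion")
plt.savefig("stacked_column_chart.png", bbox_inches="tight")
```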

Now we’ll add the proportion data labels to our plot. We’ll create a nested loop: the outer loop iterates through the index (the years) of the cross tabulation of proportions, and the inner loop iterates through the values in each row. We’ll look at the code and its output before diving deeper.
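A first attempt at this nested loop could look like the sketch below (the variable names are our own, and the data is the same hypothetical sample). It deliberately uses each proportion as its own y-position, which is what produces the misplaced labels discussed next.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")

ax = cross_tab_prop.plot(kind="bar", stacked=True, figsize=(10, 6), rot=0)

# Outer loop: n is the bar's x-position (0, 1, 2, ...), year is the row label.
for n, year in enumerate(cross_tab_prop.index):
    # Inner loop: the proportion of each show type within that year.
    for proportion in cross_tab_prop.loc[year]:
        # First attempt: the proportion itself is used as the y-position,
        # which misplaces labels on a stacked chart.
        plt.text(x=n, y=proportion, s=f"{proportion * 100:.1f}%",
                 color="black", fontsize=12, fontweight="bold")

plt.savefig("labels_first_attempt.png", bbox_inches="tight")
```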

In the above code, the outer loop enumerates the index of the cross tabulation of proportions. We’ll use the enumeration counter (0, 1, 2, 3) as the x-position of the data labels, because pandas places the bars of a bar plot at x = 0, 1, 2, and so on. Let’s look at what the first loop returns.
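To see what the outer loop produces, we can print the (counter, year) pairs from enumerate() on the hypothetical sample data.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")

# The counter n becomes the label's x-position; year selects the row.
outer = list(enumerate(cross_tab_prop.index))
for n, year in outer:
    print(n, year)
```

With four years in the sample, this prints the pairs 0 2016, 1 2017, 2 2018, and 3 2019.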

In the inner loop, we’ll iterate through the values in each row (year) of the cross tabulation of proportions. Let’s look at what the second loop returns.
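Iterating over a row of the DataFrame yields its values, so the inner loop steps through one proportion per show type. This sketch collects and prints them for the hypothetical sample data.

```python
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")

inner = []
for n, year in enumerate(cross_tab_prop.index):
    # cross_tab_prop.loc[year] is a Series; iterating over it yields its values.
    for proportion in cross_tab_prop.loc[year]:
        inner.append((year, proportion))
        print(year, round(proportion, 3))
```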

We can see the inner loop iterating through the proportions of each year one by one. The variable proportion holds a value from the cross tabulation we created earlier, and we’ve also used it as the y-position of the data label. That works for an unstacked chart, where every bar starts at zero, but not for a stacked one, where each segment sits on top of the segments below it.

In the above plot, we see that the data labels are misaligned and are not displayed at the right positions. We can correct the x-position of the labels by subtracting a small value (0.17 in this case) from it so that each label moves to the left and sits over its bar. For the y-position, we’ll add a y_loc variable to the inner loop that holds the cumulative sum of the proportions in the row so far. This is the top edge of the current segment, so it places each label at the top of its respective segment. We’ll see the code and the output below.
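The corrected loop might look like the sketch below, with y_loc computed here via Series.cumsum() on each row (a running total in the loop would work just as well). The 0.17 offset comes from the article and depends on the figure size and font size; passing ha="center" to plt.text() is an alternative to hard-coding it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")

ax = cross_tab_prop.plot(kind="bar", stacked=True, figsize=(10, 6), rot=0)

for n, year in enumerate(cross_tab_prop.index):
    row = cross_tab_prop.loc[year]
    # y_loc is the cumulative sum of the proportions in this row,
    # i.e. the top edge of the segment the label belongs to.
    for proportion, y_loc in zip(row, row.cumsum()):
        # Shift each label left by 0.17 so it sits over its bar.
        plt.text(x=n - 0.17, y=y_loc, s=f"{proportion * 100:.1f}%",
                 color="black", fontsize=12, fontweight="bold")

plt.savefig("labels_top_of_segment.png", bbox_inches="tight")
```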

We can center the data labels within each segment by changing their y-position in the plt.text() function, for example to y_loc minus half of the proportion, which is the midpoint of the segment, as shown in the code below.
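One way to center the labels, sketched on the hypothetical sample data: subtract half of each proportion from its cumulative sum so that the label lands at the segment's midpoint.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")

ax = cross_tab_prop.plot(kind="bar", stacked=True, figsize=(10, 6), rot=0)

for n, year in enumerate(cross_tab_prop.index):
    row = cross_tab_prop.loc[year]
    for proportion, y_loc in zip(row, row.cumsum()):
        # y_loc is the segment's top edge; subtracting half its height
        # moves the label to the middle of the segment.
        plt.text(x=n - 0.17, y=y_loc - proportion / 2,
                 s=f"{proportion * 100:.1f}%",
                 color="black", fontsize=12, fontweight="bold")

plt.savefig("labels_centered.png", bbox_inches="tight")
```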

We can also show the count for each segment by reading it from the count cross tabulation in the inner loop and modifying the string argument (s) of the plt.text() function, as shown below.
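A sketch of labels that combine counts and proportions, again on the hypothetical sample data. The counts are looked up with the same column order as the proportion row so the two stay aligned, and the two-line label format is our own choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")
cross_tab = pd.crosstab(index=df_filtered["release_year"],
                        columns=df_filtered["type"])

ax = cross_tab_prop.plot(kind="bar", stacked=True, figsize=(10, 6), rot=0)

for n, year in enumerate(cross_tab_prop.index):
    props = cross_tab_prop.loc[year]
    # Select the counts in the same column order as the proportions.
    counts = cross_tab.loc[year, props.index]
    for count, proportion, y_loc in zip(counts, props, props.cumsum()):
        # Count on the first line, percentage on the second.
        plt.text(x=n - 0.17, y=y_loc - proportion / 2,
                 s=f"{count}\n({proportion * 100:.1f}%)",
                 color="black", fontsize=12, fontweight="bold")

plt.savefig("labels_with_counts.png", bbox_inches="tight")
```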

100% stacked bar chart
We can create a 100% stacked bar chart by slightly modifying the code we wrote earlier. We change the kind of plot from ‘bar’ to ‘barh’, swap the x- and y-axis labels, and swap the x- and y-positions of the data labels in the plt.text() function. Everything else stays the same. Let’s look at the code below.
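A horizontal version might look like the sketch below. The cumulative sum now runs along the x-axis and the counter n gives the bar's y-position. One deliberate deviation: instead of a hard-coded offset, it centers the labels with ha="center" and va="center", which avoids retuning the offset after swapping the axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Netflix data: only the two columns used here.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show", "Movie",
             "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "release_year": [2015, 2016, 2016, 2016, 2017, 2017,
                     2018, 2018, 2019, 2019, 2019],
})
df_filtered = df[df["release_year"].between(2016, 2019)]
cross_tab_prop = pd.crosstab(index=df_filtered["release_year"],
                             columns=df_filtered["type"], normalize="index")
cross_tab = pd.crosstab(index=df_filtered["release_year"],
                        columns=df_filtered["type"])

# kind="barh" draws horizontal bars, one per year at y = 0, 1, 2, ...
ax = cross_tab_prop.plot(kind="barh", stacked=True, figsize=(10, 6))
plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
plt.xlabel("Proportion")      # axis labels swapped relative to the column chart
plt.ylabel("Release Year")

for n, year in enumerate(cross_tab_prop.index):
    props = cross_tab_prop.loc[year]
    counts = cross_tab.loc[year, props.index]
    for count, proportion, x_loc in zip(counts, props, props.cumsum()):
        # Positions swapped: the segment midpoint is on the x-axis, n on the y-axis.
        plt.text(x=x_loc - proportion / 2, y=n,
                 s=f"{count}\n({proportion * 100:.1f}%)",
                 ha="center", va="center",
                 color="black", fontsize=12, fontweight="bold")

plt.savefig("stacked_bar_chart.png", bbox_inches="tight")
```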

This brings the article to an end. We’ve discussed how to create 100% stacked bar and column charts, add data labels to them, and align and format those labels. Because 100% stacked charts show the data as proportions (relative frequencies) rather than raw counts, which can be misleading when groups differ in size, they make it easier to compare the composition of different groups.
Learn more about my work at https://ksvmuralidhar.in/