Donut plot for data analysis

In this story, we demonstrate how to draw donut plots from complex Excel sheets with Python tools.

Yefeng Xia
Towards Data Science


As usual, our work starts with data, just as some people’s breakfast starts with donuts. Food and data have nothing in common, except that the donut chart is shaped like a doughnut.

First, we have an Excel file that records all sales information of an industry department from 2018 to 2020. The department, established in 2018, has just been through the COVID-19 year in China. Luckily, it has survived and is celebrating the start of the new year, 2021.

Explanation of original data

Now we can look back on what happened to the department over the last three years. The Excel file consists of 3 sheets and contains the daily sales amount (column E) and the total weight of goods (column F). The receivers of the goods (column C) are the companies that pay for them. The product comes in 3 types (column D), depending on its components, FDY and DTY. There are 3 combinations of these raw materials: FDY-FDY, DTY-DTY, and the hybrid FDY-DTY. FAW (column B) determines the thickness of the product, since the finished goods are grey cloth, a basic product in the textile industry.

image by Author: a screenshot of the excel file

Donut plot with Pandas and Matplotlib

import pandas as pd
df2018=pd.read_excel("outbound_with_company.xlsx",sheet_name='2018',header=0)
df2019=pd.read_excel("outbound_with_company.xlsx",sheet_name='2019',header=0)
df2020=pd.read_excel("outbound_with_company.xlsx",sheet_name='2020',header=0)

We read each Excel sheet into a data frame, which has the same columns as the original data. From here we can regroup the data to answer specific questions. If we would like to explore the relationship between orders and customers, in other words the numerical proportion between column C and column E (or F), we can regroup the data frame with a groupby operation, using the function pandas.DataFrame.groupby.

Code example:

group_2018_2 = df2018.groupby('company')
print(group_2018_2.size())
image by Author: code output
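The output of size() only counts the rows in each group, while a donut chart needs one total per group. As a minimal sketch of that aggregation step, the snippet below uses a small made-up data frame (the company IDs and weights are invented for illustration and are not taken from the real file) and assumes the column names 'company' and 'weight':

```python
import pandas as pd

# Made-up rows that mimic the sheet layout; the values are NOT from the real file.
toy = pd.DataFrame({
    "company": ["ID001", "ID002", "ID001", "ID003", "ID003", "ID003"],
    "weight": [100.0, 250.0, 50.0, 300.0, 200.0, 100.0],
})

grouped = toy.groupby("company")

# size() counts the rows (orders) in each group
order_counts = grouped.size()

# sum() adds up a numeric column inside each group
total_weight = grouped["weight"].sum()

# Dividing by the grand total gives the proportions that a donut chart displays
share = total_weight / total_weight.sum()

print(order_counts)
print(total_weight)
print(share.round(2))
```

The same pattern should work with the 'amount'-style column of the real sheet instead of the weight, if sales value rather than shipped weight is the quantity of interest.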
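import numpy as np
import matplotlib.pyplot as plt

# Total weight per company (assumed aggregation: sum of the 'weight' column)
group_2018_company = df2018.groupby('company')[['weight']].sum()

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(aspect="equal"))
# One viridis color per customer
cs_customer = plt.get_cmap('viridis')(np.linspace(0, 1, len(group_2018_company)))
component = group_2018_company.index
data = group_2018_company['weight']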

wedges, texts = ax.pie(data, wedgeprops=dict(width=0.3), startangle=90,colors=cs_customer)

plt.legend(wedges, component, loc="center",fontsize=12)

ax.set_title("customers in 2018",fontdict={'fontsize': 16, 'fontweight': 'bold'})

fig.tight_layout()
plt.savefig('donutplot2.png',dpi=100, format='png', bbox_inches='tight')
plt.show()
image by Author: output of plt.show(), the donut chart

From the donut plot, we can clearly see the contribution of each customer to sales, measured here by the total weight of goods. The customer with ID003 made the greatest contribution in 2018.

Similarly, we can group the data by other columns, such as ‘type’ and ‘FAW’, and apply the same operations. This gives us three donut plots for 2018.
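Since the three plots differ only in the grouping column, the drawing code can be wrapped in a small function. The sketch below is one possible way to do it, not the code from the repository: donut_by is a hypothetical helper name, the toy rows are invented, and the column names 'type', 'FAW' and 'weight' are assumed to match the sheet.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def donut_by(df, column, value="weight", title=None):
    """Hypothetical helper: donut chart of `value` summed per category of `column`."""
    totals = df.groupby(column)[value].sum()
    # One viridis color per category
    colors = plt.get_cmap("viridis")(np.linspace(0, 1, len(totals)))
    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(aspect="equal"))
    # width < 1 leaves the hole in the middle that turns a pie into a donut
    wedges, _ = ax.pie(totals, wedgeprops=dict(width=0.3), startangle=90, colors=colors)
    ax.legend(wedges, totals.index.astype(str), loc="center", fontsize=12)
    ax.set_title(title or f"{column} ({value})",
                 fontdict={"fontsize": 16, "fontweight": "bold"})
    return fig, ax, totals


# Made-up rows for illustration only; NOT taken from the real file.
toy = pd.DataFrame({
    "type": ["FDY-FDY", "DTY-DTY", "FDY-DTY", "FDY-FDY"],
    "FAW": [80, 100, 80, 120],
    "weight": [100.0, 250.0, 50.0, 300.0],
})

for col in ["type", "FAW"]:
    fig, ax, totals = donut_by(toy, col, title=f"{col} in 2018 (toy data)")
```

With the real data, calling the helper with df2018 and 'company', 'type' or 'FAW' would be expected to produce the three 2018 charts; plt.show() or fig.savefig(...) then displays or stores each figure.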

image by Author

This Excel file records 3 years of the department’s sales, which means we can get 9 donut🍩 charts.

In the next story, we will explain how to build a beautiful annual report with a DIY design in Matplotlib.

One page of the report looks like this:
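The nine charts can also be laid out in a single figure, with one row per year and one column per grouping. The following is a rough sketch under stated assumptions: the toy frames are randomly generated stand-ins for the three sheets, toy_sheet is a hypothetical helper, and the column names are assumed to match the workbook.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# With the real workbook, the three sheets could be loaded in one call:
#   sheets = pd.read_excel("outbound_with_company.xlsx",
#                          sheet_name=["2018", "2019", "2020"])
# which returns a dict mapping sheet name to DataFrame.
# Randomly generated toy frames stand in for the sheets here.
rng = np.random.default_rng(0)


def toy_sheet(n=30):
    """Hypothetical stand-in for one sheet; the values are random, not real data."""
    return pd.DataFrame({
        "company": rng.choice(["ID001", "ID002", "ID003"], n),
        "type": rng.choice(["FDY-FDY", "DTY-DTY", "FDY-DTY"], n),
        "FAW": rng.choice([80, 100, 120], n),
        "weight": rng.uniform(50, 500, n),
    })


sheets = {year: toy_sheet() for year in ["2018", "2019", "2020"]}
columns = ["company", "type", "FAW"]

# One row of axes per year, one column of axes per grouping
fig, axes = plt.subplots(len(sheets), len(columns), figsize=(12, 12),
                         subplot_kw=dict(aspect="equal"))
for row, (year, df) in zip(axes, sheets.items()):
    for ax, col in zip(row, columns):
        totals = df.groupby(col)["weight"].sum()
        colors = plt.get_cmap("viridis")(np.linspace(0, 1, len(totals)))
        wedges, _ = ax.pie(totals, wedgeprops=dict(width=0.3),
                           startangle=90, colors=colors)
        ax.legend(wedges, totals.index.astype(str), loc="center", fontsize=8)
        ax.set_title(f"{col} in {year}")
fig.tight_layout()
```

Keeping all nine donuts in one figure makes it easier to compare the same grouping across years by reading down a column.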

Image by Author: annual report built with Matplotlib

All the code and files (PNG and Excel) are available on GitHub.

Story review

So far, I have written a series of stories based on the same Excel file. These related stories may help you understand data and data analysis through real data and a real case.
