Donut plot for data analysis
In this story, we demonstrate how to draw donut plots from a complex Excel workbook with Python tools.
As usual, our work starts with data, just as some people’s breakfast starts with donuts. Food and data have nothing to do with each other, except that a donut chart is shaped like a doughnut.
We start with an Excel file that records all sales of an industrial department from 2018 to 2020. The department, established in 2018, has just come through the COVID-19 year in China. Luckily, it survived and is celebrating the start of the new year 2021.
Explanation of original data
Now we can look back on what happened to the department over the last three years. The Excel file consists of 3 sheets, one per year, and each sheet contains the daily sale amount (column E) and the total weight of goods (column F). The receivers of the goods (column C) are the companies that pay for them. The product comes in 3 types (column D), depending on which of two raw materials, FDY and DTY, it is made of: FDY-FDY, DTY-DTY, and the hybrid FDY-DTY. FAW (column B) determines the thickness of the product. The finished goods are grey cloth, a basic product in the textile industry.
Donut plot with Pandas and Matplotlib
import pandas as pd
df2018 = pd.read_excel("outbound_with_company.xlsx", sheet_name='2018', header=0)
df2019 = pd.read_excel("outbound_with_company.xlsx", sheet_name='2019', header=0)
df2020 = pd.read_excel("outbound_with_company.xlsx", sheet_name='2020', header=0)
We read each Excel sheet into its own data frame, which keeps the same columns as the original sheet. From here we can regroup the data to answer specific questions. For example, to explore the relationship between orders and customers, in other words how the amount in column E (or the weight in column F) is distributed across the customers in column C, we group the data frame with the function pandas.DataFrame.groupby.
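To make the groupby step concrete, here is a minimal sketch on a small made-up data frame. The rows and customer IDs are invented for illustration; only the column names `company` and `weight` follow the code used in this story.

```python
import pandas as pd

# Made-up rows for illustration only; the real sheet has more columns and rows.
toy = pd.DataFrame({
    "company": ["ID001", "ID002", "ID001", "ID003", "ID003", "ID003"],
    "weight": [100.0, 50.0, 150.0, 200.0, 300.0, 200.0],
})

# Total weight per customer: one entry per unique value of "company".
weight_per_company = toy.groupby("company")["weight"].sum()

# Share of each customer in the overall weight, as a fraction between 0 and 1.
share = weight_per_company / weight_per_company.sum()

print(weight_per_company)
print(share.round(2))
print("largest customer:", share.idxmax())
```

The `share` series holds exactly the proportions that a pie or donut chart draws as wedge angles, so it is a quick way to check a chart against the numbers.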
Code example:
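import numpy as np
import matplotlib.pyplot as plt

# Number of rows (orders) per customer
group_2018_company = df2018.groupby('company')
print(group_2018_company.size())

# Total weight per customer, used as the wedge sizes
# (summing is assumed here; the original aggregation step is not shown)
weight_2018_company = group_2018_company['weight'].sum()
component = weight_2018_company.index
data = weight_2018_company

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(aspect="equal"))
# One viridis color per customer
cs_customer = plt.get_cmap('viridis')(np.linspace(0, 1, len(data)))
# width=0.3 leaves the center empty, which turns the pie into a donut
wedges, texts = ax.pie(data, wedgeprops=dict(width=0.3), startangle=90, colors=cs_customer)
ax.legend(wedges, component, loc="center", fontsize=12)
ax.set_title("customers in 2018", fontdict={'fontsize': 16, 'fontweight': 'bold'})
fig.tight_layout()
plt.savefig('donutplot2.png', dpi=100, format='png', bbox_inches='tight')
plt.show()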
From the donut plot, we can clearly see each customer’s contribution to sales, measured here by weight. The customer with ID003 made the greatest contribution in 2018.
Similarly, we can group the data by other columns, such as ‘type’ and ‘FAW’, and apply the same operations. This gives us three donut plots for 2018.
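Since the same steps repeat for every grouping column, they can be wrapped in a small helper. The sketch below is one possible way to do that, not the code from the repository: the function name `donut_by` and the toy data frame are hypothetical, and summing `weight` per group is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def donut_by(df, column, ax, value="weight", title=None):
    """Draw a donut chart of `value` summed per unique entry of `column`.

    Hypothetical helper for illustration; returns the wedge patches.
    """
    totals = df.groupby(column)[value].sum()
    # One viridis color per group
    colors = plt.get_cmap("viridis")(np.linspace(0, 1, len(totals)))
    # width=0.3 leaves the center empty, which makes the pie a donut
    wedges, _ = ax.pie(totals, wedgeprops=dict(width=0.3), startangle=90, colors=colors)
    # Convert labels to strings so numeric groups (such as FAW) display cleanly
    ax.legend(wedges, list(totals.index.astype(str)), loc="center", fontsize=10)
    ax.set_title(title if title is not None else column)
    return wedges


# Invented rows; only the column names follow this story.
toy = pd.DataFrame({
    "company": ["ID001", "ID002", "ID003", "ID001"],
    "type": ["FDY-FDY", "DTY-DTY", "FDY-DTY", "FDY-DTY"],
    "FAW": [90, 120, 90, 150],
    "weight": [10.0, 20.0, 30.0, 40.0],
})

# One donut per grouping column, side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 5), subplot_kw=dict(aspect="equal"))
for ax, column in zip(axes, ["company", "type", "FAW"]):
    donut_by(toy, column, ax, title=f"{column} (toy data)")
fig.tight_layout()
```

With the real data, passing `df2018` instead of `toy` should give the three 2018 charts, and `plt.show()` or `fig.savefig(...)` displays or stores the result.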
The Excel file records 3 years of sales for the department. With three groupings per year, that means we can get 9 donut🍩 charts.
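The count of nine comes from combining three years with three grouping columns. A minimal sketch of that bookkeeping, assuming the weights are summed per group, is shown below. The `toy_sheet` helper and all of its values are invented stand-ins for the three yearly sheets.

```python
import pandas as pd


def toy_sheet(scale):
    """Return an invented stand-in for one yearly sheet (illustration only)."""
    return pd.DataFrame({
        "company": ["ID001", "ID002", "ID003", "ID001"],
        "type": ["FDY-FDY", "DTY-DTY", "FDY-DTY", "FDY-DTY"],
        "FAW": [90, 120, 90, 150],
        "weight": [10.0 * scale, 20.0 * scale, 30.0 * scale, 40.0 * scale],
    })


# With the real data these would be df2018, df2019 and df2020.
frames = {"2018": toy_sheet(1), "2019": toy_sheet(2), "2020": toy_sheet(3)}
group_columns = ["company", "type", "FAW"]

# One Series of summed weights per (year, grouping) pair: 3 x 3 = 9 entries.
totals = {
    (year, column): df.groupby(column)["weight"].sum()
    for year, df in frames.items()
    for column in group_columns
}

for (year, column), series in totals.items():
    print(year, column, series.to_dict())
```

Each of the nine series can then be passed to a pie call with a `width` in `wedgeprops` to draw one donut.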
In the next story, we will explain how to build a good-looking annual report with a do-it-yourself design in Matplotlib.
One page of the report looks like this:
All the code and files (PNG and Excel) have been uploaded to GitHub.
Story review
So far, I have written a series of stories based on the same Excel file. The other stories in the series may help you understand data and data analysis through real data and a real case.