This story follows an interview with a factory owner and uses Pandas to reveal the face of the factory hidden in its data.

1. Background
Two months ago, I was in the biggest textile collection and distribution center in Asia. The city has grown over the last few decades on the strength of its main industry, textiles. While there, I visited a small family factory in the suburbs. The owner has been managing this factory for more than 30 years, and for most of that time he insisted on recording everything that happened in the factory on paper. A computer arrived in the factory only 3 years ago. The paperless office makes factory management more modern; nevertheless, the data are still not clear enough.

The outbound records of the textile factory for the last 3 years are saved in an Excel file, which contains a lot of redundant information. To communicate the information clearly and efficiently to readers, I took the following steps to visualize the data.
With the few steps below, we can explain what happened in the factory intuitively. People who have never been there can also get a quick overview of how this textile factory in China has run over the last few years.
2. Data availability statements
The factory owner is very nice and friendly. A few days after my visit, I received the factory’s outbound delivery data from him. The factory is actually a family sweatshop in the textile industry, which I introduced in my last story; the link is given at the end of this story.
The main product of the factory is grey fabric, which is rolled and then transported by truck. Almost every day, hundreds of grey fabric rolls are delivered to customers, who carry out the next step, chemical treatment. After passing through many more industries, the finished textiles land on the market. That is another story, which we can hear from others.

In this story, we pay attention only to the data, not to the industry’s products. So please don’t think about where the beautiful clothes come from.
With the factory owner’s support and agreement, I am allowed to publish this story and share the outbound delivery documents on my GitHub. I can also give my opinions about the factory after analyzing the data.
3. Source data explanation
The source is an Excel file that records the factory’s daily delivery orders to its customers.

The screenshot above shows what we see when opening the file. Column C contains customer information, so I had to blur this part. The two long sheets tell us what happened on each day in 2018 and 2019. FAW and type describe what the delivered products look like. However, the essential characteristic of an order is its quantity, which represents the exchange of goods. For example, the second row in the screenshot records that 100 rolls of grey fabric weighing 2500 Kg in total were delivered on 02/28/2018.
What we should know in advance is that both the factory’s daily productivity and the weight of each grey fabric roll stay almost constant. This helps us understand the data easily. Strictly speaking, the weight of a grey fabric roll depends on its type and FAW, and it can vary from 25 Kg/roll to 30 Kg/roll. Explaining this requires more specialized textile knowledge. To simplify the analysis in this story, I do not take the slight deviation in the weight of fabric rolls into consideration.
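As a quick sanity check on the roll weights described above, the weight per roll can be derived from the two quantity columns. This is only a sketch: the rows are made-up numbers in the same shape as the delivery records, and the names `orders`, `kg_per_roll` and `in_range` are my own, not part of the factory file.

```python
import pandas as pd

# Hypothetical orders (made-up numbers, not taken from the factory file).
orders = pd.DataFrame({
    "rolls": [100, 40, 60],
    "weight": [2500, 1120, 1800],   # total weight of each order in Kg
})

# Average weight of a single roll in each order.
orders["kg_per_roll"] = orders["weight"] / orders["rolls"]

# Flag orders whose roll weight lies in the 25 to 30 Kg range (bounds included).
orders["in_range"] = orders["kg_per_roll"].between(25, 30)
print(orders)
```

The first row reproduces the example from the screenshot: 2500 Kg spread over 100 rolls gives 25 Kg per roll.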
4. From Excel to data frame with pandas
Pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language. It provides a function to import data directly from Excel files.
# import the necessary packages as usual
import pandas as pd
import matplotlib.pyplot as plt
import os
import numpy as np
Since we have two sheets, one for 2018 and one for 2019, we can combine them into one data frame with the following code.
df2018 = pd.read_excel("outbound.xlsx", sheet_name='2018', header=0)
df2019 = pd.read_excel("outbound.xlsx", sheet_name='2019', header=0)
frames = [df2018, df2019]
df = pd.concat(frames)
df
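To see what pd.concat() does to the row labels, here is a small sketch with two made-up frames standing in for the 2018 and 2019 sheets. The frame names and values are assumptions for illustration only.

```python
import pandas as pd

# Two small stand-ins for the 2018 and 2019 sheets (made-up values).
sheet_2018 = pd.DataFrame({"date": ["2018-02-28", "2018-03-01"], "rolls": [100, 80]})
sheet_2019 = pd.DataFrame({"date": ["2019-02-20"], "rolls": [90]})

# Without ignore_index, the row labels restart at 0 for the second sheet.
stacked = pd.concat([sheet_2018, sheet_2019])
print(stacked.index.tolist())

# ignore_index=True renumbers the rows from 0 to n-1.
renumbered = pd.concat([sheet_2018, sheet_2019], ignore_index=True)
print(renumbered.index.tolist())
```

In this story the duplicated labels do no harm, because the date column replaces the index in the next step.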
We can use the to_datetime() function to create Timestamps from strings in a wide variety of date/time formats, so that the ‘date’ column can be clearly understood by the computer.
df.columns = ['date','FAW','company','type','rolls','weight']
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
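The effect of the conversion is easier to see on a tiny example. The sketch below assumes the dates arrive as month/day/year strings, as in the screenshot; depending on how the Excel cells are typed, read_excel() may already return real dates, in which case to_datetime() simply leaves them as they are. The frame `raw` and its values are made up.

```python
import pandas as pd

# Made-up records using the month/day/year style seen in the spreadsheet.
raw = pd.DataFrame({
    "date": ["02/28/2018", "03/01/2018"],
    "rolls": [100, 80],
})

# An explicit format avoids any day/month ambiguity when parsing.
raw["date"] = pd.to_datetime(raw["date"], format="%m/%d/%Y")
records = raw.set_index("date")

# With a DatetimeIndex, rows can be selected by a partial date string.
print(records.loc["2018-03"])
```

Selecting by "2018-03" returns every row from March 2018, which is one of the conveniences a date index brings.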

The output of df.head(2) and df.tail(2) shows the first and last records. The current data frame includes delivery records for 675 days, which is obviously not easy for humans to read. So we resample the Pandas time-series data with the resample() function.
print(df.resample('M').sum().to_period('M'))

Unfortunately, as we said before, FAW is an attribute of the product and cannot be added together. So we should pay attention to the correct use of the function.
The correct code:
# specify the columns to be summed (pandas 2.2 and later prefer the alias 'ME' for month-end bins)
month_df = df.resample('M').agg(dict(rolls='sum', weight='sum')).to_period('M')
print(month_df)
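The difference between the two calls can be checked on a few made-up rows. This is a sketch under my own assumptions: the frame `records` and its numbers are invented, and I use the month-start alias 'MS' for the bins because it is accepted by both older and newer pandas releases; after to_period('M') the result is labeled by month just like the code above.

```python
import pandas as pd

# Made-up delivery records: FAW is a product attribute, rolls and weight are quantities.
records = pd.DataFrame(
    {
        "FAW": [120, 120, 150],
        "rolls": [100, 50, 80],
        "weight": [2500, 1300, 2200],
    },
    index=pd.to_datetime(["2018-03-01", "2018-03-15", "2018-04-02"]),
)

# Sum only the additive columns, then label each bin by its month.
monthly = records.resample("MS").agg({"rolls": "sum", "weight": "sum"}).to_period("M")
print(monthly)

# A plain sum also adds up FAW (120 + 120 = 240 for March), which has no meaning.
naive = records.resample("MS").sum()
print(naive["FAW"].tolist())
```

Listing the columns explicitly in agg() keeps FAW out of the result altogether, so nobody can misread a summed attribute as a real quantity.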

Similarly, we can also aggregate the outbound data for each day with a short piece of code:
new_df = df.resample('D').agg(dict(rolls='sum', weight='sum')).to_period('D')
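One detail of daily resampling is worth knowing: days without any delivery still get a row, filled with zeros. The sketch below shows this on two made-up deliveries; the names `records`, `daily` and `delivery_days` are illustrative.

```python
import pandas as pd

# Two made-up deliveries, three days apart.
records = pd.DataFrame(
    {"rolls": [100, 80], "weight": [2500, 2200]},
    index=pd.to_datetime(["2018-03-01", "2018-03-04"]),
)

# Daily bins cover every calendar day between the first and last record.
daily = records.resample("D").agg({"rolls": "sum", "weight": "sum"}).to_period("D")
print(daily)

# Keep only the days on which something was actually delivered.
delivery_days = daily[daily["rolls"] > 0]
print(len(daily), len(delivery_days))
```

These zero rows are why a daily plot drops to the axis on days off, while the monthly totals hide such gaps.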

5. Data plot
Finally, we can display the data in an elegant way.
fig, ax = plt.subplots(figsize=(18, 8))
month_df['weight'].plot(ax=ax, style="k--", label="Series")  # draw on the axes created above
ax.set_xlabel("date", fontsize=12)
ax.set_ylabel("outbound/Kg", fontsize=12)
ax.set_title('monthly outbound in year 2018 and 2019',fontdict={'fontsize': 14, 'fontweight': 'bold'})
plt.savefig('barplot.png',dpi=100, format='png', bbox_inches='tight')
plt.show()

There are two obvious troughs in the chart. They coincide with what the factory owner said: the factory takes its annual vacation at the end of the Chinese lunar year.
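The troughs can also be located without looking at the chart. The following is a minimal sketch on made-up monthly totals; the series `monthly_weight` and the "half of the median month" threshold are my own assumptions, not a rule used by the factory.

```python
import pandas as pd

# Made-up monthly totals in Kg, indexed by month.
monthly_weight = pd.Series(
    [60000, 9000, 58000, 61000],
    index=pd.period_range("2018-01", periods=4, freq="M"),
)

# The two months with the lowest outbound weight.
lowest = monthly_weight.nsmallest(2)
print(lowest)

# Months that fall below half of the median month.
threshold = 0.5 * monthly_weight.median()
troughs = monthly_weight[monthly_weight < threshold]
print(troughs)
```

With the real data, the same two lines applied to the monthly weight column should point at the vacation months around Chinese New Year.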
In the same way, we can plot the daily outbound delivery. Here are two useful links about setting the style of lines and markers: https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html
https://matplotlib.org/3.1.1/api/markers_api.html
ax = new_df['weight'].plot(style="k--", label="Series")  # new_df is the daily frame built above; dashed black line

ax = new_df['weight'].plot(style="ks")  # black square markers, no connecting line
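The style strings combine a color letter with a line or marker code. The sketch below draws both variants side by side on made-up daily weights; the series `daily_weight` and the axes names are illustrative, and the Agg backend is chosen only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily outbound weights in Kg.
daily_weight = pd.Series(
    [2500, 0, 2200, 2600],
    index=pd.date_range("2018-03-01", periods=4, freq="D"),
)

fig, (ax_line, ax_points) = plt.subplots(1, 2, figsize=(12, 4))
daily_weight.plot(ax=ax_line, style="k--")    # "k" = black, "--" = dashed line
daily_weight.plot(ax=ax_points, style="ks")   # "k" = black, "s" = square markers only
ax_line.set_ylabel("outbound/Kg")
ax_points.set_ylabel("outbound/Kg")
```

Markers without a connecting line suit the daily data well, since each point is a separate day and the zeros on days off stay visible.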

These plots show that the factory’s daily outbound delivery is not as steady as its production plan. Within normal working months, however, the monthly delivery is much steadier, especially between March and December. Chinese New Year usually falls in January or February, which means the factory has a long vacation and no delivery records during that period.
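One way to put a number on "steady" is the coefficient of variation, the standard deviation divided by the mean. This is my own choice of measure, not something the factory uses, and the daily values below are made up to mimic a few quiet days followed by large shipments.

```python
import pandas as pd

# Made-up daily outbound weights (Kg) over two working weeks in two months.
daily = pd.Series(
    [2500, 0, 5200, 2400, 2600, 0, 4900, 2500, 2700, 2800],
    index=pd.to_datetime([
        "2018-03-05", "2018-03-06", "2018-03-07", "2018-03-08", "2018-03-09",
        "2018-04-02", "2018-04-03", "2018-04-04", "2018-04-05", "2018-04-06",
    ]),
)

def coefficient_of_variation(series):
    """Standard deviation divided by the mean; a lower value means steadier."""
    return series.std() / series.mean()

# Monthly totals; 'MS' labels each bin by the first day of its month.
monthly = daily.resample("MS").sum()

daily_cv = coefficient_of_variation(daily)
monthly_cv = coefficient_of_variation(monthly)
print(round(daily_cv, 3), round(monthly_cv, 3))
```

In this toy data the daily values swing widely while the two monthly totals are almost equal, which is the same pattern the plots suggest for the factory.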
6. Summary
In this story, we have carried out the basic tasks that follow receiving new data. The original data is usually saved in an Excel file, so I showed the whole process of using Pandas to extract information from it.
In the end, we visualized the preprocessed data in a suitable diagram, so that everyone can view the delivery status of the factory without filtering through complicated information in the Excel table.
The plot style is a matter of personal taste, and many different styles are available. Reading more of the related documentation helps us go a long way in data science.
7. References (including the previous stories)
A short interview about a Chinese sweatshop in the textile industry
How to draw a bar graph for your scientific paper with python