
Central Limit Theorem for Process Improvement with Python

Estimate the workload for returns management assuming a normal distribution of the number of items per carton received from your stores.

Central Limit Theorem Framework - (Image by Author)

Improve your returns management process with statistical analysis.

This article introduces the Central Limit Theorem and how it can be used to estimate workload and optimize workforce planning.

Learn how to calculate the probability of receiving fewer than 30 items per carton using Python and historical records.

Introduction

Returns management, often called reverse logistics, handles the items returned from retail locations to your distribution centre.

After reception, products are sorted, organized, and inspected for quality. If they are in good condition, they can be restocked in the warehouse and added back to the inventory count, ready to be reordered.

In this article, we will see how the Central Limit Theorem can help us estimate the workload for the returns management process using a normal distribution based on the mean and the standard deviation of historical records.

SUMMARY
I. Scenario
Problem Statement
As the Inbound Manager of a multinational clothing retail company, you are in charge of workforce planning for returns management.
Question
Can you estimate the probability of receiving fewer than 30 items per carton in a given week?
II. Central Limit Theorem
1. Definition
2. Application
3. Probability to get <30 items per carton?
4. 95% probability to have less than k items per case?
III. Conclusion
1. Generative AI: Statistical Tests x GPT
Create a statistical tool super agent powered by GPT
2. Next Steps

Scenario

Problem Statement

You are the Inbound Manager of a multinational clothing retail company known for its fast-fashion clothing for men, women, teenagers, and children.

A major problem for you is the lack of visibility of your workload for the returns process.

Indeed, because of system limitations, you do not get advance shipping notice (ASN) before receiving returns from your stores.

a. You receive the cartons by pallets that you unload from the truck

Unloading Area - (Image by Author)

b. You open the box and inspect the returned items

Quality Inspection Workstation - (Image by Author)

For each item (shirt, dress …), your operators need to perform the following:

  • Quality check to ensure that the product can be restocked
  • Relabelling
  • Re-packing

You know the productivity per item, and you would like to estimate the workload in hours based on the number of cases you will receive weekly.

Based on the historical data of the last 24 months, you have:

  • An average of 23 items per carton
  • A standard deviation of 7 items

Your team is usually sized to handle 30 items per case.

If it exceeds this threshold, you must hire temporary workers to meet your daily capacity target.
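As a rough illustration of how items per carton translate into hours, here is a minimal sketch. The weekly carton volume and the productivity rate are hypothetical placeholders (the article does not give them), and `workload_hours` is an illustrative helper, not part of the author's code.

```python
def workload_hours(cartons: int, items_per_carton: float, items_per_hour: float) -> float:
    """Hours needed to process `cartons` cartons at a given productivity rate."""
    return cartons * items_per_carton / items_per_hour


# Hypothetical inputs: replace with your own volumes and measured productivity.
WEEKLY_CARTONS = 1_000   # cartons received per week (assumption)
ITEMS_PER_HOUR = 50.0    # items one operator can process per hour (assumption)

# Workload at the historical average of 23 items per carton.
average_hours = workload_hours(WEEKLY_CARTONS, 23, ITEMS_PER_HOUR)

# Workload the team is sized for: 30 items per carton.
capacity_hours = workload_hours(WEEKLY_CARTONS, 30, ITEMS_PER_HOUR)

print(f"Average workload: {average_hours:.0f} h/week")   # 460 h/week
print(f"Sized capacity:   {capacity_hours:.0f} h/week")  # 600 h/week
```

With these placeholder values, the gap between the two figures is the buffer available before temporary workers are needed.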

Question

Can you estimate the probability of receiving fewer than 30 items per carton in a given week?



You can find the full code in this GitHub repository:

GitHub – samirsaci/central-limit: Central Limit Theorem for Process Improvement with Python


Central Limit Theorem

The Central Limit Theorem establishes that when we add independent random variables, their normalized sum tends toward a normal distribution even when the original variables are not normally distributed.

Definition

To make this easier to follow, let’s introduce some notation:

Notations - (Image by Author)

In our case, the total population is the entire scope of cartons received from the stores with a mean µ = 23 items per carton and a standard deviation of σ = 7 items per carton.

If you take n samples of cartons Xn (for instance, a sample can be a batch of cartons received on a given date), we have the following:

Equation - (Image by Author)

In other words, if we randomly measure the number of items per carton using n samples and assume that observations are independent and identically distributed (i.i.d.), the probability distribution of the sample means will closely approximate a normal distribution.
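In standard notation (the author's image may use different symbols), the classical form of the theorem for i.i.d. observations X_1, ..., X_n with mean µ and finite standard deviation σ can be written as:

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i ,
\qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}
\;\xrightarrow{\;d\;}\;
\mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

Equivalently, for large n the sample mean is approximately normal with mean µ and standard deviation σ/√n.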

Note: To ensure that we have independent and identically distributed observations, we assume that the samples are built based on return batches coming from all stores in a scope covering 100% of the active SKU.
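To see the theorem at work, the following sketch simulates sample means drawn from a skewed (non-normal) population. The gamma-shaped population, the sample size, and the number of samples are assumptions chosen for illustration; only µ = 23 and σ = 7 come from the historical records above.

```python
import random
from statistics import fmean, stdev

MU, SIGMA = 23.0, 7.0   # population mean and standard deviation (from the article)
SAMPLE_SIZE = 40        # cartons per sample (assumption)
N_SAMPLES = 5_000       # number of samples drawn (assumption)

# Gamma population with the same mean and standard deviation (assumed shape).
shape = (MU / SIGMA) ** 2   # alpha, so that alpha * beta = MU
scale = SIGMA ** 2 / MU     # beta, so that alpha * beta**2 = SIGMA**2

rng = random.Random(42)     # fixed seed for reproducibility


def sample_mean(n: int) -> float:
    """Average number of items per carton over one sample of n cartons."""
    return fmean(rng.gammavariate(shape, scale) for _ in range(n))


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_means = fmean(sample_means)
std_of_means = stdev(sample_means)
expected_std = SIGMA / SAMPLE_SIZE ** 0.5   # sigma / sqrt(n)

print(f"Mean of sample means: {mean_of_means:.2f} (population mean: {MU})")
print(f"Std of sample means:  {std_of_means:.2f} (sigma / sqrt(n): {expected_std:.2f})")
```

The sample means cluster around the population mean, and their spread is close to σ/√n even though the individual cartons are not normally distributed.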

Application

We can then assume that the average number of items per carton follows a normal distribution with a mean of 23 items and a standard deviation of 7 items.

Population Normal Distribution - (Image by Author)

What is the probability of having fewer than 30 items per carton?

Probability to get <30 items per carton?

Probability to have less than 30 items/carton is 84.13%
Population Normal Distribution - (Image by Author)

Code
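A minimal sketch of this calculation, using Python's standard-library `statistics.NormalDist` (an assumption: the author's repository may rely on a different library). Since 30 sits exactly one standard deviation above the mean, the result is the value of the standard normal CDF at z = 1.

```python
from statistics import NormalDist

MU = 23      # average items per carton (historical records)
SIGMA = 7    # standard deviation in items (historical records)
THRESHOLD = 30

# Normal model for the number of items per carton.
items_per_carton = NormalDist(mu=MU, sigma=SIGMA)

# z-score of the threshold: (30 - 23) / 7 = 1.0
z_score = (THRESHOLD - MU) / SIGMA

# P(X < 30) is the cumulative distribution function evaluated at the threshold.
p_below_30 = items_per_carton.cdf(THRESHOLD)

print(f"z-score: {z_score:.1f}")                                            # z-score: 1.0
print(f"Probability of fewer than 30 items/carton: {p_below_30:.2%}")       # 84.13%
```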

95% probability of having less than k items per case?

Your KPI target is to have at least 95% of the returns processed the same day.

How many items must you assume to size your team for handling 95% of the expected workload?

There is a 95% probability that X ≤ 34.51 items/carton.
Population Normal Distribution - (Image by Author)

If you size your team for 35 items/carton, you will cover slightly more than 95% of the cartons received, which meets your target.
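The threshold k can be obtained from the inverse of the cumulative distribution function. This is a minimal sketch using the standard-library `statistics.NormalDist` (an assumption about tooling, not necessarily what the author's repository uses); the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Normal model built from the historical mean and standard deviation.
items_per_carton = NormalDist(mu=23, sigma=7)

# k such that P(X <= k) = 95%: the inverse CDF (quantile function) at 0.95.
k = items_per_carton.inv_cdf(0.95)

# Round up to a whole number of items for team sizing.
sizing = math.ceil(k)

# Share of cartons covered when sizing on the rounded-up value.
coverage = items_per_carton.cdf(sizing)

print(f"k = {k:.2f} items/carton")          # k = 34.51 items/carton
print(f"Sizing basis: {sizing} items/carton")  # Sizing basis: 35 items/carton
print(f"Coverage at sizing basis: {coverage:.1%}")
```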


Conclusion

Generative AI: Statistical Testing x GPT

Following the adoption of large language models (LLMs), I started to experiment with the design of a LangChain Agent connected to a TMS.

Supply Chain Control Tower Agent with LangChain SQL Agent [Article Link] - (Image by Author)

The performance is quite impressive; the agent can answer operational questions by autonomously querying a database of delivery shipments.

What if we create a Statistical Tests super agent?

Lean Six Sigma Super Agent - (Image by Author)

The vision is to equip custom GPTs with:

  • Python Scripts of Lean Six Sigma Tools
  • Context, articles and knowledge about LSS mathematical tools

Imagine an agent that helps continuous improvement engineers find the right test, run it on uploaded datasets, and provide answers.

For more information:

Create GPTs to Automate Supply Chain Analytics

Leveraging LLMs with LangChain for Supply Chain Analytics – A Control Tower Powered by GPT

Next Steps

This methodology allows you to size your team based on assumptions backed by powerful statistical tools.

If you are interested in learning about statistical tools to solve operational problems, have a look at these articles:

Lean Six Sigma with Python – Chi-Squared Test

Lean Six Sigma with Python – Logistic Regression

Statistical Sampling for Process Improvement using Python


This analysis can be performed several times a year, especially if the business is evolving (with more collections, e-commerce, or new store openings).

About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, have a look at my website.

Samir Saci | Data Science & Productivity


