Lean Six Sigma with Python — Kruskal Wallis Test

Learn how to perform the Kruskal Wallis Test in Lean Six Sigma with Python and evaluate the impact of training on warehouse operators’ productivity.

Samir Saci

Published in Towards Data Science

6 min read · Aug 9, 2021

Lean Six Sigma with Python — Kruskal Wallis Test — Warehouse Operators Training — (Image by Author)

Do you want to improve the productivity of your warehouse operators but don’t know where to start?

Have you heard about the Kruskal Wallis Test but don’t know how to perform it?

In this article, we will introduce you to the Lean Six Sigma methodology and show you how to use Python to perform statistical tests like the Kruskal Wallis Test to evaluate the impact of training on your warehouse operators’ productivity.

Introduction

Lean Six Sigma (LSS) is a method based on a stepwise approach to process improvements.

This approach usually follows five steps (Define, Measure, Analyze, Improve and Control) to solve existing process problems with unknown causes.

In this article, we will explore how Python can replace Minitab (software widely used by LSS experts) in the Analyze step to test hypotheses and understand what could improve the performance metrics of a specific process.

SUMMARY
I. Problem Statement
Can we improve the operators' productivity by giving them training designed by the R&D team?
II. Data Analysis
1. Exploratory Data Analysis
Analyze with Python the sample data from an experiment with a few operators
2. Analysis of Variance (ANOVA)
Verify the hypothesis that training impacts productivity
The ANOVA assumptions are not satisfied
3. Kruskal-Wallis test
Confirm that the hypothesis can be generalized
III. Conclusion

If you prefer to watch, you can have a look at the video version of this article

I. Problem Statement

1. Scenario

You are the Continuous Improvement Manager of a Distribution Center (DC) for an iconic Luxury Maison focusing on Fashion, Fragrances and Watches.

The warehouse receives garments that require final assembly and value-added services (VAS) during the inbound process.

Example of Job-Shop with 4 lines for Value Added Services in a Distribution Center of a Luxury Brand — (CAD Model by Author)

For each dress received, your operators need to print a label in the local language and sew it onto the garment.

In this article, we will focus on the improvement of label sewing productivity.

Labels are distributed to the operators in batches of 30 labels. The productivity is calculated based on the time (in seconds) needed to finish a batch.

4 workstations for label sewing — (CAD Model by Author)

2. Impact of training your workforce

With the support of the R&D team, you designed training for the VAS operators to improve their productivity and reduce quality issues.

Question
Does the training have a positive impact on the productivity of operators?

Hypothesis
The training has a positive impact on the productivity of VAS operators.

Experiment
Randomly select operators and measure the time per batch (Time to finish a batch of 30 labels in seconds) to build a sample of 56 records.

Data-driven analysis to test our hypothesis — (Image by Author)

II. Data Analysis

You can find the full code in this Github repository: Link

1. Exploratory Data Analysis

You can download the results of this experiment in this CSV file to run the whole code on your computer (here).

Experiment results: productivity of each operator (sec/batch)

56 records
35 records of operators without training
21 records of operators with training

Box Plot

Based on the sample data, the median and the mean are considerably lower for the operators who had training.

Hypothesis
The training reduces the average time per batch.

Code
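A minimal sketch of this comparison in Python, using only the standard library. The values below are made up for illustration and are not the experiment's data; the group labels "Yes" and "No" mirror the Training variable, and `box_plot_summary` is a hypothetical helper. It computes the numbers a box plot displays (quartiles, median, extremes) plus the mean for each group.

```python
import statistics

# Hypothetical sample: seconds per batch of 30 labels (NOT the article's data).
times = {
    "No": [402, 388, 415, 397, 430, 409, 391, 421],
    "Yes": [351, 340, 362, 347, 358, 344],
}


def box_plot_summary(values):
    """Return the statistics a box plot draws, plus the mean."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }


summary = {group: box_plot_summary(values) for group, values in times.items()}
for group, group_stats in summary.items():
    print(f"Training = {group}: {group_stats}")
```

With the real CSV, the two lists would be replaced by the recorded times of each group; the column names in that file are not given here, so they would need to be checked first.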

Minitab
Menu Graph > Box Plot > Simple > 1 Y with Groups

2. Analysis of Variance (ANOVA)

In this scenario, we want to check if the training (Variable X) impacts the total time per batch (Variable Y).

Because X is a categorical variable (Training = Yes/No) and Y is numerical, the appropriate method is ANOVA.

ANOVA is a statistical method used to check if we can generalize the mean difference in the sample data to the entire population.

Step 1: Calculate the p-value

Source: Training
ddof1:  1
ddof2:  45.267
F:      17.1066
p-unc:  0.000151308
np2:    0.173692
p-value is below 5%

Code
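A minimal sketch of this step with SciPy, run on synthetic data because the experiment's CSV is not reproduced here. The group sizes (35 and 21) come from the experiment description; the means, spread and seed are assumptions chosen for illustration. The fractional degrees of freedom and the `p-unc` label in the output above suggest a Welch-type ANOVA from another package; this sketch instead uses SciPy's classic one-way ANOVA and, for comparison, Welch's t-test, which is equivalent to Welch's ANOVA when there are only two groups (F equals t squared).

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only (NOT the article's CSV).
rng = np.random.default_rng(42)
no_training = rng.normal(loc=400, scale=20, size=35)    # seconds per batch
with_training = rng.normal(loc=350, scale=20, size=21)  # seconds per batch

# Classic one-way ANOVA: H0 = both groups share the same population mean.
f_stat, p_value = stats.f_oneway(no_training, with_training)

# Welch's t-test does not assume equal variances; with two groups,
# its squared statistic plays the role of the Welch ANOVA F value.
welch = stats.ttest_ind(no_training, with_training, equal_var=False)
welch_f = welch.statistic ** 2

print(f"One-way ANOVA: F = {f_stat:.2f}, p-value = {p_value:.3g}")
print(f"Welch (two groups): F = {welch_f:.2f}, p-value = {welch.pvalue:.3g}")
if p_value < 0.05:
    print("Reject H0 at the 5% level: the group means differ.")
else:
    print("Cannot reject H0 at the 5% level.")
```

The printed values will differ from the results reported above, since the data are simulated.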

Minitab
Menu Stats > ANOVA > One-Way

Step 2: Validate the assumptions of ANOVA

The p-value is below 5%, which suggests that the difference in means is not due to random fluctuation.

However, before jumping to a conclusion, we need to check that the ANOVA assumptions are satisfied.

Residuals are normally distributed

Distribution of Residuals — (Image by Author)

Answer: No

There are no outliers or irregularities

Answer: No

Conclusion
ANOVA requirements are not met.

We need another method to confirm that the training impacts operators' productivity.

Code
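A minimal sketch of how these two checks could be done numerically with SciPy. Minitab's four-in-one view is graphical; the Shapiro-Wilk test and the 1.5 x IQR rule used here are stand-ins for reading those plots, not the article's own method. The data are synthetic and deliberately skewed (an assumption for illustration), so the numbers do not correspond to the experiment.

```python
import numpy as np
from scipy import stats

# Synthetic, deliberately skewed data for illustration (NOT the article's CSV).
rng = np.random.default_rng(0)
groups = {
    "No": 380 + rng.exponential(scale=30, size=35),
    "Yes": 330 + rng.exponential(scale=30, size=21),
}

# ANOVA residual = observation minus the mean of its own group.
residuals = np.concatenate([values - values.mean() for values in groups.values()])

# Shapiro-Wilk: H0 = the residuals come from a normal distribution.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Flag outliers with the 1.5 x IQR rule that box plots use.
q1, q3 = np.percentile(residuals, [25, 75])
iqr = q3 - q1
outliers = residuals[(residuals < q1 - 1.5 * iqr) | (residuals > q3 + 1.5 * iqr)]

print(f"Shapiro-Wilk W = {shapiro_stat:.3f}, p-value = {shapiro_p:.3g}")
print(f"Residuals flagged as outliers: {outliers.size}")
```

A Shapiro-Wilk p-value below 5% would indicate that the normality assumption is doubtful, and any flagged residuals would deserve a closer look before trusting the ANOVA result.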

Minitab
Menu Stats > ANOVA > One-Way > Graphs > Four in one

3. Kruskal-Wallis test

If your sample data fails to meet ANOVA requirements, you can use the Kruskal-Wallis Test, a rank-based non-parametric test, to check whether the difference between the groups is due to random fluctuation.

statistic = 54.99
pvalue = 1.205e-13
p-value is below 5%

Conclusion
The p-value is below 5%, so we can conclude that the difference between the two groups (commonly interpreted as a difference in medians) is statistically significant.

We can confirm that the training has a positive impact on the productivity of the operators.

Code
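A minimal sketch of the test with `scipy.stats.kruskal`, again on synthetic skewed data (an assumption for illustration, not the experiment's records). To show what the library computes, the same H statistic is rebuilt by hand from the pooled ranks and its p-value is read from a chi-squared distribution with (number of groups - 1) degrees of freedom.

```python
import numpy as np
from scipy import stats

# Synthetic, skewed data for illustration only (NOT the article's CSV).
rng = np.random.default_rng(0)
no_training = 380 + rng.exponential(scale=30, size=35)    # seconds per batch
with_training = 330 + rng.exponential(scale=30, size=21)  # seconds per batch

# Kruskal-Wallis H-test: H0 = both groups come from the same distribution.
h_stat, p_value = stats.kruskal(no_training, with_training)

# Same statistic by hand: rank all observations together, then compare
# the rank sums of the groups. No tie correction is needed here because
# continuous synthetic data contain no ties.
ranks = stats.rankdata(np.concatenate([no_training, with_training]))
n_total = ranks.size
rank_groups = (ranks[:no_training.size], ranks[no_training.size:])
h_manual = 12 / (n_total * (n_total + 1)) * sum(
    r.sum() ** 2 / r.size for r in rank_groups
) - 3 * (n_total + 1)
p_manual = stats.chi2.sf(h_manual, df=len(rank_groups) - 1)

print(f"scipy:  H = {h_stat:.3f}, p-value = {p_value:.3g}")
print(f"manual: H = {h_manual:.3f}, p-value = {p_manual:.3g}")
```

Because the test works on ranks rather than raw values, it does not rely on normally distributed residuals and is less sensitive to outliers than ANOVA.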

Minitab
Menu Stat > Nonparametrics > Kruskal-Wallis

If you are interested in other Lean Six Sigma Methodology applications using Python, you can look at the articles below.

Lean Six Sigma with Python — Chi-Squared Test

Perform a Chi-Squared Test to explain a shortage of drivers impacting your transportation network

s-saci95.medium.com

Lean Six Sigma with Python — Logistic Regression

Replace Minitab with Python to perform a Logistic Regression to estimate the minimum bonus needed to reach 75% of a…

towardsdatascience.com

III. Conclusion

This data-driven approach gave you enough elements to convince your management to invest in workforce training.

You brought enough insights with moderate experimentation by using statistics to generalise patterns from sample data.

Generative AI: Smart GPT Agent for LSS Statistical Tests

As the trend of large language models (LLMs) took off at the end of 2022, I started experimenting with designing enhanced analytics products to solve operational issues.

The initial prototype was a smart LangChain agent connected to a TMS.

Supply Chain Control Tower Agent with LangChain SQL Agent [Article Link] — (Image by Author)

The idea was to test the capabilities of such agents to answer operational questions using context and data.

The performance is very impressive.

What if we create a Lean Six Sigma super agent?

Example of the Lean Six Sigma Super Agent — (Image by Author)

We can deploy custom GPTs with Python Scripts of Lean Six Sigma Tools and documentation about Lean Six Sigma, warehousing and transportation processes.

Users would describe their problems
Upload datasets
The agent would select the right statistical test and provide an answer
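The test-selection step can be sketched as a plain Python function that follows the same logic as this article: check whether the residuals look normal, then run ANOVA if they do and Kruskal-Wallis otherwise. This is not the agent's actual implementation; `compare_groups`, the Shapiro-Wilk check and the 5% threshold are assumptions for illustration, and the data are synthetic.

```python
import numpy as np
from scipy import stats


def compare_groups(groups, alpha=0.05):
    """Pick ANOVA or Kruskal-Wallis from residual normality, then run it."""
    # Residual = observation minus the mean of its own group.
    residuals = np.concatenate([np.asarray(g) - np.mean(g) for g in groups])
    _, normality_p = stats.shapiro(residuals)

    if normality_p >= alpha:
        name, result = "ANOVA", stats.f_oneway(*groups)
    else:
        name, result = "Kruskal-Wallis", stats.kruskal(*groups)

    return {
        "test": name,
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }


# Synthetic example (NOT the article's data): times per batch in seconds.
rng = np.random.default_rng(1)
outcome = compare_groups([rng.normal(400, 20, 35), rng.normal(350, 20, 21)])
print(outcome)
```

A real agent would also need to handle other situations (categorical outcomes, more than one factor, paired samples), which this sketch ignores.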

💡 For more details,

Create GPTs to Automate Supply Chain Analytics

“The Supply Chain Analyst” is a Custom ChatGPT’s “GPT” that performs Pareto & ABC Analysis using sales data.

s-saci95.medium.com

Leveraging LLMs with LangChain for Supply Chain Analytics — A Control Tower Powered by GPT

Build an Automated Supply Chain Control Tower with a LangChain SQL Agent Connected to the Database of a Transportation…

towardsdatascience.com

About Me

Let’s connect on Linkedin and Twitter; I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

If interested in Data Analytics and Supply Chain, look at my website.

Samir Saci | Data Science & Productivity

A technical blog focusing on Data Science, Personal Productivity, Automation, Operations Research and Sustainable…

samirsaci.com
