
Product Segmentation for Retail with Python

A statistical methodology to segment your products based on turnover and demand variability

Use Statistics for Product Segmentation - (Image by Author)

Product segmentation refers to the activity of grouping products that have similar characteristics and serve a similar market.

In Logistics, attention is mainly focused on sales volume distribution, demand variability and delivery lead time.

As a Data Scientist for a Retail company, how can you automate this analysis with Python?

You want to focus your efforts on managing products that have:

  • The highest contribution to your total turnover: ABC Analysis
  • The most unstable demand: Demand Variability

In this article, we will introduce simple statistical tools to combine ABC Analysis and Demand Variability with Python.

SUMMARY
I. Scenario
1. Problem Statement
2. Scope Analysis
3. Objective
II. Segmentation
ABC Analysis
Demand Stability: Coefficient of Variation
Normality Test
III. Conclusion
1. Automate the process with Business Intelligence
2. Inventory Management Rules

How to do product segmentation with Python?

Product Segmentation for Retail

You support the Operational Director of a local Distribution Center (DC) that delivers to 10 hypermarkets.

Her scope includes:

  • Preparation and delivery of replenishment orders from stores
  • Demand Planning and Inventory Management

What are the operational challenges?

Logistics Operations for Retail

This analysis will be based on the M5 Forecasting dataset of Walmart stores’ sales records.

We suppose that we only have the first-year data (d_1 to d_365):

  • 10 stores in 3 states (USA)
  • 1,878 unique SKUs
  • 3 categories and 7 departments (sub-categories)

Categories and departments do not impact your ordering, picking, or shipping processes except for the warehouse layout.

Code – Data Processing
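As a rough sketch of this processing step, the snippet below keeps the first-year columns and sums the daily sales of all stores for each SKU. It assumes an M5-like wide layout (one row per item and store, with columns `item_id`, `dept_id`, `cat_id`, `store_id` and `d_1`, `d_2`, ...). The function name `first_year_sales` and the tiny demo table are illustrative only and are not taken from the original code.

```python
import pandas as pd

def first_year_sales(df: pd.DataFrame, n_days: int = 365) -> pd.DataFrame:
    """Aggregate daily unit sales per SKU over the first n_days."""
    # Keep only the columns d_1 ... d_{n_days}
    day_cols = [f"d_{i}" for i in range(1, n_days + 1)]
    # Sum the sales of all stores for each SKU (one row per item_id)
    return df.groupby(["item_id", "dept_id", "cat_id"], as_index=False)[day_cols].sum()

# Synthetic data with the assumed layout (3 days kept instead of 365)
demo = pd.DataFrame({
    "item_id": ["HOBBIES_1_001", "HOBBIES_1_001", "FOODS_1_001"],
    "dept_id": ["HOBBIES_1", "HOBBIES_1", "FOODS_1"],
    "cat_id": ["HOBBIES", "HOBBIES", "FOODS"],
    "store_id": ["CA_1", "TX_1", "CA_1"],
    "d_1": [1, 2, 5],
    "d_2": [0, 3, 4],
    "d_3": [2, 0, 6],
    "d_4": [9, 9, 9],  # outside the selected window, dropped
})
per_sku = first_year_sales(demo, n_days=3)
print(per_sku)
```

Aggregating across stores reflects the point of view of the distribution center, which sees the total demand of the stores it delivers to.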

What impacts your logistics performance?

Products Rotation: What are the references that are driving most of your sales?

  • Very Fast Movers: top 5% (Class A)
  • The following 15% of fast movers (Class B)
  • The remaining 80% of slow movers (Class C)

This classification will impact:

  • Warehouse Layout: Reduce Warehouse Space with the Pareto Principle using Python
  • Picking Process: Improve Warehouse Productivity using Order Batching with Python

Demand Variability: How stable is your customers’ demand?

  • Average Sales: µ
  • Standard Deviation: σ
  • Coefficient of Variation: CV = σ/µ

SKUs with a high CV value have unstable customer demand, which can lead to workload peaks, forecasting complexity and stock-outs.

Code

  • Filter on the first year of sales for HOBBIES SKUs
  • Calculate the mean, standard deviation and CV of sales
  • Sort by descending sales and compute cumulative sales for the ABC analysis
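The steps above can be sketched as follows. This is a minimal illustration, not the original code: it assumes one row per SKU with a `cat_id` column and daily sales columns named `d_1`, `d_2`, ..., and the output column names (`sales`, `cv`, `cum_sales_pct`) are made up for this example.

```python
import pandas as pd

def sales_indicators(df: pd.DataFrame, category: str = "HOBBIES") -> pd.DataFrame:
    """Per-SKU mean, standard deviation, CV and cumulative share of sales."""
    # Daily sales columns are assumed to be named d_1, d_2, ...
    day_cols = [c for c in df.columns if c.startswith("d_")]
    subset = df[df["cat_id"] == category]
    sales = subset[day_cols]

    out = pd.DataFrame({
        "item_id": subset["item_id"],
        "mean": sales.mean(axis=1),
        "std": sales.std(axis=1),  # pandas default: sample std (ddof=1)
        "sales": sales.sum(axis=1),
    })
    # CV = sigma / mu; a SKU with no sales at all gets NaN (0 / 0)
    out["cv"] = out["std"] / out["mean"]

    # Sort by descending sales, then accumulate the share of total sales
    out = out.sort_values("sales", ascending=False).reset_index(drop=True)
    out["sales_pct"] = 100 * out["sales"] / out["sales"].sum()
    out["cum_sales_pct"] = out["sales_pct"].cumsum()
    return out

# Synthetic data: A is perfectly stable, B is volatile, C is in another category
demo = pd.DataFrame({
    "item_id": ["A", "B", "C"],
    "cat_id": ["HOBBIES", "HOBBIES", "FOODS"],
    "d_1": [10, 0, 7],
    "d_2": [10, 20, 7],
    "d_3": [10, 40, 7],
})
indicators = sales_indicators(demo)
print(indicators)
```

In this toy example, SKU B sells the most and has a CV of 1, while SKU A has a CV of 0 because its daily sales never change.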

Now that you have computed the key indicators, let’s generate visualizations.


Methodologies of Product Segmentation

This analysis will be done for the SKUs in the HOBBIES category.

How to perform ABC Analysis?

What are the references that are driving most of your sales?

ABC Analysis of HOBBIES SKU - (Image by Author)
Class A: the top 5%
- Number of SKU: 16
- Turnover (%): 25%
Class B: the following 15%
- Number of SKU: 48 
- Turnover (%): 31%
Class C: the 80% slow movers
- Number of SKU: 253 
- Turnover (%): 43%

In this example, the Pareto Law (20% of SKUs making 80% of the turnover) is not observed.

However, 80% of our portfolio still makes less than 50% of the sales.

Code
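One possible way to assign the classes is to rank the SKUs by total sales and cut the ranking at 5% and 20% of the portfolio. The sketch below rounds the cut-offs up to whole SKUs, which is an assumption of this example and not necessarily how the original code does it. With 317 SKUs, this rounding gives 16, 48 and 253 SKUs, the same class sizes as reported above. The function names and the synthetic sales figures are hypothetical.

```python
import math
import pandas as pd

def abc_classify(total_sales: pd.Series, a_share: float = 0.05,
                 b_share: float = 0.15) -> pd.DataFrame:
    """Label each SKU A, B or C from its rank in total sales."""
    ranked = total_sales.sort_values(ascending=False)
    n = len(ranked)
    # Cut-off ranks, rounded up to whole SKUs (assumption of this sketch);
    # round() guards against floating-point noise before ceil()
    n_a = math.ceil(round(a_share * n, 9))
    n_ab = math.ceil(round((a_share + b_share) * n, 9))
    labels = ["A"] * n_a + ["B"] * (n_ab - n_a) + ["C"] * (n - n_ab)
    return pd.DataFrame({"sales": ranked, "abc": labels})

def abc_summary(classified: pd.DataFrame) -> pd.DataFrame:
    """Number of SKUs and share of total sales for each class."""
    summary = classified.groupby("abc")["sales"].agg(n_sku="count", sales="sum")
    summary["sales_pct"] = 100 * summary["sales"] / summary["sales"].sum()
    return summary

# Synthetic portfolio of 317 SKUs with decreasing sales (illustration only)
demo = pd.Series(range(317, 0, -1), index=[f"SKU_{i}" for i in range(317)])
classified = abc_classify(demo)
print(abc_summary(classified))
```

The sales shares printed for this synthetic portfolio are not meaningful; only the class sizes can be compared with the HOBBIES results.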

How stable is your customers’ demand?

Define Demand Stability with the Coefficient of Variation

From the Logistics Manager’s point of view, handling a peak of sales is much more challenging than handling sales that are uniformly distributed throughout the year.

To understand which products will present planning and distribution challenges, we will compute the coefficient of variation of each reference’s yearly sales distribution.

CV = f(%TO) for HOBBIES SKU - (Image by Author)
Class A
Fortunately, most Class A SKUs have fairly stable demand, so the most important SKUs should not be the most challenging ones.
Class A reference with low CV - (Image by Author)
Class B
The majority of SKUs are in the stable area; however, we still need to put effort into ensuring optimal planning for the few references that have a high CV.
Class B reference with high CV - (Image by Author)
Class C
Most of the SKUs have a high CV. For this kind of reference, a cause analysis would provide better results than a statistical approach to forecasting.
Class C reference with very high CV - (Image by Author)

Code
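To turn the chart into a table, one option is to count the SKUs of each ABC class that fall above or below a CV threshold. The sketch below assumes a DataFrame with `abc` and `cv` columns. The threshold of 1.0 is a hypothetical value chosen for illustration, not one stated in this analysis, and the demo values are synthetic.

```python
import pandas as pd

def segment_matrix(df: pd.DataFrame, cv_threshold: float = 1.0) -> pd.DataFrame:
    """Count SKUs per ABC class and demand-stability bucket."""
    # SKUs with a missing CV (no sales) are counted as "low CV" here,
    # because a NaN comparison evaluates to False
    stability = df["cv"].gt(cv_threshold).map({True: "high CV", False: "low CV"})
    return pd.crosstab(df["abc"], stability)

# Synthetic SKUs (illustration only)
demo = pd.DataFrame({
    "abc": ["A", "A", "B", "B", "C", "C", "C"],
    "cv": [0.3, 0.5, 0.4, 1.6, 2.0, 3.1, 0.8],
})
matrix = segment_matrix(demo)
print(matrix)
```

The SKUs that land in the high CV column of classes A and B are the ones that combine a large share of sales with unstable demand, so they are natural candidates for closer planning attention.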

Can we assume that the sales follow a normal distribution?

Use the Normality Test to check if a distribution is normal

Most of the simple inventory management methods are based on the assumption that the demand follows a normal distribution.

Why? Because it’s easy.

Sanity Check: Before implementing rules and performing forecasts, it is better to verify that this hypothesis cannot be refuted.

We’ll use the Shapiro-Wilk test for normality, which can be implemented using the SciPy library.

The null hypothesis will be (H0: the sales follow a normal distribution).

Red (p-value < alpha) - (Image by Author)
Bad News
For an alpha = 0.05, we can reject the null hypothesis for most of the SKUs. The normality assumption behind simple inventory management methods therefore does not hold for these SKUs, which makes inventory management more complex.

Code
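A minimal sketch of this test for a single SKU, using `scipy.stats.shapiro`, could look like the snippet below. The helper name `normality_rejected` and the two synthetic series are assumptions made for this example.

```python
import numpy as np
from scipy import stats

def normality_rejected(daily_sales, alpha: float = 0.05) -> bool:
    """Return True when Shapiro-Wilk rejects H0 (normal distribution)."""
    statistic, p_value = stats.shapiro(daily_sales)
    return bool(p_value < alpha)

rng = np.random.default_rng(42)
# Synthetic one-year series (illustration only)
steady = rng.normal(loc=50, scale=5, size=365)   # bell-shaped demand
intermittent = rng.poisson(lam=0.3, size=365)    # mostly zeros, a few sales

for name, series in [("steady", steady), ("intermittent", intermittent)]:
    print(name, "-> reject H0:", normality_rejected(series))
```

Applying the helper to each row of daily sales gives one flag per SKU. Slow movers with many zero-sales days behave like the intermittent series, for which normality is clearly rejected.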

Do you want to try this algorithm without coding?

I have deployed this solution on a web application that helps you to perform automated Pareto Analysis and ABC Classification after uploading your sales data.

Pareto Analysis ABC Classification Application: Link

Try it,

Try the App [Link]

You can find the complete code in this GitHub repository, 👇

GitHub – samirsaci/product-segmentation: Product Segmentation for Retail with Python


Conclusion

These analyses give us insights into operational decision-making.

Which references should we prioritize?

As a data scientist, you can help store managers and warehouse and transportation teams streamline their operations by focusing on revenue-generating products.

How can we automate the process?

Automate the process with Business Intelligence.

The whole process can be automated using business intelligence solutions that extract, process and load data for live reporting.

Business Intelligence Explained in 5 Steps - (Image by Author)

The idea is to track the evolution of your items and adapt your:

  • Demand planning strategies
  • Warehouse layouts
  • Inventory management rules

The main challenges are data quality and system integration.

For more details about business intelligence solutions,

What is Business Intelligence?

Now that we have gained valuable insights into product segmentation, the next logical step is to leverage these insights for operational improvements.

But what can we do with these findings?

Inventory Management Rules

A critical application is optimizing inventory management.

For most retailers, inventory management systems take a fixed, rule-based approach to forecasting and replenishment order management.

How can we use ABC Analysis to manage inventory?

In another article, we explore how to use Python to design inventory management rules based on demand variability.

Example of a management rule - (Image by Author)

The idea is to combine the descriptive analysis from product segmentation with advanced inventory strategies.

Objectives: minimize the inventory and avoid stock-outs.

You can ensure that your stock levels are aligned with customer demand, reducing waste and improving profitability.

For more details,

Inventory Management for Retail – Periodic Review Policy


About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, look at my website.

Samir Saci | Data Science & Productivity


