
For most retailers, Demand Planning systems take a fixed, rule-based approach to forecast and replenishment order management.

This works well enough for stable and predictable product categories but can show its limits with unstable demand impacted by external factors.
As a data scientist, how can you improve robustness? Provide better forecasts with Machine Learning.
In this article, we will implement a model to forecast the demand for retail stores using machine learning with Python.
This approach uses the M5 Competition Walmart dataset that will be introduced in the first section.
Summary
I. Demand Planning Optimization Problem Statement
Forecast the demand of 10 retail stores in the US
II. XGBoost for Sales Forecasting
Build a forecasting model using Machine Learning
III. Demand Planning: XGBoost vs. Rolling Mean
1. Demand Planning using Rolling Mean
An initial approach using a simple formula to set the baseline
2. XGBoost vs. Rolling Mean
What is the impact of Machine Learning on Accuracy?
3. Product Segmentation for Retail using Python
Do you need to apply machine learning on all items?
IV. Next steps
1. Implement Inventory Management Rules
Combine your forecasting model with Inventory Rules to reduce stockouts
2. Simulation Model with ChatGPT - "The Supply Chain Analyst"
Implement analytics products with UI powered by GPT on ChatGPT
3. Sustainable Approach: Green Inventory Management
Reduce the carbon footprint of your supply chain with smart inventory rules
4. Improve the Machine Learning Model
Feature engineering can help us gain additional points of accuracy
Demand Planning Optimization Problem Statement
Retail Company with 10 Stores
For this study, we’ll take a dataset from the Kaggle challenge: Store Item Demand Forecasting Challenge.
Scope
- Transactions from 2013-01-01 to 2017-12-31
- 913,000 Sales Transactions
- 10 Stores
- 1,913 days for the training set and 28 days for the evaluation set
What do they sell?
Exploratory Data Analysis
We want to predict sales of 3,049 unique items in these ten stores.

They are grouped into three different families with sub-categories.
Which external factors do we have on hand?
As you can guess, we can’t have an exhaustive list of the external factors influencing these sales (no one has it).
However, the dataset includes:
- Pricing per reference per store for each period
- Store Location
- Transaction Date
With these data on hand, let’s see how we can forecast the demand using Machine Learning with Python.
XGBoost for Sales Forecasting
The initial dataset was used for a Kaggle Challenge, where teams competed to design the best model to predict sales.
The first objective here is to design a prediction model using XGBoost.
Its forecasts will feed our replenishment strategy, helping to optimize inventory and reduce the number of deliveries from your warehouse.
Can we enrich this dataset? Yes!
Add Date Features
Using the date, we can use the year, month, day of the week, and additional date variations to capture patterns in the demand variability.
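As a minimal sketch of this step, assuming the data sits in a pandas DataFrame with a `date` column, the calendar features can be derived as follows. The function name `add_date_features`, the `is_weekend` flag and the toy values are illustrative assumptions, not the exact feature set used in this study.

```python
import pandas as pd


def add_date_features(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Return a copy of df with calendar features derived from date_col."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day"] = dates.dt.day
    out["dayofweek"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = dates.dt.dayofweek >= 5
    return out


# Toy rows with made-up sales values
sample = pd.DataFrame({"date": ["2013-01-01", "2013-01-05"], "sales": [13, 10]})
features = add_date_features(sample)
print(features)
```

Tree-based models such as XGBoost can split directly on these integer columns, so no one-hot encoding is strictly needed for a first version.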
What do we mean by trend?
Daily, Monthly Average for Train
Daily or monthly average sales may impact your future demand.
Therefore, it becomes an interesting additional parameter to add.
What about rolling averages?
Add Daily and Monthly Averages to Test and Rolling Averages
They are the benchmark of your model and can also improve the accuracy of your machine-learning model.
You should always ask yourself: is my model better than a moving average?
If not, you have no reason to continue to invest time in building a model that will require resources to be put into production.
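Here is a hedged sketch of how a rolling-average feature could be computed with pandas, assuming columns named `date`, `store`, `item` and `sales`. The helper `add_rolling_mean` and the 3-day window are assumptions for illustration. The important detail is the `shift(1)`: the feature for a given day only uses sales from previous days, so the model never sees the value it is trying to predict.

```python
import pandas as pd


def add_rolling_mean(df, window=7, group_cols=("store", "item"), target="sales"):
    """Add the mean of the previous `window` days of sales per store/item."""
    out = df.sort_values([*group_cols, "date"]).copy()
    out[f"rolling_mean_{window}"] = out.groupby(list(group_cols))[target].transform(
        # shift(1) drops the current day so that only past sales are averaged
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


# Toy series for a single store and item (made-up values)
toy = pd.DataFrame({
    "date": pd.date_range("2013-01-01", periods=5),
    "store": [1] * 5,
    "item": [1] * 5,
    "sales": [10, 12, 14, 16, 18],
})
toy = add_rolling_mean(toy, window=3)
print(toy[["date", "sales", "rolling_mean_3"]])
```

The first day has no history, so its rolling mean is missing; you can drop these rows or let the model handle the missing value.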
Now that we have enriched our dataset with additional features, let’s verify any correlation between these features and the metric we want to forecast.
Heat Map to check correlation

Let us keep the monthly average since it is the most correlated with sales and remove other highly correlated features.
There is no point in keeping features that are correlated to each other.
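One possible way to automate this selection is sketched below. It is a simple greedy rule, not necessarily the procedure used here: rank features by their absolute correlation with sales, then keep a feature only if it is not too correlated with one already kept. The 0.9 threshold, the function name `drop_correlated` and the toy columns are assumptions.

```python
import pandas as pd


def drop_correlated(df, target, threshold=0.9):
    """Return the features to keep, most correlated with the target first."""
    corr = df.corr().abs()
    ranked = corr[target].drop(target).sort_values(ascending=False).index
    kept = []
    for col in ranked:
        # Keep col only if it is not redundant with an already selected feature
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept


# Toy data: daily_avg carries almost the same signal as monthly_avg
toy = pd.DataFrame({
    "sales": [10, 20, 30, 40, 50, 60],
    "monthly_avg": [10, 20, 30, 40, 50, 61],
    "daily_avg": [12, 18, 32, 38, 52, 58],
    "dayofweek": [0, 1, 2, 0, 1, 2],
})
kept = drop_correlated(toy, target="sales")
print(kept)
```

In this toy example the monthly average is kept and the daily average is dropped, because it is almost a copy of a feature that is already selected.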
Clean features, Training/Test Split and Run model
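For time series, the split should be chronological: train on the past, test on the most recent period. The sketch below shows one way to do it with pandas; the helper `time_split`, the cutoff date and the toy data are assumptions for illustration.

```python
import pandas as pd


def time_split(df, cutoff, date_col="date"):
    """Train on rows strictly before `cutoff`, test on rows from `cutoff` onward."""
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    return df[dates < cutoff].copy(), df[dates >= cutoff].copy()


# Toy data: one row per day in December 2017 (made-up sales)
toy = pd.DataFrame({
    "date": pd.date_range("2017-12-01", periods=31),
    "sales": list(range(31)),
})
train, test = time_split(toy, "2017-12-25")
print(len(train), len(test))

# A regressor is then fit on `train` and scored on `test`. With the xgboost
# package installed, this could look like (feature_cols is a hypothetical
# list of feature column names):
#   model = xgboost.XGBRegressor().fit(train[feature_cols], train["sales"])
#   test["sales_prd"] = model.predict(test[feature_cols])
```

A random split would leak future information into the training set and make the model look better than it will be in production.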
Now that we have trained the model, let’s have a look at the results.
Results Prediction Model

Based on this prediction model, we’ll build a simulation model to improve demand planning for store replenishment.
When should we trigger a replenishment?
Replenishment is delivering additional goods to a store to ensure we have the minimum inventory needed to meet customers’ demands.
As an output, we have a dataset with the following features
- date: Transaction date
- item: SKU Number
- store: Store Number
- sales: Actual value of sales transaction
- sales_prd: XGBoost prediction
- error_forecast: sales_prd – sales
- reply: boolean value for replenishment days (True if the day is in [‘Monday’, ‘Wednesday’, ‘Friday’, ‘Sunday’])
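The last two columns can be derived from the first ones. Here is a minimal sketch, assuming a DataFrame that already contains `date`, `sales` and `sales_prd`; the helper name `add_output_columns` and the toy values are illustrative.

```python
import pandas as pd

REPLENISHMENT_DAYS = {"Monday", "Wednesday", "Friday", "Sunday"}


def add_output_columns(df):
    """Add the forecast error and the replenishment-day flag."""
    out = df.copy()
    out["error_forecast"] = out["sales_prd"] - out["sales"]
    out["reply"] = pd.to_datetime(out["date"]).dt.day_name().isin(REPLENISHMENT_DAYS)
    return out


# Toy rows: a Monday and a Tuesday, with made-up sales and predictions
toy = pd.DataFrame({
    "date": ["2017-12-25", "2017-12-26"],
    "sales": [20, 15],
    "sales_prd": [18.5, 16.0],
})
result = add_output_columns(toy)
print(result)
```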
Does our model provide added value vs. a benchmark?
This is what we’ll try to figure out in the next section.
Demand Forecasting: XGBoost vs. Rolling Mean
Demand Forecasting using Rolling Mean
Your benchmark method to forecast demand is the rolling mean of previous sales.
It is easy to design, deploy and maintain.
At the end of Day n-1, you need to forecast demand for Day n, Day n+1, Day n+2.
- Calculate the average sales quantity of the last p days: Rolling Mean (Day n-1, …, Day n-p)
- Apply this mean to the sales forecast of Day n, Day n+1, Day n+2
- Forecast Demand = Forecast_Day(n) + Forecast_Day(n+1) + Forecast_Day(n+2)
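The three steps above can be sketched in a few lines of plain Python. The function name `rolling_mean_forecast` and the sample sales are assumptions for illustration.

```python
def rolling_mean_forecast(history, p, horizon=3):
    """Total demand forecast for the next `horizon` days.

    The mean of the last p days is applied to each future day, so the
    total is simply horizon x mean.
    """
    if p <= 0 or len(history) < p:
        raise ValueError("need at least p days of history and p > 0")
    mean_last_p = sum(history[-p:]) / p
    return horizon * mean_last_p


# Sales up to Day n-1 (made-up values)
sales = [10, 12, 14, 16, 18]
demand = rolling_mean_forecast(sales, p=3)
print(demand)  # mean of [14, 16, 18] is 16, so the 3-day forecast is 48.0
```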

Is our model more accurate than the benchmark?
XGBoost vs. Rolling Mean
With our XGBoost model, we now have two methods for demand forecasting.
Let us try to compare the results of these two methods on forecast accuracy:

- Prepare replenishment on Day n-1: we need to forecast the replenishment quantity for Day n, Day n+1, Day n+2
- XGB prediction gives us a demand forecast: Demand_XGB = Forecast_Day(n) + Forecast_Day(n+1) + Forecast_Day(n+2)
- The Rolling Mean method gives us a demand forecast: Demand_RM = 3 x Rolling_Mean(Day(n-1), Day(n-2), ..., Day(n-p))
- Actual demand: Demand_Actual = Actual_Day(n) + Actual_Day(n+1) + Actual_Day(n+2)
- Forecast error: Error_RM = (Demand_RM - Demand_Actual), Error_XGB = (Demand_XGB - Demand_Actual)
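To make these definitions concrete, here is a small sketch that computes both errors for every day where we have p days of history and 3 days of actuals ahead. The function name `three_day_errors` and all numbers are made up; in practice `xgb_pred` would hold the daily predictions of the trained model.

```python
def three_day_errors(actual, xgb_pred, p):
    """Return (errors_rm, errors_xgb) for the 3-day demand starting on each day n."""
    errors_rm, errors_xgb = [], []
    for n in range(p, len(actual) - 2):
        demand_actual = sum(actual[n:n + 3])
        demand_xgb = sum(xgb_pred[n:n + 3])
        demand_rm = 3 * sum(actual[n - p:n]) / p  # 3 x rolling mean of last p days
        errors_rm.append(demand_rm - demand_actual)
        errors_xgb.append(demand_xgb - demand_actual)
    return errors_rm, errors_xgb


# Made-up daily sales and hypothetical model predictions
actual = [10, 12, 14, 16, 18, 20]
xgb_pred = [10, 12, 14, 15, 19, 21]
errors_rm, errors_xgb = three_day_errors(actual, xgb_pred, p=3)
print(errors_rm, errors_xgb)
```

On this toy upward trend, the rolling mean lags behind the actual demand, which is exactly the kind of situation where a model with date features can help.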
What is the optimal number of days for the rolling average?
Parameter tuning: Rolling Mean for p days
Before comparing the Rolling Mean results with XGBoost, let us find the value of p that gives the best performance.
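A simple way to tune p is a grid search over candidate windows, as sketched below. This is an illustration on made-up data: the error metric (sum of absolute 3-day errors) and the function names are assumptions. Every candidate is evaluated on the same days so that the totals are comparable.

```python
def rolling_mean_error(sales, p, start, horizon=3):
    """Sum of absolute errors of the rolling-mean forecast from day `start` on."""
    total = 0.0
    for n in range(start, len(sales) - horizon + 1):
        forecast = horizon * sum(sales[n - p:n]) / p
        actual = sum(sales[n:n + horizon])
        total += abs(forecast - actual)
    return total


def best_window(sales, candidates, horizon=3):
    """Return the window with the lowest error and the error of each candidate."""
    start = max(candidates)  # same evaluation days for every p
    errors = {p: rolling_mean_error(sales, p, start, horizon) for p in candidates}
    return min(errors, key=errors.get), errors


# Made-up noisy demand alternating between 10 and 20 units
sales = [10, 20] * 6
best_p, errors = best_window(sales, candidates=[1, 2, 3])
print(best_p, errors)
```

With p = 1, the forecast follows the noise of the last day; a longer window smooths it out, which is why the error drops.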

Results: -35% forecast error for (p = 8) vs. (p = 1)
Thus, based on the sales transactions profile, we can get the best demand planning performance by forecasting the next day’s sales using the average of the last 8 days.
Let’s compare it with XGBoost.
XGBoost vs. Rolling Mean: p = 8 days

Results: -32% error in the forecast by using XGBoost vs. Rolling Mean

The results are convincing.
However, we should remember that training and maintaining a model with so many references to forecast can be challenging.
Do we have to use Machine Learning to forecast all references?
Product Segmentation for Retail
The answer is no.
With product segmentation, you can group products considering their contribution to the turnover and their demand variability.

In short, you want to focus on the products with the most turnover and unstable demand.
What about the other products?
These high-importance SKUs are the ones you’ll target for the machine learning model, while the others can be forecast with fixed rules or simple statistical models.
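As a rough sketch of this idea, the snippet below ranks items by turnover and measures demand variability with the coefficient of variation (standard deviation divided by mean). The 80% turnover share, the 0.5 variability threshold, the function name `segment_products` and the data are all assumptions to adapt to your own portfolio.

```python
from statistics import mean, pstdev


def segment_products(sales_by_item, price_by_item, turnover_share=0.8, cv_threshold=0.5):
    """Assign each item to 'machine learning' or 'simple rule'."""
    turnover = {i: price_by_item[i] * sum(q) for i, q in sales_by_item.items()}
    total = sum(turnover.values())
    ranked = sorted(turnover, key=turnover.get, reverse=True)
    segments, covered = {}, 0.0
    for item in ranked:
        qty = sales_by_item[item]
        avg = mean(qty)
        cv = pstdev(qty) / avg if avg > 0 else 0.0  # coefficient of variation
        # High turnover: the items ranked before it cover less than the share
        high_turnover = covered < turnover_share * total
        use_ml = high_turnover and cv > cv_threshold
        segments[item] = "machine learning" if use_ml else "simple rule"
        covered += turnover[item]
    return segments


# Made-up daily quantities and unit prices for three items
sales_by_item = {"A": [5, 40, 10, 45], "B": [20, 21, 19, 20], "C": [1, 9, 0, 10]}
price_by_item = {"A": 10, "B": 5, "C": 2.5}
segments = segment_products(sales_by_item, price_by_item)
print(segments)
```

Item A has high turnover and unstable demand, so it goes to the model. Item B sells steadily and item C barely contributes to turnover, so a simple rule is enough for both.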
To learn more about product segmentation, read my article linked below.
Conclusion
With our baseline model, the rolling mean, tuning the window p reduced forecast error by 35% (p = 8 vs. p = 1).
However, we could perform even better using the XGBoost forecast to predict demand for days n, n+1, and n+2, reducing forecast error by a further 32%.
Does that mean we have to switch to Machine Learning?
The answer is "not always".
- Not for all references based on your product segmentation
- Not if you don’t have external factors in your dataset
What are the next steps? Implementing inventory management rules.
Implement Inventory Management Rules
In many traditional retailers, inventory management systems take a fixed, rule-based approach to replenishment order management.
We need replenishment policies that minimize ordering, holding and shortage costs.
Combined with our forecasting model, these policies can help you optimize your inventory, reduce stock-outs, and avoid overstock.
When do you need to replenish your store?
With which quantity?

In the chart above, you can see
- The store demand for a specific SKU (RED)
- The replenishment quantities (BLUE)
- The inventory On Hand (GREEN)
These policies are based on the Economic Order Quantity and consider the variability of your demand.
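For reference, here is a minimal sketch of the textbook formulas behind such policies: the Economic Order Quantity, EOQ = sqrt(2 x D x S / H), and a reorder point with safety stock under the assumption of normally distributed daily demand. The parameter values are made up, and the function names are illustrative; the articles linked below describe the actual policies in detail.

```python
from math import sqrt


def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """EOQ = sqrt(2 * D * S / H).

    D: annual demand (units), S: cost per order, H: holding cost per unit per year.
    """
    return sqrt(2 * annual_demand * order_cost / holding_cost)


def reorder_point(mean_daily, std_daily, lead_time_days, z=1.64):
    """Average demand during the lead time plus a safety stock.

    z = 1.64 targets roughly a 95% cycle service level, assuming daily
    demand is independent and normally distributed.
    """
    safety_stock = z * std_daily * sqrt(lead_time_days)
    return mean_daily * lead_time_days + safety_stock


# Made-up parameters
eoq = economic_order_quantity(annual_demand=3600, order_cost=50, holding_cost=1)
rop = reorder_point(mean_daily=10, std_daily=3, lead_time_days=4, z=2)
print(eoq, rop)  # 600.0 52.0
```

The standard deviation of your forecast error can replace the standard deviation of demand in the safety stock: a more accurate forecast then translates directly into less inventory.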

I invite you to look at this series of articles for more information.
- Start with a simple deterministic model to get familiar with the EOQ
- Move to more complex models adapted to stochastic demand distributions
Inventory Management for Retail – Stochastic Demand
Inventory Management for Retail – Periodic Review Policy
Have you heard about Generative AI?
Simulation Model with ChatGPT – "The Supply Chain Analyst"
Large Language Models like GPT can support the analysis of your inventory management rules and interact with users.
What is the optimal rule to minimize my ordering costs?
I introduce a custom GPT designed to automate Supply Chain analytics tasks in this article.

This initial prototype is a proof of concept.
OpenAI’s GPTs can be used with advanced analytics models in Python scripts to create interactive analytics products.

We can imagine users uploading their sales and interacting with the agent to understand how to set an optimal rule.
For more details, see:
Create GPTs to Automate Supply Chain Analytics
Leveraging LLMs with LangChain for Supply Chain Analytics – A Control Tower Powered by GPT
What about sustainability?
My approach always focuses on maximizing accuracy to optimize your store’s inventory.
The goal is to minimize the costs of ordering, delivering and storing items while meeting customer demand.
This can be coupled with sustainable practices to minimize the environmental footprint of your distribution network.
What would be the impact on CO2e emissions if we reduce the frequency of store replenishments?
Sustainable Approach: Green Inventory Management
This can be defined as managing inventory in an environmentally sustainable way.

This involves processes and rules reducing the environmental impact of order preparation and delivery.

In the example above, we see two different store replenishment approaches.
- On the left, you replenish less frequently with high quantity delivered.
- On the right, you replenish more frequently with a low delivery quantity.
Do these two approaches have the same impact on the CO2 emissions of your distribution network?
Of course, they don’t.

In this case study, we discover how to simulate the variation of store replenishment frequency and measure its impact on the overall environmental impact.
Data Science for Sustainability – Green Inventory Management
Can we improve the model with better feature engineering?
Improve the model
Yes! I have been working on an improved version of the model, and I share my insights in the article below (with the complete code).
The goal is to understand the impact of adding business features (e.g., price change, sales trend, store closing) on the model’s accuracy.
Machine Learning for Retail Sales Forecasting – Features Engineering
About Me
Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.
For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.
If you are interested in Data Analytics and Supply Chain, look at my website.
References
[1] Kaggle Dataset, Store Item Demand Forecasting Challenge, Link