What is Process Mining?

Samir Saci
Towards Data Science
10 min readJan 4, 2023

--

Learn how to use Python for Process Mining and unlock the power of your business data with this comprehensive guide.

Process mining is a type of data analytics that focuses on the discovery, monitoring, and improvement of business processes.

It involves analyzing data from various sources, such as process logs, to understand how a process is being executed, identify bottlenecks and inefficiencies, and suggest ways to improve it.

Example of Supply Chain Information Systems for Process Mining — https://samirsaci.com
Example of Supply Chain Information Systems for Process Mining — (Image by Author)

In previous articles, I shared examples of Python tools designed to monitor and visualize logistics performance, build an automated supply chain control tower or create smart visualizations.

In this article, we will go beyond monitoring and see how process mining with Python can help you identify bottlenecks and improve productivity.

💌 New articles straight in your inbox for free: Newsletter
📘 Your complete guide for Supply Chain Analytics: Analytics Cheat Sheet

I. What is Process Mining?
1. Understanding Process Mining
2. How Process Mining Can Benefit Your Business Operations
II. How to Implement Process Mining with Python
1. Real-World Examples of Process Mining with Python
2. Approach 1: Discovery
3. Approach 2: Conformance
4. Approach 3: Enhancement
5. Best Practices for Process Mining with Python
III. Conclusion
1. It requires harmonized and clean data

What is Process Mining?

Understanding Process Mining

Process mining techniques can be applied to many processes, including manufacturing, supply chain management, healthcare, and customer service.

By analyzing data about the process, process mining can help organizations understand how their processes function, identify areas for improvement and make informed decisions about optimising their processes.

Example of Process Mining for Customer Service — https://samirsaci.com
Example of Process Mining for Customer Service — (Image by Author)

For example, in this article, we explore a statistical method to estimate the lead time of order processing by a customer service team.

How Process Mining Can Benefit Your Business Operations

There are several different approaches to process mining, including discovery, conformance, and enhancement.

  • Discovery involves using process mining techniques to uncover the structure and behaviour of a process based on data from process logs.
  • Conformance involves comparing the actual process to the desired process model to identify deviations and deviations.
  • Enhancement involves using process mining to suggest improvements to the process based on data analysis.

💡 Follow me on Medium for more articles related to 🏭 Supply Chain Analytics, 🌳 Sustainability and 🕜 Productivity.

How to Implement Process Mining with Python

Real-World Examples of Process Mining with Python

Let us take the example of the Supply Chain Network of an international clothing group with stores worldwide.

This company sells garments, bags and accessories produced in overseas factories to replenish a network of regional warehouses.

Supply Chain Network of a Fashion Retail Company — https://samirsaci.com
Supply Chain Network of a Fashion Retail Company — (Image by Author)

This article will focus on the store's delivery from these regional warehouses.

Data exchange between Information Systems for Supply Chain — https://samirsaci.com
Data exchange between Information Systems for Supply Chain — (Image by Author)

Each process step is managed by a specific system (ERP, WMS, TMS) that records information and time stamps that we will use for process mining.

Approach 1: Discovery

a) Time stamp definitions
From the order creation to the store reception, time stamps are recorded by the different systems.

Time Stamps per Process — (Image by Author)
  • Order creation time in the ERP by the distribution planner
  • Order reception time in the Warehouse Management System (WMS)(now ready to be prepared by warehouse teams)
  • Picking time(s) that can include the starting and ending time of the picking mission that contains the order
  • Packing time refers to the end of the packing process
  • Shipping time when your orders are leaving the warehouse
  • Delivery time when your truck delivers the pallets to the store
  • Store receiving time is concluding the delivery process: the time when the store’s team records the received items in the ERP

💡 Know your processes
In your company, systems can be used differently. Therefore, ensure processes have been mapped with detailed workflows (including data inputs and outputs).

b) Lead times definitions

As timestamps alone mean nothing, we define lead times for each process by the difference between two timestamps

Logistics Lead Times Definition for distribution — https://samirsaci.com
Logistics Lead Times Definition for distribution— (Image by Author)

They usually refer to a specific team’s responsibility

  • End-to-end lead time is measuring the performance of the whole logistics department, also defined as On Time In Full (OTIF)
  • Order Transfert lead time is under the responsibility of the IT/Infrastructure team that ensures fast transmissions of orders
  • The warehouse operational team performance is measured by Preparation Lead time
  • Shipment lead time is a grey zone as it can be impacted by warehouse teams, the finance department (invoicing) or transportation teams (find a truck for shipment)
  • Transportation lead time is entirely under the scope of transportation teams
  • Receiving lead time is the responsibility of store teams

💡 You need the support of system(s) expert(s)
Your systems may not have a time stamp for every process. But there are always alternative solutions that your infrastructure team can find.

For instance, there is no timestamp to show when the packing process is ending.

  • But the order status is changing (from packing to ready to ship).
  • A script can be developed to create a timestamp when this status is changed.
  • It can then be populated in the data lake you use as a data source.

c) Data processing with Python

Using Python, you can connect to different systems to extract transactional records from their respective databases.

Process Data Processing with Python — https://samirsaci.com
Process Data Processing with Python — (Image by Author)

These systems may have different fields and metrics definitions. Use the order number to merge the tables in a single data frame and calculate the lead times.

Example of lead times plot
Example of lead times plot — (Image by Author)

The graph above shows an example of lead times plots (in minutes) that provide an overview of performance variability of order transmission, pick and pack and warehouse-airport transfer.

For more information on how to build a logistic performance dashboard, have a look at this short video,

Approach 2: Conformance

a) Delivery Lead Time: On Time In Full

The most important key performance indicator is the lead time between order creation and delivery times.

Learn more about OTIF in this video
Learn more about OTIF in this video — (Image by Author)

Distribution planners use it to manage their inventory, set the safety stock and plan new collection launches.

It will also be used by store managers to challenge the logistics department in the different supply performance reviews.

b) End-to-end delivery process

In our example, we have a delivery lead time target of 72 hours.

Example of a Delivery Order Process — (Image by Author)

The example above shows the delivery that meets the target lead times.

💡 Cut-off time definition
The order reception cut-off time is 18:00:00.

If an order is received after this time, you must wait 24 hours to prepare it.

They are the main cause of delayed delivery and disruptions.

c) Visualization

After defining them, you can use your Python script to compare lead times with these targets.

Lead Times Target Plot with Python Matplotlib — (Image by Author)

In this visual, you can see that

  • For some orders, warehouse teams are not respecting the preparation lead time target of 5.5 hours (630 minutes).
  • Order transmission or warehouse airport transfer will never be a problem as you are way below the maximum lead time.

Approach 3: Enhancement

a) Visualize delays

The objective is to ensure that you have all our orders transmitted, prepared and delivered in less than 72 hours.

It starts with visualizations that will help you to answer simple questions.

  • How many orders have been delivered late?
  • Which process is impacting your overall lead time the most?
Delays Root Causes Donut Plot
Delays Root Causes Donut Plot — (Image by Author)
Visual for Root Cause Analysis
Visual for Root Cause Analysis — (Image by Author)

The visual above will help you spot the late deliveries and quickly screen the different legs to understand which one impacted your lead time

  1. Start to look at the bottom graph
  2. You spot that one order lead time is above the yellow line
  3. Screen the lead times above to find which one(s) are causing this delay

💡 Reason Code Mapping
As part of prescriptive analytics, you can start to create automatically
reason codes for late deliveries that will be used for visuals like the donut plot above.

(Example: if your shipment is delayed and your loading lead time is higher than the target, you can add a late delivery reason code: loading)

b) Process mining to reduce delays

Now that you know the root causes, you can focus on designing solutions to avoid delays in the future.

For instance,

  1. You have spotted that many orders are delivered late due to long shipping lead times.
  2. After further analysis, you found that the root cause is the invoicing process.
  3. This process depends on the delivery location (region, store type (duty-free, franchise, ..), product category, order type (cross-docking, replenishment, collection launches), and order creation date.
Model for Prediction of Late Invoicing
Model for Prediction of Late Invoicing— (Image by Author)

You use data to train a model that predicts the probability of having delays in the invoicing process using the features listed above

  1. You have spotted a high correlation between the boolean output (on-time invoicing) and the order creation day (Monday, .. Sunday), as well as the store region (Europe, Middle East, Asia, America).
  2. The vast majority of late invoices are from orders created in the second half of the week for stores located in the Middle East.
  3. After aligning with local logistics teams in the Middle East, you discovered that the invoice process is stopped on Thursdays and Fridays as it is the weekend in this region.

You can then ask demand planners (located in Europe) to adapt their order creation processes to these local specificities.

Best Practices for Process Mining with Python

💡 Bring insights to operational and IT teams
The example above shows that data alone cannot design a solution or solve problems alone.

They need to be included during

  • Time stamps are defined to ensure that the information you retrieve from systems matches the operational reality.
  • Lead time definitions to match them with your company’s supply chain performance management.
    (Example: if store receiving is not included in the transportation team’s performance measure, you need to remove it)
  • Data processing with Python to ensure that the fields you are using for your calculation are the right ones.
  • Visualization creation to ensure that operational teams can use them to get insights.

The models and visualizations you designed will provide insights to support continuous improvement initiatives and boost strategic decision-making.

Conclusion

Overall, process mining can be a powerful tool for improving the efficiency and effectiveness of business processes and is increasingly used by organizations to drive process improvement efforts.

It requires harmonized and clean data.

This basic requirement can be a major obstacle in your digital transformation journey.

The worst-case scenario, which is unfortunately very common, would be an organisation with

  • Several systems not communicating together like multiple ERP instances, one WMS per warehouse fully controlled by the 3PL, and several TMS not interfaced with the ERP
  • Many customizations that are not documented such as additional fields created by a developer who left the company
  • The absence of a single source like a data lake that groups the databases of your different systems

At this stage, you need to work on harmonizing and building a strong data architecture (with governance to maintain data integrity) before jumping into process mining.

Next Steps

After spotting operational failures and finding the root causes, you will be interested in building a Digital Twin to simulate several scenarios and test the resilience of your solutions.

About Me

Let’s connect on Linkedin and Twitter, I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

If you are interested in Data Analytics and Supply Chain, have a look at my website.

References

--

--

Top Supply Chain Analytics Writer — Follow my journey using Data Science for Supply Chain Sustainability 🌳 and Productivity ⌛