Explaining the “Why” behind consumer spending on Black Friday


Photo by Ashkan Forouzani on Unsplash
In 2020, life as we know it changed overnight, and humanity as a collective is still figuring out what our new normal will be. Because of the pandemic, people in all walks of life became more familiar with data and insights. For most of the year, many of us were literally waking up to dashboards, whether they tracked the spread of the virus or its economic effects. As that habit spread across the general population, it also began to influence the enterprise.

If you’re a data science and analytics leader, the time is now: your entire organization is hungry for more insights into cash positions, supply chains, consumer behavior, and user sentiment.

Getting ahead with Explainable AI

Because the world is changing every day [5], data science and analytics leaders who want to predict the immediate future need to capture and monitor new data and continuously improve their models. What we need now is a way to combine "Humans + Data + AI" for effective decision-making.

Explainable AI (XAI), which lets people see why a model makes the predictions it does, is a next-generation approach that is reshaping business decision-making across industries [1, 4, 7, 9].

Explainable AI vs. Black-Box AI (image source: Fiddler.AI)
With Explainable AI, business and analytics leaders can make decisions while knowing the "why" behind the model's outputs and how the various factors influence them [1]. This makes the overall decision process better informed and leads to more accurate outcomes.

Black Friday Consumer Spend Analytics

Let us look at how a retail company, say "ABC LLC," can use this technology to understand consumer purchase behavior at a level that was not possible in the past [3, 6]. Suppose I am a data scientist at "ABC LLC." My goal is to create actionable insights into consumers' Black Friday purchasing decisions based on historical behavior, and to answer the following questions for my business teams:

  • What are the top drivers behind consumer purchasing decisions?
  • What are the factors that drive a particular female consumer segment to buy?
  • Which cities are likely to spend the most?
  • How many dollars is a consumer going to spend for this year’s Black Friday, and why?
  • Are women likely to drive more sales (in $) than men at this store?

To do that, there are 5 main steps that I need to take:

  1. Gather data and prepare it for training
  2. Perform exploratory analysis and build features
  3. Build a series of models that I can tune
  4. Analyze model predictions with explanations
  5. Operationalize the model for continuous insights

In practice, this is a cyclical process: we start with a dataset, prepare it for training, evaluate the model, compare it with any other models we have built, and run champion/challenger tests. We then analyze the data with the model to gather insights, which can be fed back into the training process to train a better model.

Explainable AI Workflow (Image source: Fiddler.AI)
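To make the champion/challenger step of that loop concrete, here is a minimal plain-Python sketch. It assumes both models are scored on the same holdout set and compared by R², the metric reported later in this post. The `Candidate` type, the `pick_champion` helper, the `min_gain` threshold, and the toy models are hypothetical illustrations, not part of Fiddler's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Any callable that maps one feature row to a predicted purchase amount.
Model = Callable[[Dict[str, object]], float]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


@dataclass
class Candidate:
    name: str
    model: Model


def pick_champion(champion: Candidate, challenger: Candidate,
                  rows: List[Dict[str, object]], targets: Sequence[float],
                  min_gain: float = 0.01) -> Candidate:
    """Promote the challenger only if its holdout R² beats the champion's by min_gain."""
    champion_r2 = r2_score(targets, [champion.model(r) for r in rows])
    challenger_r2 = r2_score(targets, [challenger.model(r) for r in rows])
    return challenger if challenger_r2 >= champion_r2 + min_gain else champion


# Toy holdout set: purchases depend strongly on the product category.
holdout_rows = [{"Product_Category_1": 1}, {"Product_Category_1": 1},
                {"Product_Category_1": 5}, {"Product_Category_1": 5}]
holdout_targets = [15000, 16000, 6000, 7000]
champion = Candidate("global_mean", lambda r: 11000.0)
challenger = Candidate(
    "category_mean",
    lambda r: 15500.0 if r["Product_Category_1"] == 1 else 6500.0,
)
print(pick_champion(champion, challenger, holdout_rows, holdout_targets).name)  # category_mean
```

Requiring a minimum gain keeps a challenger that is only marginally better, possibly by chance on a single holdout set, from replacing a champion that is already in production.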
Data Gathering

For this blog, I will use a popular Kaggle Black Friday dataset [11], which is fairly representative of the data a retail store collects from its purchase transactions. The dataset contains the following variables for each transaction:

Feature descriptions of Black Friday dataset (Image Source: here)
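As a hedged sketch of the data-preparation step, the loader below reads the CSV with Python's standard `csv` module and types each transaction. It assumes the column names of the Kaggle file (`User_ID`, `Product_ID`, `Gender`, `Age`, `Occupation`, `City_Category`, `Stay_In_Current_City_Years`, `Marital_Status`, `Product_Category_1` to `Product_Category_3`, `Purchase`) and that missing product categories appear as empty fields; check both against your copy. `read_transactions` is a hypothetical helper, and the two sample rows are illustrative.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO

# Assumed column names of the Kaggle Black Friday CSV; verify against your copy.
INT_COLUMNS = ("User_ID", "Occupation", "Marital_Status",
               "Product_Category_1", "Purchase")
OPTIONAL_INT_COLUMNS = ("Product_Category_2", "Product_Category_3")
# Gender, Age ("0-17", "55+"), City_Category and Stay_In_Current_City_Years ("4+")
# stay as strings because they are categorical buckets.


def parse_optional_int(value: str) -> Optional[int]:
    """Missing product categories are assumed to be empty fields; map them to None."""
    value = value.strip()
    return int(float(value)) if value else None  # also tolerates "6.0"-style values


def read_transactions(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one typed dict per purchase transaction."""
    for raw in csv.DictReader(handle):
        row: Dict[str, object] = dict(raw)
        for col in INT_COLUMNS:
            row[col] = int(raw[col])
        for col in OPTIONAL_INT_COLUMNS:
            row[col] = parse_optional_int(raw[col])
        yield row


# Illustrative in-memory sample with the assumed header.
SAMPLE = io.StringIO(
    "User_ID,Product_ID,Gender,Age,Occupation,City_Category,"
    "Stay_In_Current_City_Years,Marital_Status,Product_Category_1,"
    "Product_Category_2,Product_Category_3,Purchase\n"
    "1000001,P00069042,F,0-17,10,A,2,0,3,,,8370\n"
    "1000002,P00285442,M,55+,16,C,4+,0,8,6.0,14.0,7969\n"
)
rows = list(read_transactions(SAMPLE))
print(rows[1]["Product_Category_3"], rows[1]["Stay_In_Current_City_Years"])  # 14 4+
```

To load the real file, pass an open handle instead, e.g. `read_transactions(open(path, newline=""))`, where `path` points to your downloaded CSV.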
Exploratory Data Analysis

We are now ready to perform some basic data exploration and draw a few insights. To do this, I imported the dataset into Fiddler as a flat file and reviewed the high-level data statistics.

Fiddler automatically generates data statistics such as feature distributions, feature correlations, and mutual information for us to get a high-level understanding of the data. For example, the following 3 insights can be gathered quickly:

  1. The majority of transactions come from category B cities.
  2. Males buy more than females.
  3. The 26–35 age group is the dominant purchasing group.
Purchase distribution in the data (image source: Fiddler.AI)
Gender distribution in the data (image source: Fiddler.AI)
City Category Distribution (image source: Fiddler.AI)
Age Distribution (image source: Fiddler.AI)
Mutual Information between the features and the target (image source: Fiddler.AI)
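Fiddler computes these statistics automatically, and this post does not say exactly how. As a minimal sketch of what they mean, assuming each transaction is a dict keyed by the dataset's column names, the code below computes category shares and the mutual information between a categorical feature and the purchase amount. Because `Purchase` is continuous, it is bucketed first; the bucket edges in `purchase_bucket` are an arbitrary assumption, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def share_by(rows: List[dict], column: str) -> Dict[object, float]:
    """Fraction of transactions per category, most common first."""
    counts = Counter(r[column] for r in rows)
    return {k: v / len(rows) for k, v in counts.most_common()}


def purchase_bucket(amount: float, edges: Sequence[float] = (5000, 10000, 15000)) -> int:
    """Discretize the continuous Purchase target; the edges are arbitrary choices."""
    return sum(amount >= e for e in edges)


def mutual_information(rows: List[dict], column: str, target: str = "Purchase") -> float:
    """I(X; Y) in nats between a categorical feature X and the bucketed target Y."""
    n = len(rows)
    xs = [r[column] for r in rows]
    ys = [purchase_bucket(r[target]) for r in rows]
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


toy = [
    {"Gender": "M", "Product_Category_1": 1, "Purchase": 16000},
    {"Gender": "F", "Product_Category_1": 1, "Purchase": 16000},
    {"Gender": "M", "Product_Category_1": 5, "Purchase": 4000},
    {"Gender": "F", "Product_Category_1": 5, "Purchase": 4000},
]
print(share_by(toy, "Gender"))                                   # {'M': 0.5, 'F': 0.5}
print(round(mutual_information(toy, "Product_Category_1"), 3))  # 0.693
print(mutual_information(toy, "Gender"))                         # 0.0
```

In the toy data, `Product_Category_1` fully determines the purchase bucket, so its mutual information equals the bucket entropy (ln 2, about 0.693 nats), while `Gender` carries no information about the target.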
Model Building

To leverage Explainable AI, Fiddler offers two options:

  1. Import a custom pre-trained model
  2. Build an interpretable model

In this case, I used option #1: I trained an XGBoost model and used Fiddler’s Python library to upload it to the Fiddler platform. It is a regression model with a fairly good R² of 0.68 on the training set and 0.69 on the test set, comparable to some of the other models [3, 6] trained on the same dataset on Kaggle.
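For readers less familiar with the metric, R² compares the model's squared errors with those of a baseline that always predicts the mean purchase: R² = 1 - SS_res / SS_tot. The plain-Python sketch below is for illustration only; the toy values are made up, and scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R² = 1 - SS_res / SS_tot, the share of target variance the model explains."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # mean baseline's squared error
    return 1.0 - ss_res / ss_tot


actual = [10000, 20000, 30000]
predicted = [12000, 18000, 30000]
print(round(r2_score(actual, predicted), 2))  # 0.96
```

An R² of about 0.68 therefore means the model explains roughly two-thirds of the variance in purchase amounts, and a training score close to the test score is usually read as a sign that the model is not badly overfitting.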

Fiddler also lets me validate the model with an actual-vs.-predicted scatter plot and an error distribution, both of which show that the model captures the underlying data reasonably well.

Comparing predictions with actuals (image source: Fiddler.AI)
Error distribution (image source: Fiddler.AI)
Given that this model is satisfactory, we can leverage the XAI capabilities to analyze the data and answer our questions.
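The error distribution is built from residuals, i.e. actual minus predicted purchase. A minimal sketch of that computation follows, assuming plain lists of actuals and predictions; the helper names and the 1,000-dollar bin width are hypothetical choices, and Fiddler's own binning is not described in this post.

```python
from collections import Counter
from typing import Dict, List, Sequence


def residuals(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Actual minus predicted purchase amount for each transaction."""
    return [t - p for t, p in zip(y_true, y_pred)]


def error_histogram(errors: Sequence[float], bin_width: float = 1000) -> Dict[float, int]:
    """Count residuals per bin; each key is the bin's lower edge."""
    bins = Counter((e // bin_width) * bin_width for e in errors)
    return dict(sorted(bins.items()))


def mean_absolute_error(errors: Sequence[float]) -> float:
    return sum(abs(e) for e in errors) / len(errors)


errors = residuals([8000, 9500, 12000], [8500, 9000, 11800])
print(errors)                       # [-500, 500, 200]
print(error_histogram(errors))      # {-1000: 1, 0: 2}
print(mean_absolute_error(errors))  # 400.0
```

A roughly symmetric histogram centered near zero suggests the model is not systematically over- or under-predicting.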

Model Explainability

Let us start exploring the model along with the dataset to answer our questions.

Q1: What are the top drivers behind purchasing decisions?

Explainable AI offers a way to answer this question by analyzing the data through the model’s eyes and extracting the top drivers that influence purchasing decisions.

Fiddler’s Slice & Explain™ [10] toolkit quickly shows that the top driver by far is Product Category 1, which accounts for 64% of the feature importance when predicting what consumers are likely to buy during Black Friday at this store. It is followed by Product Category 2, Occupation Type, Age of the Consumer, and so on.

Top Features for Purchase Prediction (image source: Fiddler.AI)
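This post does not spell out how Fiddler turns per-prediction explanations into a percentage such as 64%. One common convention, sketched below as an assumption rather than Fiddler's documented method, is to sum each feature's absolute attribution (for example, SHAP values) over all predictions and normalize the totals to 100%. The `global_importance` helper and the example attribution values are hypothetical.

```python
from typing import Dict, List


def global_importance(attributions: List[Dict[str, float]]) -> Dict[str, float]:
    """Share (%) of total absolute attribution per feature, largest first.

    Each dict holds one prediction's attributions, e.g. SHAP-style values
    expressed in purchase dollars. Signs are dropped so that pushing the
    prediction up and pushing it down both count as influence.
    """
    totals: Dict[str, float] = {}
    for row in attributions:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    grand_total = sum(totals.values())
    if not grand_total:
        return {feature: 0.0 for feature in totals}
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {feature: 100.0 * total / grand_total for feature, total in ranked}


example = [
    {"Product_Category_1": 300.0, "Occupation": -100.0, "Age": 100.0},
    {"Product_Category_1": -300.0, "Occupation": 50.0, "Age": 50.0},
]
for feature, pct in global_importance(example).items():
    print(f"{feature}: {pct:.1f}%")
```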
Q2: What are the factors that drive a particular female consumer segment to buy?

Let us say my business team wants to target a marketing campaign at a consumer segment of young, married women aged 18–25 from category A cities, and wants to know what drives this segment's propensity to purchase. I can express this segment as a SQL query in Fiddler and generate explainable insights for it.

As we can see from the chart below, "Product Category 1" accounts for only 47% of what drives this segment (dark blue bar), compared with 63% for the general population (light blue bar). A few other features, namely "Occupation Type," "Years Stayed in Current City," and "Category of City," are much more important for this segment than for the general population.

Top Features for this segment (image source: Fiddler.AI)
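The segment comparison can be approximated outside Fiddler. The sketch below uses Python's built-in `sqlite3` as a stand-in for Fiddler's SQL slicing (Fiddler's actual query syntax and table names are not shown in this post), selects the segment's transactions, and compares the segment's normalized attribution shares with the whole population's. The table name `scored`, the helper names, and the attribution values are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Stand-in for the segment query: married women aged 18-25 in category A cities.
SEGMENT_SQL = """
    SELECT id FROM scored
    WHERE Gender = 'F' AND Marital_Status = 1
      AND Age = '18-25' AND City_Category = 'A'
"""


def load_scored(rows: List[dict]) -> sqlite3.Connection:
    """Put scored transactions into an in-memory table named `scored`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scored (id INTEGER PRIMARY KEY, Gender TEXT, "
                 "Marital_Status INTEGER, Age TEXT, City_Category TEXT)")
    conn.executemany("INSERT INTO scored VALUES "
                     "(:id, :Gender, :Marital_Status, :Age, :City_Category)", rows)
    return conn


def importance(attributions: List[Dict[str, float]]) -> Dict[str, float]:
    """Share (%) of total absolute attribution per feature."""
    totals: Dict[str, float] = {}
    for row in attributions:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    grand_total = sum(totals.values())
    if not grand_total:
        return {feature: 0.0 for feature in totals}
    return {feature: 100.0 * v / grand_total for feature, v in totals.items()}


def compare_segment(conn: sqlite3.Connection,
                    attributions_by_id: Dict[int, Dict[str, float]]
                    ) -> Dict[str, Tuple[float, float]]:
    """Return {feature: (segment %, population %)}."""
    segment_ids = {row[0] for row in conn.execute(SEGMENT_SQL)}
    segment = importance([a for i, a in attributions_by_id.items() if i in segment_ids])
    population = importance(list(attributions_by_id.values()))
    return {f: (segment.get(f, 0.0), population[f]) for f in population}


conn = load_scored([
    {"id": 1, "Gender": "F", "Marital_Status": 1, "Age": "18-25", "City_Category": "A"},
    {"id": 2, "Gender": "M", "Marital_Status": 0, "Age": "26-35", "City_Category": "B"},
    {"id": 3, "Gender": "F", "Marital_Status": 1, "Age": "18-25", "City_Category": "B"},
])
attributions = {
    1: {"Product_Category_1": 50.0, "Occupation": 50.0},
    2: {"Product_Category_1": 150.0, "Occupation": 0.0},
    3: {"Product_Category_1": 100.0, "Occupation": 0.0},
}
print(compare_segment(conn, attributions))
```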
Q3: Which cities are likely to spend the most?

Again, I use Fiddler’s Slice & Explain (S&E) toolkit, which lets me slice and dice the data with SQL and explain it with my model. The query runs against a dataset paired with a model: Fiddler scores the model on the dataset in real time and returns predictions and explanation attributions for visualization.

As shown in the query dialog below, the city category likely to generate the most sales is "City Category B," with predicted sales of $2,082 million.

Model Analysis (image source: Fiddler.AI)
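In plain SQL, the city question is a `GROUP BY` over the model's predictions. Here is a minimal sketch using Python's `sqlite3` as a stand-in for Fiddler's query engine, assuming one scored row per transaction in a hypothetical `scored` table with a `prediction` column; the rows are toy values.

```python
import sqlite3
from typing import List, Tuple


def load_predictions(rows: List[dict]) -> sqlite3.Connection:
    """In-memory table of model outputs: one row per scored transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scored (User_ID INTEGER, Gender TEXT, "
                 "City_Category TEXT, prediction REAL)")
    conn.executemany("INSERT INTO scored VALUES "
                     "(:User_ID, :Gender, :City_Category, :prediction)", rows)
    return conn


CITY_SQL = """
    SELECT City_Category, SUM(prediction) AS predicted_sales
    FROM scored
    GROUP BY City_Category
    ORDER BY predicted_sales DESC
"""


def predicted_sales_by_city(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    return conn.execute(CITY_SQL).fetchall()


conn = load_predictions([
    {"User_ID": 1, "Gender": "M", "City_Category": "A", "prediction": 100.0},
    {"User_ID": 2, "Gender": "F", "City_Category": "B", "prediction": 300.0},
    {"User_ID": 3, "Gender": "M", "City_Category": "B", "prediction": 200.0},
    {"User_ID": 4, "Gender": "F", "City_Category": "C", "prediction": 50.0},
])
print(predicted_sales_by_city(conn))  # [('B', 500.0), ('A', 100.0), ('C', 50.0)]
```

Summing predictions rather than historical actuals answers the forward-looking question of where the model expects sales to land.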
Q4: How many dollars is a consumer going to spend and why?

Let us say our marketing team wants to micro-target and personalize ads for a given user to drive their Black Friday purchases.

Using a group-by query similar to the earlier one, we find that user "1000869" is likely to buy items worth $8,290.53 on average during Black Friday. Here are the drivers that influence this user's purchasing decisions.

Together, Product Category 1 and Product Category 2 account for a whopping 70% of what drives this user's purchasing decisions.

Top features for the consumer 1000869 (image source: Fiddler.AI)
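The per-user view combines a filtered aggregate with a ranking of that user's attributions. As a hedged sketch, again with `sqlite3` standing in for Fiddler's query engine, and with hypothetical table, column, and helper names plus made-up predictions and attribution values:

```python
import sqlite3
from typing import Dict, List, Tuple

USER_SQL = "SELECT AVG(prediction), COUNT(*) FROM scored WHERE User_ID = ?"


def load_predictions(rows: List[dict]) -> sqlite3.Connection:
    """In-memory table of model outputs: one row per scored transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scored (User_ID INTEGER, prediction REAL)")
    conn.executemany("INSERT INTO scored VALUES (:User_ID, :prediction)", rows)
    return conn


def user_forecast(conn: sqlite3.Connection, user_id: int) -> Tuple[float, int]:
    """Average predicted purchase and number of scored transactions for one user."""
    avg, n = conn.execute(USER_SQL, (user_id,)).fetchone()
    return avg, n


def top_drivers(attributions: List[Dict[str, float]], k: int = 2) -> List[Tuple[str, float]]:
    """Top-k features by share (%) of this user's total absolute attribution."""
    totals: Dict[str, float] = {}
    for row in attributions:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(feature, 100.0 * v / grand_total) for feature, v in ranked[:k]]


conn = load_predictions([
    {"User_ID": 1000869, "prediction": 8000.0},
    {"User_ID": 1000869, "prediction": 8600.0},
    {"User_ID": 1000001, "prediction": 5000.0},
])
print(user_forecast(conn, 1000869))  # (8300.0, 2)
print(top_drivers([{"Product_Category_1": 400.0, "Product_Category_2": 300.0,
                    "Age": -200.0, "Occupation": 100.0}]))
# [('Product_Category_1', 40.0), ('Product_Category_2', 30.0)]
```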
Q5: Are women likely to drive more sales (in $) than men at this store?

We can run a query against all the model predictions and find, as shown in the figure, that this is not the case: the average predicted transaction size for men ($9,498) is higher than that for women ($8,811).

Women vs. Men average predicted $ spend (image source: Fiddler.AI)
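The same pattern answers the gender question: average the predictions per group and compare. Here is a minimal sketch under the same assumptions, with an in-memory `sqlite3` table named `scored` standing in for Fiddler and hypothetical toy rows and helper names.

```python
import sqlite3
from typing import Dict, List

GENDER_SQL = "SELECT Gender, AVG(prediction) FROM scored GROUP BY Gender ORDER BY Gender"


def load_predictions(rows: List[dict]) -> sqlite3.Connection:
    """In-memory table of model outputs: one row per scored transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scored (Gender TEXT, prediction REAL)")
    conn.executemany("INSERT INTO scored VALUES (:Gender, :prediction)", rows)
    return conn


def average_prediction_by_gender(conn: sqlite3.Connection) -> Dict[str, float]:
    return dict(conn.execute(GENDER_SQL).fetchall())


def women_outspend_men(conn: sqlite3.Connection) -> bool:
    averages = average_prediction_by_gender(conn)
    return averages["F"] > averages["M"]


conn = load_predictions([
    {"Gender": "F", "prediction": 8000.0},
    {"Gender": "F", "prediction": 9000.0},
    {"Gender": "M", "prediction": 9000.0},
    {"Gender": "M", "prediction": 10000.0},
])
print(average_prediction_by_gender(conn))  # {'F': 8500.0, 'M': 9500.0}
print(women_outspend_men(conn))            # False
```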
Operationalizing XAI with Monitoring

After we’re satisfied with the analysis, we can operationalize this model by connecting it to a data warehouse, continuously processing new transactional data, and providing predictions and explanations to business users. Once the model is live, we can monitor its performance in production and close the feedback loop. That way, we can track business KPIs and performance metrics, and set up alerts when something goes out of the ordinary. Fiddler’s Explainable Monitoring [8] features help data scientists and analysts keep track of the following:

  • Feature Attributions: The outputs of explainability algorithms support further investigation by showing which features were the most important drivers of model predictions within a given time frame.
  • Data Drift: Tracking the data coming from the data warehouse gives analysts and data scientists visibility into any training-serving skew.
  • Outliers: Egregiously high or low purchase predictions are detected automatically and flagged in the time series of model outputs.
Operational Dashboard with Explainable Insights (Image Source: Fiddler.AI)
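Fiddler's monitoring internals are not described in this post, so the sketch below substitutes two widely used techniques: the population stability index (PSI) for drift in a categorical feature such as `City_Category`, and a simple z-score rule for flagging unusually high or low purchase predictions. The `eps` floor, the 3.0 threshold, the toy data, and all function names are assumptions for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence


def category_shares(values: Sequence[str], categories: Sequence[str]) -> List[float]:
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def population_stability_index(train: Sequence[str], live: Sequence[str],
                               eps: float = 1e-6) -> float:
    """PSI = sum((a - e) * ln(a / e)) over categories; 0 means identical mixes."""
    categories = sorted(set(train) | set(live))
    psi = 0.0
    for e, a in zip(category_shares(train, categories), category_shares(live, categories)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for categories missing on one side
        psi += (a - e) * math.log(a / e)
    return psi


def flag_outliers(predictions: Sequence[float], z_threshold: float = 3.0) -> List[int]:
    """Indices of predictions more than z_threshold population std devs from the mean."""
    mu, sigma = mean(predictions), pstdev(predictions)
    if sigma == 0:
        return []
    return [i for i, p in enumerate(predictions) if abs(p - mu) / sigma > z_threshold]


train_cities = ["A", "B", "B", "C"]
print(population_stability_index(train_cities, ["A", "B", "B", "C"]))        # 0.0
print(population_stability_index(train_cities, ["B", "B", "B", "B"]) > 0.25)  # True
print(flag_outliers([9000.0] * 10 + [60000.0]))                               # [10]
```

A frequently quoted rule of thumb treats a PSI above about 0.25 as significant drift, though teams typically tune such thresholds, like the z-score cutoff, to their own alerting needs.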
Fiddler works with a wide range of data warehouse and BI tools, so analysts and data scientists can build operational dashboards and reports, and answer interactive queries, using cutting-edge Explainable AI.

Fiddler Product (image source: Fiddler.AI)
In this blog, I described how an Explainable AI platform can help data scientists and analysts uncover deeper predictive insights from business datasets, and walked through a simple case study with the publicly available Kaggle Black Friday dataset [11]. If you’re interested in learning more about the platform or trying it out, please email [email protected] or fill out this form.


