Media Mix Modeling: How to measure the effectiveness of advertising with Python & LightweightMMM

Media Mix modeling, its implementation, and practical tips

Photo by Andreas M on Unsplash

TLDR:

Media Mix Modeling, also called Marketing Mix Modeling (MMM), is a technique that helps advertisers quantify the impact of several marketing investments on sales. LightweightMMM is a Python library for MMM that accounts for media saturation and ad stock. However, you will probably need some trial and error when you try MMM. For practical insights and actions, keep trying better data, making better models, and doing better experiments.

If you prefer learning through videos, feel free to check out this link (YouTube). And for those eager to dive straight into the sample code, please refer to my GitHub repository. I’d be thrilled if you could leave a star!

GitHub – takechanman1228/mmm_pydata_global_2022

1. Introduction

What comes to your mind when you hear the word advertisement?

Let me give you some examples. TV commercials are a common approach. Social media ads are another: when you check your friends’ posts or videos on social media platforms, you probably see many ads. If you google something, you usually see some ads at the top of the results. Ads on buses, trains, and taxis, in airports, and on buildings, known as out-of-home (OOH) advertising, are also fairly common.

Image by Author, thumbnail by unsplash.com

Media Optimization has been a challenge for a long time. Some of you might be familiar with the marketing pioneer John Wanamaker. He supposedly said, "Half the money I spend on advertising is wasted; the trouble is I don’t know which half."

Image by Author, thumbnail by Wikipedia

A statistical approach for solving this question is called media mix modeling or marketing mix modeling. It is generally referred to as MMM for short. The purpose of MMM is to understand how much each marketing media contributes to sales, and how much money should be spent on each.

For many decades, companies with huge advertising budgets in the beverage, consumer goods, auto, and fashion industries have been working on improving MMM. Also, ad tech companies, such as Google and Meta, have been focusing on MMM actively these days.

2. What is MMM?

MMMs are statistical models that help quantify the impact of several marketing inputs on sales.

Roughly speaking, there are three goals.

  • The first goal is to understand and measure Return on Investment (ROI). For example, the model will tell you your ROI on TV last year.
  • The second goal is simulation. For example, you can answer a business question like "What would our sales be if more or less money were spent on TV next year?"
  • The third goal is optimizing media budgets. This step helps you optimize budget allocation, which contributes to maximizing sales.

Key challenges in media optimization

You might wonder why it is so difficult to measure ROI or why not just check ROI on the report issued by each media.

These are good questions. But the reality is a little more complicated.

The first reason is that the end-user has multiple media touchpoints, and media channel influences are intertwined.

Secondly, tracking is not always accurate these days. Offline media channel influence is hard to track. For example, for print media such as newspapers or magazines, we can’t track how many people actually see the ads. What is worse, even in the digital world, privacy regulations such as GDPR and Apple’s IDFA deprecation have been reducing tracking accuracy.

Thirdly, randomized experiments, known as lift tests, are often impractical. The gold standard for answering a causal question is to perform a randomized experiment by randomly splitting a population into a test group and a control group, where the control group sees no advertisements. However, this is not practical because companies prefer not to restrict ads for a long time, as this could lead to lost opportunities.

Image by Author

3. Data Preparation

3.1 Input data

We use time-series data and do not use any privacy-related data. As you can see, we have columns for the week, sales, media spending, and other data.

Image by Author

3.2 What kind of data is needed?

The first part is the most important metric: the KPI of your business, which will be the dependent variable. If you are a retailer or a manufacturer, sales is a common choice. However, if you are a mobile app company, the number of app installs would be the KPI. Next, explanatory variables are potential factors that impact sales. Media data is mandatory because we want to optimize its allocation. Non-media marketing data such as price, promotion, or product distribution also affects sales. External factors such as seasonality, holidays, weather, or macroeconomic data are also important for increasing the model’s accuracy.

Image by Author

3.3 How granular should the data be?

In terms of time, MMMs often require two to three years of weekly-level data. However, if you don’t have that much data, daily data is also acceptable, but in that case, you will need to be more careful in reviewing the outliers.

Next is business granularity. The common approach is to collect brand or business unit-level data. For example, Procter & Gamble has Pantene, Head & Shoulders, and Herbal Essences in the hair care category, and each brand team has a different sales, marketing, and media strategy. Make sure to determine the data granularity based on the product line, organization, and decision-making process.

When looking at media spending data, a common granularity is the media channel level, such as TV, print, OOH, and digital. But it depends on how much you are spending on each medium. For example, if you spend a lot on digital ads, it is better to break down the digital channel into more specific groups, such as Google search ads, Google display ads, YouTube ads, Facebook ads, etc., because Google search ads and YouTube ads have different funnels and roles.

4. Modeling

4.1 Simple traditional approach – Linear Regression

First, let’s start by considering simple modeling. Linear regression on observational data is a common method that has traditionally been used.

Image by Author

Here, sales is the objective variable, and media spending factors and control factors are explanatory variables. These coefficients mean the impact on sales. So, beta_m is the coefficient of the media variables, and beta_c is the coefficient of the control variables such as seasonality or price change. The most significant advantage of this method is that everyone can run it quickly because even Excel has a regression function. Also, it’s easy for everyone, including non-tech executives, to understand the results intuitively. However, this method is not grounded in key marketing principles that are widely accepted by the marketing industry.
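
As a rough illustration of this regression, here is a minimal sketch with NumPy on synthetic data. The channel names, the coefficients, and the noise level are all made up for the example; only the structure (sales = baseline + beta_m × media + beta_c × control) follows the description above.

```python
import numpy as np

# Synthetic weekly data (hypothetical numbers, for illustration only).
rng = np.random.default_rng(0)
n_weeks = 104
tv_spend = rng.uniform(0, 100, n_weeks)      # media variable
search_spend = rng.uniform(0, 50, n_weeks)   # media variable
holiday = rng.integers(0, 2, n_weeks)        # control variable (0/1 flag)

# Coefficients used to simulate sales: baseline, two beta_m, one beta_c.
true_coefs = np.array([500.0, 2.0, 3.5, 40.0])
X = np.column_stack([np.ones(n_weeks), tv_spend, search_spend, holiday])
sales = X @ true_coefs + rng.normal(0, 5, n_weeks)

# Ordinary least squares: sales ~ baseline + beta_m * media + beta_c * control.
coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)
for name, value in zip(["baseline", "beta_tv", "beta_search", "beta_holiday"], coefs):
    print(f"{name}: {value:.2f}")
```

The fitted coefficients land close to the simulated ones because the synthetic data really is linear. Real media data rarely is, which is where the two principles in the next section come in.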

4.2 Two principles in advertising

There are two ad principles to consider: Saturation and Ad stock.

Image by Author

Saturation: The effectiveness of one media channel’s advertisements decays as the expenditure increases. Let me say that in a different way: The more money you spend on one media channel advertisement, the less effective it is. Saturation is also called the shape effect.

Ad stock: The advertising effect on sales may lag behind the initial exposure and extend over several weeks because consumers generally remember ads for an extended period of time, but sometimes delay action. There are several reasons why. Consumers don’t purchase items immediately if they already have stock at home. Or, if they plan to purchase expensive items such as a PC, furniture, or a TV, they may take several days to several weeks to consider the purchase. These examples are what cause the carry-over effect.
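
To make the two principles concrete, here is a minimal sketch in plain Python. It assumes a simple geometric decay for ad stock and a Hill curve for saturation. These are common choices rather than the only ones, and the parameter values below are made up.

```python
def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of the previous week's adstock into each week."""
    adstocked, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        adstocked.append(carry)
    return adstocked


def hill(spend, half_saturation, slope):
    """Hill curve: 0 at zero spend, 0.5 at `half_saturation`, flattening towards 1."""
    return spend**slope / (spend**slope + half_saturation**slope)


# One burst of spend followed by silence: the effect decays instead of vanishing.
carried = geometric_adstock([100.0, 0.0, 0.0, 0.0], decay=0.5)
print(carried)  # [100.0, 50.0, 25.0, 12.5]

# Doubling spend from 100 to 200 adds less response than the first 100 did.
first_100 = hill(100.0, half_saturation=100.0, slope=1.0) - hill(0.0, half_saturation=100.0, slope=1.0)
next_100 = hill(200.0, half_saturation=100.0, slope=1.0) - hill(100.0, half_saturation=100.0, slope=1.0)
print(first_100 > next_100)  # True
```

In a real model, the decay rate, half-saturation point, and slope are not fixed by hand like this. They are estimated per channel from the data.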

4.3 Model proposed by Google Researchers Jin et al.

Researchers at Google proposed a method that reflects these two features in 2017. The formula below is the final model that reflects the carryover effect and ad saturation.

Image by Author

The basic approach is the same as the simple model I shared earlier. Sales can be decomposed into baseline sales, media factors, control factors, and white noise. And in this formula, the coefficient beta represents the impact of each factor. The change here is to apply two transformation functions to the time series of media spending: saturation and ad stock function.
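
As a rough rendering of this description, the model can be written as below. The notation is an assumption chosen to match the symbols used in this article (beta_m for media, beta_c for controls); the paper uses its own symbols, and the order of the two transformations is a modeling choice.

```latex
y_t = \tau
    + \sum_{m=1}^{M} \beta_m \,\mathrm{Hill}\!\left(x^{*}_{t,m};\, K_m, S_m\right)
    + \sum_{c=1}^{C} \beta_c \, z_{t,c}
    + \varepsilon_t,
\qquad
x^{*}_{t,m} = \mathrm{adstock}\!\left(x_{t-L+1,m}, \ldots, x_{t,m};\, w_m, L\right)
```

Here y_t is sales in week t, tau is baseline sales, x_{t,m} is the spend on media channel m, z_{t,c} is control variable c, and epsilon_t is white noise. The adstock function spreads spend over the last L weeks, and the Hill function applies saturation to the result.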

4.4 Useful MMM libraries (LightweightMMM vs Robyn)

Here, let me introduce two great OSS libraries that will help you try MMM: LightweightMMM, a Python-based library developed mainly by Google developers, and Robyn, an R-based library developed by Meta.

LightweightMMM uses NumPyro and JAX for probabilistic programming, which makes the modeling process much faster. On top of the standard approach, LightweightMMM offers a hierarchical approach. If you have state-level or regional-level data, this geo-based hierarchical approach can yield more accurate results.

Robyn, on the other hand, makes use of Meta’s AI library ecosystem. Nevergrad is used for hyperparameter optimization, and Prophet is adopted for handling time series data.

5. Sample Code

Let me show you how it actually works with LightweightMMM. The full code can be found on my GitHub below. My sample code is based on lightweight_mmm’s official demo script.

mmm_pydata_global_2022/simple_end_to_end_demo_pydataglobal.ipynb at main ·…

First, let’s install the lightweight_mmm library using pip command. It should take about 1–2 minutes. If you get the error "restart runtime", you need to click the "restart runtime" button.

!pip install --upgrade git+https://github.com/google/lightweight_mmm.git

Also, let’s import some libraries such as JAX and NumPyro, along with the necessary modules of the library.

# Import jax.numpy and any other library we might need.
import jax.numpy as jnp
import numpyro

# Import the relevant modules of the library
from lightweight_mmm import lightweight_mmm
from lightweight_mmm import optimize_media
from lightweight_mmm import plot
from lightweight_mmm import preprocessing
from lightweight_mmm import utils

Next, let’s prepare the data. The official sample script uses a simulated data set that is generated by the library’s function to create dummy data. However, I’m going to use more realistic data in this session. I found a good dataset on a GitHub repository: sibylhe/mmm_stan. I am not sure whether this data set is real, dummy, or simulated data, but for me, it looks more realistic than any other data I found on the internet.

import pandas as pd

# I am not sure whether this data set is real, dummy, or simulated data, but for me, it looks more realistic than any other data I found on the internet.
df = pd.read_csv("https://raw.githubusercontent.com/sibylhe/mmm_stan/main/data.csv")

# 1. media variables
# media spending (Simplified media channel for demo)
mdsp_cols=[col for col in df.columns if 'mdsp_' in col and col !='mdsp_viddig' and col != 'mdsp_auddig' and col != 'mdsp_sem']

# 2. control variables
# holiday variables
hldy_cols = [col for col in df.columns if 'hldy_' in col]
# seasonality variables
seas_cols = [col for col in df.columns if 'seas_' in col]

control_vars =  hldy_cols + seas_cols

# 3. sales variables
sales_cols =['sales']

df_main = df[['wk_strt_dt']+sales_cols+mdsp_cols+control_vars]
df_main = df_main.rename(columns={'mdsp_dm': 'Direct Mail', 'mdsp_inst': 'Insert', 'mdsp_nsp': 'Newspaper', 'mdsp_audtr': 'Radio', 'mdsp_vidtr': 'TV', 'mdsp_so': 'Social Media', 'mdsp_on': 'Online Display'})
mdsp_cols = ["Direct Mail","Insert", "Newspaper", "Radio", "TV", "Social Media", "Online Display"]

Let’s take a quick look at it. This dataset contains four years of weekly records. For simplicity, I use seven media channels for media spending data, and holiday and seasonal information for control variables.

df_main.head()

Next, I’m going to preprocess the data. We split the dataset into train and test. I’m leaving only the last 24 weeks for testing in this case.

SEED = 105
data_size = len(df_main)

n_media_channels = len(mdsp_cols)
n_extra_features = len(control_vars)
media_data = df_main[mdsp_cols].to_numpy()
extra_features = df_main[control_vars].to_numpy()
target = df_main['sales'].to_numpy()
costs = df_main[mdsp_cols].sum().to_numpy()

# Split and scale data.
test_data_period_size = 24
split_point = data_size - test_data_period_size
# Media data
media_data_train = media_data[:split_point, ...]
media_data_test = media_data[split_point:, ...]
# Extra features
extra_features_train = extra_features[:split_point, ...]
extra_features_test = extra_features[split_point:, ...]
# Target
target_train = target[:split_point]

Also, this library provides a CustomScaler class for preprocessing. In this sample code, we divide the media spending data, extra features data, and the target data by their mean so that each scaled series has a mean of 1. This allows the model to be agnostic to the scale of the inputs.

media_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
extra_features_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
target_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
cost_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean, multiply_by=0.15)

media_data_train = media_scaler.fit_transform(media_data_train)
extra_features_train = extra_features_scaler.fit_transform(extra_features_train)
target_train = target_scaler.fit_transform(target_train)
costs = cost_scaler.fit_transform(costs)
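
To see what dividing by the mean does, here is a small sketch with NumPy. MeanScaler is a hypothetical stand-in written for this illustration, not the library’s CustomScaler, and the numbers are made up.

```python
import numpy as np


class MeanScaler:
    """Hypothetical divide-by-mean scaler (illustration only, not the library's class)."""

    def fit(self, data):
        # Column-wise mean over the time axis, learned from the training data.
        self.mean_ = np.mean(data, axis=0)
        return self

    def transform(self, data):
        return data / self.mean_

    def inverse_transform(self, data):
        return data * self.mean_


media = np.array([[100.0, 10.0], [300.0, 30.0], [200.0, 20.0]])  # weeks x channels
scaler = MeanScaler().fit(media)
scaled = scaler.transform(media)
print(scaled.mean(axis=0))  # each channel now averages 1
```

The same fitted scaler should then be reused to transform the test data and to bring predictions back to the original scale, so that train and test share one scale.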

The next step is training. We can choose the model from three options: Hill-adstock, adstock, and carryover (model_name values "hill_adstock", "adstock", and "carryover"). It is generally recommended to compare all three approaches and use the one that works best.

number_warmup, number_samples = 1000, 1000  # MCMC warm-up steps and posterior samples (assumed values; tune for your data)
mmm = lightweight_mmm.LightweightMMM(model_name="hill_adstock")
mmm.fit(media=media_data_train, media_prior=costs, target=target_train, extra_features=extra_features_train, number_warmup=number_warmup, number_samples=number_samples, media_names=mdsp_cols, seed=SEED)

Once training is finished, you can check the summary of your trace. The important point here is to check whether the R-hat values for all parameters are less than 1.1. This is a standard convergence check when you run Bayesian modeling.

mmm.print_summary()
Image by Author
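
For intuition about what R-hat measures, here is a minimal sketch of the classic Gelman-Rubin statistic with NumPy, run on simulated chains. It is a simplified version written for illustration; the value printed in the library’s summary may come from a more refined variant.

```python
import numpy as np


def gelman_rubin(chains):
    """Classic R-hat for one parameter; `chains` has shape (n_chains, n_samples)."""
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()         # W: mean within-chain variance
    between_over_n = chains.mean(axis=1).var(ddof=1)   # B / n: variance of the chain means
    pooled = (n - 1) / n * within + between_over_n
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))             # four chains, same distribution
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])   # two chains shifted away

print(f"well mixed: {gelman_rubin(mixed):.3f}")
print(f"not mixed:  {gelman_rubin(stuck):.3f}")
```

When all chains explore the same distribution, the statistic sits close to 1. When some chains settle elsewhere, the between-chain variance pushes it well above the 1.1 threshold.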

We can visualize the posterior distributions of the media effects.

plot.plot_media_channel_posteriors(media_mix_model=mmm, channel_names=mdsp_cols)

Now, let’s do a fitting check. The model’s fit to the training data can also be checked by using plot_model_fit function. R-squared and MAPE, mean absolute percentage error, are shown in the chart. In this example, R2 is 0.9, and MAPE is 23%. Generally speaking, R2 is considered good if it is more than 0.8. Also, for MAPE, the goal is for it to be 20% or below.

plot.plot_model_fit(mmm, target_scaler=target_scaler)
Image by Author
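
For reference, here is a minimal sketch of the textbook definitions of R-squared and MAPE in plain Python, with made-up numbers. The values shown in the library’s chart may be computed slightly differently, so treat this as an illustration of the metrics, not of the library’s internals.

```python
def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.2 means 20%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 400.0]       # hypothetical weekly sales
predicted = [110.0, 180.0, 400.0]    # hypothetical model output

print(f"R2:   {r_squared(actual, predicted):.3f}")
print(f"MAPE: {mape(actual, predicted):.1%}")
```

Note that MAPE divides by the actual value, so weeks with very low sales can dominate it.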

And this is the visualization of the prediction result on the test data. R2 is 0.62, and MAPE is 23%. Honestly, the R2 and MAPE values here are not ideal. However, I do not have any additional data, and I’m not even sure whether this data set is real or a dummy. That said, I’m still going to use this data set and model to show you the insights. I’ll go over how to improve the model in more detail later.

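# Predict the held-out test weeks with the trained model.
new_predictions = mmm.predict(media=media_scaler.transform(media_data_test), extra_features=extra_features_scaler.transform(extra_features_test), seed=SEED)
plot.plot_out_of_sample_model_fit(out_of_sample_predictions=new_predictions,
                                 out_of_sample_target=target_scaler.transform(target[split_point:]))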
Image by Author

Outcomes

We can quickly visualize the estimated media & baseline contribution over time by using this function. The graph below shows that about 70% of sales are baseline sales, which is represented by the blue area. The other colors show media contribution to the remaining sales.

media_contribution, roi_hat = mmm.get_posterior_metrics(target_scaler=target_scaler, cost_scaler=cost_scaler)
plot.plot_media_baseline_contribution_area_plot(media_mix_model=mmm,
                                                target_scaler=target_scaler,
                                                fig_size=(30,10),
                                                channel_names = mdsp_cols
                                                )
Image by Author
plot.plot_bars_media_metrics(metric=roi_hat, metric_name="ROI hat", channel_names=mdsp_cols)

This graph shows the estimated ROI of each media channel. Each bar represents the ROI of one channel, that is, how efficiently its spend turns into sales. In this case, TV and Online Display are more efficient than other media.

Image by Author
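
As a simplified illustration of what an ROI bar means, here is a sketch in plain Python that treats ROI as attributed sales divided by spend. The numbers are made up, and the library estimates this from the posterior rather than from two fixed values per channel.

```python
def roi(contribution, spend):
    """Sales attributed to a channel per unit of money spent on it."""
    return contribution / spend


# Hypothetical attributed sales and spend per channel (made-up numbers).
channels = {
    "TV": {"contribution": 300.0, "spend": 100.0},
    "Radio": {"contribution": 60.0, "spend": 50.0},
}

roi_by_channel = {name: roi(v["contribution"], v["spend"]) for name, v in channels.items()}
ranked = sorted(roi_by_channel, key=roi_by_channel.get, reverse=True)
print(roi_by_channel)
print("most efficient first:", ranked)
```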

We can visualize the optimized media budget allocation. The graph shows the previous budget allocation and optimized budget allocation. In this case, direct mail and radio should be reduced, and other media should be increased.

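# solution, kpi_without_optim, optimal_buget_allocation, and previous_budget_allocation
# are produced by the budget optimization step (optimize_media) in the full notebook.
plot.plot_pre_post_budget_allocation_comparison(media_mix_model=mmm,
                                                kpi_with_optim=solution['fun'],
                                                kpi_without_optim=kpi_without_optim,
                                                optimal_buget_allocation=optimal_buget_allocation,
                                                previous_budget_allocation=previous_budget_allocation,
                                                figure_size=(10,10),
                                                channel_names = mdsp_cols)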
Image by Author
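
To give a feel for why an optimizer shifts money between channels, here is a toy sketch in plain Python. It is not the library’s optimizer. It is a simple greedy rule that hands out the budget in small steps to whichever channel currently offers the largest marginal gain, under assumed Hill-shaped response curves with made-up parameters.

```python
def hill(spend, half_saturation, slope=1.0):
    """Saturating response curve (assumed shape for this illustration)."""
    return spend**slope / (spend**slope + half_saturation**slope)


def greedy_allocate(budget, step, betas, half_saturations):
    """Give each `step` of budget to the channel with the largest marginal gain."""
    spend = [0.0] * len(betas)
    for _ in range(int(budget / step)):
        gains = [
            beta * (hill(s + step, k) - hill(s, k))
            for s, beta, k in zip(spend, betas, half_saturations)
        ]
        best = gains.index(max(gains))
        spend[best] += step
    return spend


# Two hypothetical channels: the first has twice the effect size of the second.
allocation = greedy_allocate(budget=100.0, step=1.0, betas=[10.0, 5.0], half_saturations=[50.0, 50.0])
print(dict(zip(["TV", "Radio"], allocation)))
```

Because of saturation, the stronger channel does not take the whole budget. Once its marginal gain drops below the weaker channel’s, the next step of budget goes to the weaker one.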

6. How can you improve the model’s accuracy?

A tailor-made model is needed for better insights and actions because there is no "One size fits all" model, as every business is in a different situation.

Then, how do we improve model accuracy for better insights and actions?

Better data: You need to choose the control variables that affect your sales based on your business. Generally speaking, sales fluctuate according to promotions, price changes, and discounts. Out-of-stock information also has a significant impact on sales. Google researchers identified that search volume for relevant queries can be used in MMM to control the impact of paid search ads appropriately.

If you spend a lot on a specific media channel, it is better to break down the media channel into more specific groups.

Better model: The next recommendation is to improve the modeling. Of course, hyperparameter tuning is important. In addition to that, trying the Geo-level hierarchical approach is a good way to get better accuracy.

Better experiment: The third recommendation is to work with your marketing team and run actual experiments, known as lift tests. As previously mentioned, it is unrealistic to do randomized experiments in all media. However, experimentation at key points is useful to get the ground truth and improve the model. Meta recently released GeoLift, an OSS solution that can be useful for geo-based experimentation.

Image by Author
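
The arithmetic behind a lift test is simple. Here is a minimal sketch in plain Python with made-up numbers, assuming two comparable groups of regions, one exposed to the ads and one held out. Real geo experiments need more care, for example in matching regions and estimating uncertainty.

```python
def relative_lift(test_sales, control_sales):
    """Incremental sales in the exposed regions as a fraction of holdout sales."""
    return (test_sales - control_sales) / control_sales


def incremental_roi(test_sales, control_sales, ad_spend):
    """Incremental sales per unit of ad spend in the exposed regions."""
    return (test_sales - control_sales) / ad_spend


# Hypothetical, size-matched regions: exposed to ads vs. held out.
lift = relative_lift(test_sales=1150.0, control_sales=1000.0)
experiment_roi = incremental_roi(test_sales=1150.0, control_sales=1000.0, ad_spend=50.0)
print(f"lift: {lift:.1%}, incremental ROI: {experiment_roi:.1f}")
```

An experimental ROI like this can then be compared with the ROI the MMM estimates for the same channel. A large gap is a hint that the model needs better data or a different specification.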

7. Conclusion

Let’s summarize some key takeaways.

  • MMMs are statistical models that help quantify the impact of several marketing inputs on sales.
  • In advertising, saturation and ad stock are the key principles. They can be modeled using transformation functions.
  • If you are familiar with Python, LightweightMMM is a good first step.
  • For better insights and actions, keep trying better data, making better models, and doing better experiments.

Thank you for reading! If you have any questions or suggestions, feel free to contact me on LinkedIn! Also, I would be happy if you follow me on Towards Data Science.
