An overview of a powerful new tool for Marketing Mix Modeling

This article provides a first overview of Facebook Experimental’s Robyn. Since the Facebook Marketing Science team has already created a great quick-start guide and very detailed documentation pages, I will keep the article short and to the point. For detailed explanations, you can find more information here.
tl;dr
- Facebook Experimental’s Robyn is an automated Marketing Mix Modeling (MMM) code base which is currently in beta.
- It offers two ad-stock transformations (geometric and Weibull) and an s-curve (diminishing returns) transformation for feature transformation.
- To take time-series features into account, Robyn makes use of Facebook Prophet.
- It generates a set of Pareto optimal model solutions by making use of Facebook’s Nevergrad gradient-free optimization platform.
- To increase the model’s accuracy, it allows you to include results from randomized controlled experiments.
Before we start… Why is MMM so important?
Two big questions every marketer has are: What’s the impact of my current marketing channels? and How should I allocate my budget strategically to get the optimal marketing mix?
These questions are not new. John Wanamaker (1838–1922), considered by some to be a pioneer in marketing, had the same questions and is known for his famous and often-cited quote:
Half my advertising spend is wasted; the trouble is, I don’t know which half.
To address these challenges, econometricians developed multivariate regression techniques known as Marketing Mix Modeling (MMM). A very new tool in this area, currently in beta, is Facebook’s Robyn. The Facebook Experimental team describes Robyn as
[…] an automated Marketing Mix Modeling (MMM) code. It aims to reduce human bias by means of ridge regression and evolutionary algorithms, enables actionable decision making providing a budget allocator and diminishing returns curves and allows ground-truth calibration to account for causation.
In the following, I’ll give you a quick overview of Robyn’s features and main ideas. Since it is still in beta, the code and concepts used might change. You can find the latest version here.
The data set used
It was quite tricky to find a suitable open data set for this article, so I used Google’s Aggregate Marketing System Simulator (AMSS) to generate a simple data set for this purpose. If you are interested in more details about AMSS, you can find their paper here, and if you’d like to use the same data set I used, you can find it [here](https://github.com/darinkist/medium_article_robyn/blob/main/simulated_marketing_data.csv).
Below you will find a description table and plots of the marketing data to help you get more familiar with the data set.

The data set consists of weekly data and contains 208 entries. All values are in EUR (€) and there are no NA values. Our target variable is revenue, while the other columns are features that can be used to explain it.
Columns ending with _S are our marketing budget expenses by channel (TV, radio, paid search). Our competitor sales are given by the self-explanatory column competitor_sales.
A column that is not shown in the description table is the DATE column, representing the corresponding week in the format YYYY-MM-DD.
Figure 1 shows the revenue, its components, competitor sales and marketing expenses over time.

We can clearly see a seasonality (A1) in the revenue as well as a bit of a trend (A2). It can also be seen that the competitor sales follow a very similar pattern to our revenue, that we have gaps in the spring months in the radio expenses, and that there is a seasonality as well as a trend in the paid search expenses.
Figure 2 shows the correlation plot of our data set to get a first idea of the relationships between the variables.

We can see a high correlation between competitor sales and our revenue, followed by a lower correlation of 0.4 between our paid search expenses and both our revenue and the competitor sales.
Now that you are a bit more familiar with the data set, let’s use Robyn.
Set up our MMM project
To get the latest version of Robyn, we clone its repository to our machine:
git clone https://github.com/facebookexperimental/Robyn
After cloning the repository, we create a new folder called plots to store the visualizations of the later outputs.
Your folder structure should now look like this:
Robyn/
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── Robyn.Rproj
├── plots
├── source
│   ├── de_simulated_data.csv
│   ├── fb_robyn.exec.R
│   ├── fb_robyn.func.R
│   ├── fb_robyn.optm.R
│   └── holidays.csv
└── website
The main file we are interested in lies in the source folder and is called fb_robyn.exec.R. This is the file where we set our configuration and run the data wrangling, modeling, optimization and, if needed, budget allocation processes.
But before we go there, let us take a short look at the (modeling) techniques Robyn uses.
Robyn’s techniques
The following points describe, at a high level, what is under Robyn’s hood. For more detailed information and explanations, see their docs.
Ridge regression
The developers’ motivation to use a regularization method was to address multicollinearity among many regressors and to prevent the model from overfitting. The model’s equation with the main components of the function is shown in figure 3.

Here yₜ is our dependent variable, revenue, at time t. The independent variables are the intercept, followed by the ad-stock- and s-curve-transformed component for each media channel j. Holiday, seasonality, and trend effects are represented by hol, sea, and trend. Additional independent variables are captured by ETC, followed by the error term ε.
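Based on this description, the equation can be reconstructed roughly as follows (the notation is mine, since the original is shown as an image):

$$
y_t = \beta_0 + \sum_{j} \beta_j \, S\big(A(x_{j,t})\big) + \beta_{hol}\,hol_t + \beta_{sea}\,sea_t + \beta_{tr}\,trend_t + \sum_{k} \beta_k\,etc_{k,t} + \varepsilon_t
$$

where A(·) denotes the ad-stock transformation, S(·) the s-curve transformation, and the β coefficients are estimated with ridge regression, i.e., with an additional L2 penalty on the squared coefficients in the loss function.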
Feature transformation options
Very common transformation techniques in MMM are ad-stock and s-curve (diminishing returns) transformations.
Ad-stock transformation
The idea behind the ad-stock transformation is that advertising effects usually do not take effect immediately; they have a half-life. Customers (usually) do not run instantly to the stores to buy your product after they have seen your commercial. Your ads take some time to wear in.
Robyn offers two methods here: the classical geometric one and the more flexible Weibull survival function. For a deeper explanation, please see the docs.
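To make this concrete, here is a minimal sketch of the geometric ad-stock transformation in R (my own illustration, not Robyn’s internal code): each period retains a fraction theta of the previous period’s accumulated ad-stock.

```r
# Geometric ad-stock: each period carries over a fraction `theta`
# of the previous period's ad-stock (illustrative sketch)
geometric_adstock <- function(spend, theta) {
  adstock <- numeric(length(spend))
  adstock[1] <- spend[1]
  for (t in 2:length(spend)) {
    adstock[t] <- spend[t] + theta * adstock[t - 1]
  }
  adstock
}

# A single burst of 100 decays over the following weeks
geometric_adstock(c(100, 0, 0, 0), theta = 0.5)
#> 100.0  50.0  25.0  12.5
```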
S-curve (diminishing returns) transformation
The basic idea behind this transformation is that an advertisement loses its effectiveness as more and more money is allocated to it: each additional euro spent yields a smaller incremental response.
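A common functional form for such a saturation curve is a Hill-type function. The sketch below is illustrative; the parameter names alpha and gamma are mine, not necessarily Robyn’s internals.

```r
# Hill-type s-curve: response saturates as spend grows; `alpha` controls
# the shape of the curve and `gamma` its inflection point (illustrative)
s_curve <- function(spend, alpha, gamma) {
  spend^alpha / (spend^alpha + gamma^alpha)
}

# Doubling spend yields less and less additional response
s_curve(c(100, 200, 400, 800), alpha = 2, gamma = 300)
```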
Trend, seasonality and holiday effects
In order to include time-series features or components like trend, seasonality, or holidays in the model, Robyn makes use of Facebook’s Prophet.
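For intuition, here is a minimal sketch of how such components can be extracted with Prophet’s R package (illustrative, not Robyn’s exact wiring; the file path is the data set from this article):

```r
library(prophet)

# weekly revenue data with a DATE column in YYYY-MM-DD format
dt <- read.csv("simulated_marketing_data.csv")
df <- data.frame(ds = as.Date(dt$DATE), y = dt$revenue)

# fit a yearly-seasonality model and decompose trend and seasonality
m <- prophet(df, yearly.seasonality = TRUE,
             weekly.seasonality = FALSE, daily.seasonality = FALSE)
components <- predict(m, df)
head(components[, c("ds", "trend", "yearly")])
```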
Automated hyperparameter selection and optimization
Robyn uses Facebook’s Nevergrad gradient-free optimization platform to perform a multi-objective optimization that balances the relationship between each channel’s spend share and its coefficient decomposition share, providing a set of Pareto-optimal model solutions.
These Pareto-optimal model solutions are the result of running an evolutionary algorithm (natural selection) over many iterations (e.g., 20,000 iterations, each yielding a candidate model solution).
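To illustrate the concept of Pareto optimality (the search itself is delegated to Nevergrad), here is a small sketch that filters candidate models by two error objectives, both to be minimized; the objective names are illustrative.

```r
# Keep only candidates for which no other candidate is at least as good
# on both objectives and strictly better on one (Pareto optimality)
is_pareto_optimal <- function(obj1, obj2) {
  sapply(seq_along(obj1), function(i) {
    !any(obj1 <= obj1[i] & obj2 <= obj2[i] &
         (obj1 < obj1[i] | obj2 < obj2[i]))
  })
}

set.seed(42)
nrmse       <- runif(100)  # prediction error of each candidate model
decomp_dist <- runif(100)  # distance between spend share and effect share
pareto      <- is_pareto_optimal(nrmse, decomp_dist)
sum(pareto)  # number of Pareto-optimal candidates
```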
Calibration using experimental results
Robyn allows us to use results from randomized controlled experiments to increase the model’s accuracy; these results are used as a prior to shrink the coefficients of the media variables.
This part is not covered in this article. If you are interested in more details, you can find their documentation here.
MMM Configuration
Now that we have a basic overview of Robyn’s techniques, let’s continue with our use case. You will find the complete code at the end of this article.
We open Robyn.Rproj with RStudio and then open fb_robyn.exec.R, located in the source folder.
Avoid errors with time series features
The first thing you should do, if your OS language is not English, is to uncomment line 13:
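The commented-out line forces an English locale for date handling, roughly along these lines (the exact call may differ in your version of the script):

```r
# Force an English locale so weekday/month names are parsed correctly
# during the data wrangling (sketch; check line 13 of fb_robyn.exec.R)
Sys.setlocale("LC_TIME", "C")
```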
Otherwise you will get errors in the data wrangling process (line 166).
Install and load libraries
Make sure to install all the required libraries, create a conda environment called r-reticulate, install Nevergrad, and tell reticulate to use the created conda env.
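A sketch of this setup with reticulate (the R package list is illustrative; check the header of fb_robyn.exec.R for the exact set your version needs):

```r
# install the R packages used by the script (illustrative selection)
install.packages(c("data.table", "ggplot2", "glmnet", "prophet", "reticulate"))

library(reticulate)
conda_create("r-reticulate")                             # create the conda env
conda_install("r-reticulate", "nevergrad", pip = TRUE)   # install Nevergrad via pip
use_condaenv("r-reticulate", required = TRUE)            # activate it for this session
```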
Load data
Now it is time to load our CSV file. The dev team already provides a file called de_simulated_data.csv; since I am using my own simulated file, I change the line accordingly.
The dev team also provides a holiday file which includes public holidays for several countries (e.g., US, UK, IN, DE).
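The two loading lines then look roughly like this (the variable names follow the demo script’s convention; the file paths are illustrative):

```r
library(data.table)

dt_input    <- fread("simulated_marketing_data.csv")  # our own simulated data
dt_holidays <- fread("source/holidays.csv")           # provided holiday file
```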
Set model input variables
In this part, we link the configuration in the code to the columns in our data set. This part is crucial, since typos will lead to errors during the automated data wrangling process.
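A sketch of this mapping for our data set (the set_* names follow the beta script’s convention; the media column names below are illustrative, so use the exact column names from your CSV):

```r
set_country      <- "DE"                   # country for Prophet's holiday effects
set_dateVarName  <- c("DATE")              # date column (YYYY-MM-DD)
set_depVarName   <- c("revenue")           # dependent variable
set_mediaVarName <- c("tv_S", "radio_S", "search_S")  # media spend columns (_S)
set_baseVarName  <- c("competitor_sales")  # additional baseline variable
set_prophet      <- c("trend", "season", "holiday")   # time-series components
```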
Set global model parameters
Robyn allows us to use either the geometric or the Weibull ad-stock technique. In this article, we stay with the geometric one, but it is definitely worth trying out the Weibull technique.
Since Robyn uses Nevergrad, we have to select an algorithm as well as the number of trials. Here we also stick to the defaults.
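In the script this boils down to a few settings along these lines (variable names follow the beta script’s convention; the values shown are illustrative):

```r
adstock            <- "geometric"    # alternative: "weibull"
set_iter           <- 500            # Nevergrad iterations per trial (illustrative)
set_hyperOptimAlgo <- "TwoPointsDE"  # Nevergrad algorithm
set_trial          <- 40             # number of trials (illustrative)
```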
Set hyperparameter bounds
According to our defined variables and the ad-stock method used, we have to set their hyperparameter bounds.
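With the geometric ad-stock, each media variable needs bounds for three hyperparameters: alphas and gammas for the s-curve and thetas for the decay rate. A sketch with illustrative bounds and column names:

```r
set_hyperBoundLocal <- list(
  tv_S_alphas     = c(0.5, 3),    # s-curve shape
  tv_S_gammas     = c(0.3, 1),    # s-curve inflection point
  tv_S_thetas     = c(0.1, 0.4),  # geometric decay rate
  radio_S_alphas  = c(0.5, 3),
  radio_S_gammas  = c(0.3, 1),
  radio_S_thetas  = c(0.1, 0.4),
  search_S_alphas = c(0.5, 3),
  search_S_gammas = c(0.3, 1),
  search_S_thetas = c(0, 0.3)
)
```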
Run the models
Now that we have all parameters set, we can run our model using the following code.
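In the beta version, the run is kicked off by the f.robyn() function from fb_robyn.func.R; its exact signature may change between versions, so treat this as a sketch:

```r
# start data wrangling, modeling and Pareto optimization; plots are
# written to the folder we created earlier
model_output_collect <- f.robyn(set_hyperBoundLocal, plot_folder = "plots")
```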
Robyn is now running and will automatically generate plots in our specified plots folder.
Output and diagnostics
After the modeling process is done, you should find several files in your plots folder.
These plots (with the model ID as their name) represent the optimal model solutions based on the Pareto optimization process (pareto_front.png) and provide us with additional information about the hyperparameter selection (hypersampling.png).
Figure 4 shows the model one-pager for one of these models (3_30_1).

Response decomposition waterfall by predictor
This chart shows the percentage of each feature’s effect on the response variable (revenue). In this example, 34.08% of the revenue can be attributed to seasonality, 11.62% to TV commercials, and so on.
Share of spend vs. share of effect
This plot describes the share of spend and the share of effect by channel. In addition, it also shows the return on investment (ROI) of each channel. For our example, we can see that radio has the highest ROI, followed by TV and paid search. It also shows that the average expenses on TV are a bit larger than their average effect share, which likely means that TV is hitting some diminishing returns.
Average ad-stock decay rate
This chart shows each channel’s percentage decay rate. The higher the decay rate, the longer the advertising effect persists. For this example, the channel TV has the highest average decay rate.
Actual vs. predicted response
This plot compares the actuals with our predictions. Our aim is for the model to explain most of the variance in the data; therefore we are looking for a high R-squared (rsq) and a low NRMSE.
Response curves and mean spend by channel
These curves indicate the saturation of each channel and may suggest potential budget reallocation strategies. The faster a curve reaches its inflection point and flattens out, the quicker the channel saturates with each additional € spent. For our example, the curves for radio and paid search do not show any flattening yet and may require further investigation.
Fitted vs. residual
The classic chart to check whether there are any problems with our regression model.
Simulate Budget allocation
Once we have decided on a reasonable(!!) model, we can run budget optimization simulations to calculate the optimal media spend allocation. There are two types of simulation scenarios:
- max_historical_response
- max_response_expected_spend
The first one uses the historical data and spend that went into the model to calculate what the most optimized media plan would have been.
The second one calculates the optimal media plan given a certain budget and a number of days.
All the model solutions are stored in the model_output_collect$allSolutions variable. Assuming we want to go with the model above (3_30_1), we just use the following code:
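In the beta version, the budget allocator lives in fb_robyn.optm.R; argument names may differ between versions, so treat this as a sketch:

```r
optim_result <- f.budgetAllocator(
  modID = "3_30_1",                       # the model we selected above
  scenario = "max_historical_response",   # or "max_response_expected_spend"
  channel_constr_low = c(0.7, 0.7, 0.7),  # lower bounds per media channel
  channel_constr_up  = c(1.2, 1.2, 1.2)   # upper bounds per media channel
)
```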
Besides the model ID and the scenario type, we also have to set lower and upper bounds for the channels used. If the lower bound of a channel is 0.7 and the upper bound is 1.2, then the spend change will be constrained to between 0.7 and 1.2 times that channel’s average spend per time period.
After we run that code, Robyn outputs a new one-pager (figure 5).

The two charts on the left show how the budget allocation and the mean response would change if we followed the budget optimization. The chart on the right shows the response curves for each channel with the initial vs. recommended average spend level.
The idea of this article was to provide you with a first overview of Facebook Experimental’s Robyn and how to use it on a simplified data set. For deeper explanations, the use of experimental results, and more details, I highly suggest checking out Robyn’s documentation.
Unlike in this short introduction, it is also important to really focus on and invest time in the model selection part. Only with a reasonable model does it make sense to discuss the results with the business and to use the budget optimization step.
Even though the project is still in beta, the motivation and ideas behind it and its features are very clever and quite impressive!
If I could make three wishes, I would love to have an option to use panel regression for cross-sectional data. Second, it would also be great to find a way to decrease the computational time. Third, similar to the first wish, using categorical variables not only as baseline variables would be nice. …and if I could make a fourth one… a port to Python would be great 😉
The code for the complete file: