MR-Uplift: Multiple Responses in Uplift Models

A package for building uplift (or heterogeneous treatment effect) models that estimates and evaluates tradeoffs across multiple responses

Sam Weiss
Towards Data Science


Introduction

Originally posted in Building Ibotta

At Ibotta we have built and deployed several uplift models over the past few years, and we have shared some of the theoretical groundwork and practical advice in previous blog posts. In those posts we discussed how to evaluate uplift models, how specific loss functions can be used to estimate them, and how to evaluate tradeoffs between several response variables.

With this hard-earned knowledge, and with advances in estimation and evaluation, we have decided to open-source a Python package called MR-Uplift.

This post goes over what the package can do and how it can be a useful addition to existing packages. An example is presented and further resources are provided.

Package Contributions

Uplift modeling (also known as heterogeneous or individual treatment effects) is a branch of machine learning that learns the causal relationship between a treatment t and a particular response y for an individual x. An example might be the future activity of a particular individual who received a particular advertisement.

While there are several packages for building uplift models (see GRF and Causal Tree), they are generally limited to the single-treatment, single-response case. They also tend not to estimate how the model will perform in production, and they offer no support for estimating tradeoffs between several response variables.

This package attempts to build an automated solution for uplift modeling that fills the needs of Ibotta’s use cases with the following features:

  1. MR-Uplift allows for multiple treatments. This means a user can input an arbitrary number of treatments (not just the usual binary variable) into the model. In addition, one can incorporate meta features for each treatment. For example, a particular treatment might share several features with other treatments. Instead of creating a dummy indicator for each treatment, the user can represent the treatment as a vector of categorical or continuous variables. Specifying the treatment this way allows the model to take similarities between treatments into account (see the sketch after this list).
  2. Few current packages give out-of-sample (OOS) estimates of how a model would perform if it were deployed in production. MR-Uplift includes functionality to estimate this using the ERUPT metric, which estimates the expected response if the model's recommended treatments were applied to the user population.
  3. Support for multiple responses. This is perhaps the most distinctive feature of this package (and is the reason for its current name). Instead of modeling a single response, MR-Uplift can build a multi-output model. With this, one can estimate tradeoffs between the response variables. We have found this to be a useful tool for determining what options are available to the business and which objective function to use in production.
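
To make the treatment meta-feature idea concrete, here is a minimal sketch (with made-up treatments and features, not the package's API) contrasting dummy-indicator encoding with a meta-feature representation:

import numpy as np

# Three treatments encoded as dummy indicators: the model sees them as unrelated.
t_dummy = np.array([
    [1, 0, 0],   # treatment A
    [0, 1, 0],   # treatment B
    [0, 0, 1],   # treatment C
])

# The same treatments encoded with (hypothetical) meta features,
# e.g. bonus amount and whether the offer is time-limited.
t_meta = np.array([
    [5.0, 1],    # treatment A: $5 bonus, time-limited
    [5.0, 0],    # treatment B: $5 bonus, not time-limited
    [10.0, 1],   # treatment C: $10 bonus, time-limited
])

With the meta-feature representation, treatments that share features can borrow strength from one another when the model estimates their effects.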

MR-Uplift uses a multi-output neural network with a mean-squared error loss. While any kind of multi-output model could be used, I have found that neural networks estimate the interaction effects of the treatment and explanatory variables more reliably than tree-based methods.
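
As a rough illustration of this model family, and not the package's actual architecture, a multi-output network of the form y ~ f(t, x) with a mean-squared-error loss might look like the following Keras sketch (layer sizes here are assumptions):

from tensorflow.keras import layers, Model

def build_uplift_net(num_x, num_t, num_responses, num_nodes=8):
    # Treatment and explanatory variables feed into shared hidden layers,
    # so interactions between t and x can be learned.
    x_in = layers.Input(shape=(num_x,))
    t_in = layers.Input(shape=(num_t,))
    hidden = layers.Dense(num_nodes, activation='relu')(
        layers.Concatenate()([x_in, t_in]))
    out = layers.Dense(num_responses)(hidden)  # one output per response
    model = Model(inputs=[x_in, t_in], outputs=out)
    model.compile(optimizer='adam', loss='mse')
    return model

Because the treatments enter the same network as the explanatory variables, predicting under different candidate treatments is just a matter of swapping the treatment inputs at prediction time.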

One particular drawback of this work is that it does not measure the treatment effect directly the way Causal Tree does. Expanding the model to incorporate more specific loss functions could lead to improved performance. In addition, it is assumed that the treatment data comes from a randomized controlled trial; there is currently no support for estimating treatment effects from observational data.

Quick Example

If you’re new to uplift models I suggest going through the examples provided in the repo. Below is an example that generates data, builds a model, and evaluates tradeoffs. Please see this notebook for more detail on this hypothetical example.

Suppose we are marketers at a business that would like to increase user revenue with an advertisement (known as the treatment). The data consists of a randomly assigned treatment variable and four response variables for each user: revenue, cost, profit, and a random noise variable. The treatment increases user revenue but also incurs a cost, and both effects are functions of the explanatory variables. We are interested in estimating the tradeoffs between response variables available to the business. Below is code that generates the data and builds a model.

y, x, and t are the response, explanatory, and treatment variables, respectively. Note that each is assumed to be a numeric type (one-hot encoding is required for categorical variables), but can be multiple columns wide.
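
As a brief aside, if the treatment (or any explanatory variable) starts out as a categorical column, a standard one-hot encoding along these lines (with hypothetical offer names) produces the numeric matrix the model expects:

import pandas as pd

# Hypothetical categorical treatment column with two offers and a control.
t_raw = pd.Series(['offer_a', 'offer_b', 'control', 'offer_a'], name='treatment')
# One column per level, purely numeric.
t_encoded = pd.get_dummies(t_raw).astype(float)
print(t_encoded)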

import numpy as np
import pandas as pd
from mr_uplift.dataset.data_simulation import get_simple_uplift_data
from mr_uplift.mr_uplift import MRUplift
# Generate simulated data: y (responses), x (explanatory variables), t (treatment)
y, x, t = get_simple_uplift_data(10000)
y = pd.DataFrame(y)
y.columns = ['revenue', 'cost', 'noise']
y['profit'] = y['revenue'] - y['cost']
# Build / grid-search the model
uplift_model = MRUplift()
param_grid = dict(num_nodes=[8],
                  dropout=[.1, .5],
                  activation=['relu'],
                  num_layers=[1, 2],
                  epochs=[25],
                  batch_size=[30])
uplift_model.fit(x, y, t.reshape(-1, 1), param_grid=param_grid, n_jobs=1)

This automatically applies a train/test split, z-scales all variables, and builds a multi-output neural network of the form y ~ f(t, x).

To see how the model performs and the tradeoffs available to the business, the modeler can apply the get_erupt_curves() function to obtain out-of-sample ERUPT curves. These estimate the tradeoffs by applying various weights to each of the predicted responses and calculating the corresponding expected responses.

For example, suppose we have a weight β and the objective function β*revenue - (1-β)*cost. A weight of β=1 corresponds to revenue maximization, β=0 to cost minimization, and β=.5 to profit maximization. Below is the code and the plotted output of expected responses under various weights β.
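
The idea behind these curves can be sketched as follows (a simplified illustration with hypothetical predicted-response arrays, not the package's internals): for a given β, score each candidate treatment by the weighted sum of its predicted responses and assign each user the treatment with the highest score.

import numpy as np

def optimal_treatment_for_weight(pred_revenue, pred_cost, beta):
    # pred_revenue, pred_cost: arrays of shape (n_users, n_treatments)
    # holding the model's predicted responses under each treatment.
    objective = beta * pred_revenue - (1 - beta) * pred_cost
    # For each user, pick the treatment that maximizes the weighted objective.
    return objective.argmax(axis=1)

# Hypothetical predictions for 3 users and 2 treatments (control, offer).
pred_revenue = np.array([[0.0, 2.0], [0.0, 0.5], [0.0, 1.0]])
pred_cost = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
print(optimal_treatment_for_weight(pred_revenue, pred_cost, beta=0.5))
# -> [1 0 0]: only the first user gets the offer when profit is the objective

Sweeping β from 0 to 1 and computing the out-of-sample ERUPT for each resulting assignment traces out the tradeoff curve.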

erupt_curves, dists = uplift_model.get_erupt_curves()

The above are ERUPT curves that vary the objective function from cost minimization to revenue maximization. As the weight β increases, both costs and revenue increase.

Note that the noise response stays approximately at zero, which makes sense since the treatment has no effect on it. Also note that the profit response reaches its maximum at β=.5, as expected.

There is also a ‘random’ assignment curve. This ‘shuffles’ the optimal treatment and then calculates ERUPT. Comparing the ‘model’ line vs. the ‘random’ line shows how well the model allocates treatments at an individual level.
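
For concreteness, here is a simplified sketch of how an ERUPT-style estimate and its shuffled baseline can be computed on a randomized holdout, assuming equal treatment-assignment probabilities (the package's implementation may differ):

import numpy as np

def erupt(observed_y, observed_t, recommended_t):
    # Average observed response among users whose randomly assigned treatment
    # matches the model's recommendation. With random assignment and equal
    # assignment probabilities, this estimates the expected response if the
    # model's recommendations were applied to everyone.
    matched = observed_t == recommended_t
    return observed_y[matched].mean()

# Hypothetical holdout data: randomized treatments, observed responses,
# and the model's recommended treatment for each user.
observed_t    = np.array([0, 1, 1, 0, 1, 0])
observed_y    = np.array([1.0, 3.0, 0.5, 2.0, 1.5, 0.0])
recommended_t = np.array([0, 1, 0, 0, 1, 1])

model_erupt = erupt(observed_y, observed_t, recommended_t)

# 'Random' baseline: shuffle the recommendations to see how much of the
# lift comes from targeting individuals rather than from the treatments alone.
rng = np.random.default_rng(0)
random_erupt = erupt(observed_y, observed_t, rng.permutation(recommended_t))
print(model_erupt, random_erupt)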

In practice, I have found these charts useful in deciding which tradeoffs to make. Stakeholders might not know beforehand what the relative tradeoffs between more ambiguous response metrics should be. These charts show stakeholders the available options and set expectations for what the model can and cannot achieve in production.

Getting Started

To get started with MR-Uplift, check out the GitHub repo and examples or pip install mr_uplift. Please submit an issue if you find any problems. Better yet, contribute if you have ideas you'd like to see implemented.
