1+1=? Better decision-making when causal inference meets machine learning

Double Machine Learning: A general framework for estimating causal effects using machine learning (ML) methods

HenAI
Towards Data Science

In recent years, machine learning has been applied to domains as varied as online marketing and commerce, personalized medicine, and data-driven policy-making. This success has raised expectations that autonomous systems will make the right decisions. Companies often reach for classical machine learning tools to solve decision-making problems, such as where to set a price or which customers to target with a marketing campaign. This exposes one of the major challenges of machine learning today: understanding cause-and-effect relationships.

Decision-Making Questions Need Causality

To make a data-driven decision, understanding causal relationships is key. Here is a simple example from our daily business: price elasticity. To set an optimal price for a product, a company needs to know how much it would sell at different (hypothetical) price levels. In other words, it needs to know how demand responds to price, which is what the price elasticity of demand measures.

If you are an ML practitioner, your go-to might be a classical ML algorithm that predicts sales as the outcome, with the price level as a feature. In practice, however, this approach does not give us the causal effect of price on sales. The following plot shows why: prediction and causal inference are distinct (though closely related) problems (Athey, 2017, p. 484). The left-hand side shows a prediction problem, where we look for the correlation between price and quantity. On the right-hand side, the dotted lines are the counterfactuals, i.e., what the sales of a given product would be if we changed its price. Despite the positive association, the causal effect is negative: if we increase the price, quantities decrease.

Prediction (left) vs. causal inference (right)
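
The gap between a positive association and a negative causal effect can be reproduced in a small simulation. The sketch below is only an illustration under assumed numbers: the confounder `quality`, the coefficients, and the helper `ols_slope` are all hypothetical and not taken from the plot. Quality pushes both price and sales up, so a regression of sales on price alone picks up a positive slope even though the effect built into the data is negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder: product quality raises both price and demand.
quality = rng.normal(size=n)
price = 2.0 * quality + rng.normal(size=n)
# By construction, the causal effect of price on sales is -1.0.
sales = -1.0 * price + 4.0 * quality + rng.normal(size=n)


def ols_slope(y, *columns):
    """Return the OLS coefficient on the first column (an intercept is included)."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


naive = ols_slope(sales, price)              # ignores quality: slope is positive
adjusted = ols_slope(sales, price, quality)  # controls for quality: close to -1.0
print(f"naive slope: {naive:.2f}, adjusted slope: {adjusted:.2f}")
```

Here the confounder is observed and enters linearly, so plain regression adjustment is enough. The harder case, where the confounding runs through many features in an unknown functional form, is what the rest of this post is about.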

The example above, like many others described by Athey (Athey, S. (2017). Beyond prediction: using big data for policy problems. Science 355, 483–485), shows that machine learning models are not built to estimate causal effects. Applying off-the-shelf prediction methods from machine learning leads to biased estimates of causal effects. On the other hand, traditional causal inference requires strong assumptions about the functional form of the model, and if we misspecify that form, we again end up with biased estimates. The idea, therefore, is to modify existing machine learning techniques so that the form of the conditional expectation function is learned from the data rather than assumed. This is the birth of double machine learning!

A general framework for the best of both worlds

Many researchers are developing methodology in the area of causal machine learning. I am particularly interested in double machine learning because of its generality and simplicity. It can be used in conjunction with penalized methods, neural networks, tree-based algorithms, and ensemble methods, and it is easy to operationalize. Let me show you how easy it is:

Suppose we have a set of products with important features X, where P represents the price and Y the demand response (sales). We then:

  1. Regress Y on X and compute the difference between Y and the model's predicted values of Y (i.e., the residuals), which we’ll call Y_res.
  2. Similarly, regress P on X and compute P_res, the difference between P and the model's predicted values of P.
  3. Finally, regress Y_res on P_res. The resulting coefficient on P_res is the point estimate of the causal effect of P on Y.
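
One common way to formalize the three steps above is the partially linear model. The notation below is an assumption added for illustration (theta for the causal effect, g and m for the unknown functions of X, ell for the conditional mean of Y given X); only Y, P, X, Y_res, and P_res come from the steps themselves.

```latex
\begin{aligned}
Y &= \theta P + g(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X, P] = 0, \\
P &= m(X) + \eta, \qquad \mathbb{E}[\eta \mid X] = 0, \\
Y_{\mathrm{res}} &= Y - \hat{\ell}(X), \qquad \ell(X) = \mathbb{E}[Y \mid X] = \theta\, m(X) + g(X), \\
P_{\mathrm{res}} &= P - \hat{m}(X), \\
\hat{\theta} &= \frac{\sum_i P_{\mathrm{res},i}\, Y_{\mathrm{res},i}}{\sum_i P_{\mathrm{res},i}^{2}}.
\end{aligned}
```

Subtracting the third line's conditional mean from the first equation gives Y - l(X) = theta (P - m(X)) + epsilon, which is why a regression of Y_res on P_res recovers theta under these assumptions.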

P.S. To get an unbiased estimate, you need to apply cross-fitting when carrying out these steps. More specifically, you should:

  1. Randomly partition your data into two subsets.
  2. Fit the two ML models (one for Y, one for P) on the first subset.
  3. Estimate the coefficient on the second subset, using the models fit on the first subset to compute the residuals.
  4. Repeat steps 1 to 3 with the roles of the two subsets flipped.
  5. Average the two coefficients, which gives the unbiased estimate.
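
The residualization steps and the cross-fitting recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names (`poly_features`, `fit_predict`, `cross_fit_dml`), the simulated data, and the effect of -1.5 built into it are all assumptions, and polynomial least squares stands in for whatever ML learner you would actually use.

```python
import numpy as np


def poly_features(X, degree=3):
    """Intercept plus element-wise powers of X up to `degree` (stand-in learner basis)."""
    return np.column_stack([np.ones(len(X))] + [X ** d for d in range(1, degree + 1)])


def fit_predict(X_train, y_train, X_eval):
    """Fit least squares on the training subset and predict on the evaluation subset."""
    coef, *_ = np.linalg.lstsq(poly_features(X_train), y_train, rcond=None)
    return poly_features(X_eval) @ coef


def cross_fit_dml(Y, P, X, seed=0):
    """Two-fold cross-fitted estimate of the effect of P on Y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), 2)  # random partition into two subsets
    estimates = []
    for k in range(2):
        train, evaluate = folds[k], folds[1 - k]
        # Residualize Y and P on the evaluation subset with models fit on the other subset.
        y_res = Y[evaluate] - fit_predict(X[train], Y[train], X[evaluate])
        p_res = P[evaluate] - fit_predict(X[train], P[train], X[evaluate])
        # Regress Y_res on P_res (no intercept) to get this fold's coefficient.
        estimates.append(p_res @ y_res / (p_res @ p_res))
    return float(np.mean(estimates))  # average the two coefficients


# Simulated data (hypothetical): X confounds price P and sales Y nonlinearly.
rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 2))
P = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)
Y = -1.5 * P + 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)

theta_hat = cross_fit_dml(Y, P, X)
print(f"estimated effect of P on Y: {theta_hat:.2f}")
```

To use a different learner, only `fit_predict` needs to change, for example by fitting a random forest or a penalized regression on the training subset and returning its predictions on the evaluation subset.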

Isn’t that simple and straightforward? If you are still unsure how to implement it, I recommend an excellent Python package called EconML. It implements many cutting-edge causal machine learning methods, and double ML is one of them. Here is a code snippet from the package:


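from econml.dml import LinearDMLCateEstimator  # note: newer EconML releases rename this class to LinearDML
from sklearn.linear_model import MultiTaskElasticNetCV
from sklearn.preprocessing import PolynomialFeatures

# Double ML with elastic-net first stages: model_y predicts the outcome, model_t the treatment
est = LinearDMLCateEstimator(
    model_y=MultiTaskElasticNetCV(cv=3, tol=1, selection='random'),
    model_t=MultiTaskElasticNetCV(cv=3),
    featurizer=PolynomialFeatures(1),
    linear_first_stages=True)
est.fit(Y, T, X, W)  # fit both first-stage models and the final effect model
te_pred = est.const_marginal_effect(X_test)  # estimated treatment effect at each row of X_test
# Reference: https://github.com/microsoft/EconML/blob/master/notebooks/Double%20Machine%20Learning%20Examples.ipynb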

where Y is the outcome, T is the treatment, X holds the features (the variables along which the effect may vary), and W holds the confounders. The package lets you plug different machine learning models into its double ML class, which is super handy.

In this post, I’ve covered some basic concepts of causality and machine learning, and introduced a general ML framework for estimating causal effects. You can apply this method in your own work or other data science projects to control for confounders flexibly and reach the right answer faster.

I am planning to share more on the theoretical side of this method, so you can better understand why it works and how you can extend it.

References:

  1. ALICE (Automated Learning and Intelligence for Causation and Economics) — Microsoft Research
  2. Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments
