
How to do Causal Inference using Synthetic Controls

An outline of synthetic controls and an MIT-developed t-test.

Between 2017 and 2021, there was a ton of research on synthetic controls. However, the method has not been widely adopted by the data science industry.

Figure 1: first published example of a synthetic control, showing the impact of terrorist conflict on the Basque Country’s GDP – src. Image by author.

In one sentence, synthetic controls (SCs) forecast what observed data would have looked like without a treatment. SCs are attractive because they allow for causal inference on observed time series data. They’re also very computationally efficient and relatively simple.

In this post we’ll discuss synthetic controls and a new variant developed by researchers at MIT for running t-tests on synthetic control data. It boasts some impressive advantages over traditional methods, for instance a 50% decrease in confidence interval size. It’s also computationally efficient and assumption-lean.

Without further ado, let’s dive in…

Technical TLDR

The synthetic control method develops an estimate of a control group in observational time series data. This allows us to isolate the treatment effect of an intervention where randomization is not possible.

We start by selecting a control group that is similar to our treatment group. Then, using only the pre-intervention period, we fit a weighted average of the control units that minimizes the distance between covariates in the control and treatment groups. Finally, we use the "trained" weights applied to the control to extrapolate what would have happened had the treatment not occurred.

However, inference for SCs is not robust due to potentially incorrect weight vectors and poor estimates of long-run variance. To combat these problems, we outline a scale-free t-statistic that is fit using a K-fold cross-fitting process. The only assumption required is that the control group is "sufficiently" similar to the treatment group.

Here’s the R package repo.

But, what’s actually going on?

Ok, you just had a lot thrown at you. Let’s slow down a bit and work through synthetic controls and MIT’s t-test for inference.

1 – Causal Inference

In statistics we are often interested in causal relationships.

The gold standard for determining whether a relationship is causal is an A/B test. By randomly assigning units to treatment and control, we are guaranteed to (on average) evenly distribute covariates between groups, as shown in figure 2. Randomization effectively makes the experiment groups identical.

Once we have identical groups, we can assign a treatment to one and observe the differences. Since the groups are identical, the differences must causally be due to the treatment.

Figure 2: example of an evenly distributed covariate, sex. Image by author.

However, we’re also interested in doing causal inference on data we observe but didn’t randomize. One naive approach would be to develop a perfect causal model that takes into account all possible confounders. But how can we know if we have all the confounders?

You often can’t. The assumption of a "correct" model is the basis of most simple causal models and is fundamentally unprovable.

There are many alternatives, but in this post we’ll be discussing one of the most robust methods specifically for time series data…

2 – Synthetic Controls

Synthetic controls are a very clever way to create a control group in observed data. By doing so, we have a baseline to compare to and can thereby accurately estimate the treatment effect.

Let’s work through an example.

Let’s say we’re looking to evaluate the impact of a clean energy policy in the city of Philadelphia. Our variable of interest will be the number of solar panels installed on houses.

Ideally, our treatment effect would look something like figure 3 below. After the intervention, the ratification of the policy, we’d see an increase in the number of solar panels (blue) relative to our theoretical control (green dotted).

Figure 3: example time series structure of a synthetic control. Image by author.

With that framework, let’s go through the steps for developing a synthetic control…

Step 1: select a control group. The control group is used to train a weight vector that predicts the synthetic control values. Note that the control group cannot be influenced by the treatment in any way. For our example, we may want to exclude cities that do business with Philadelphia because fossil fuel exports could change.

Step 2: determine relevant predictors. Predictors should be observable in both the treatment and control groups. They can be anything that isn’t systematically impacted by the treatment. Some examples could be the GDP of a city or the total number of houses.

Step 3: fit a weight vector. The selected weights must minimize the distance between the predictors in the treatment and control groups, as shown in figure 4. X1 and X0 are the same predictors, observed in the treatment and control groups respectively, and W is our weight vector.

Figure 4: the norm we are looking to minimize. X’s are predictors and W is a weight vector.

At the end of this step, we have a set of weights that make the control covariates look like the treatment covariates.
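
Written out, the quantity in figure 4 is the norm below. The weights are typically also constrained to be non-negative and sum to one (the convex-combination constraint from the original synthetic control papers, which figure 4 itself doesn’t show):

```latex
W^{*} = \underset{W}{\arg\min}\; \lVert X_{1} - X_{0} W \rVert
\quad \text{subject to} \quad w_{j} \ge 0, \;\; \sum_{j} w_{j} = 1
```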

Step 4: forecast our control values. Armed with the knowledge of how control predictors relate to treatment predictors, we can use the observed control data to predict what the treatment group would have looked like had the intervention not taken place.

Step 5: observe the treatment lift. With a control and a treatment group, we can begin exploring the differences.
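
To make steps 3 through 5 concrete, here’s a minimal Python sketch on simulated data. Everything here is an illustrative stand-in: the data are synthetic, the pre-intervention outcomes serve as the predictors, and the weights are fit with a plain constrained least squares rather than any particular package’s solver.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: 40 pre-intervention and 12 post-intervention periods,
# one treated unit and 5 control units (e.g. cities).
T0, T1, J = 40, 12, 5
controls = rng.normal(size=(T0 + T1, J)).cumsum(axis=0) + 50           # control outcomes
treated = controls[:, :3].mean(axis=1) + rng.normal(0, 0.5, T0 + T1)   # treated outcome
treated[T0:] += 5.0                                                    # true post-intervention lift

# Step 3: fit weights on the pre-intervention period only.
# Weights are non-negative and sum to one, i.e. a convex combination of controls.
def loss(w):
    return np.sum((treated[:T0] - controls[:T0] @ w) ** 2)

res = minimize(
    loss,
    x0=np.full(J, 1 / J),
    bounds=[(0, 1)] * J,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
)
w = res.x

# Step 4: forecast the "synthetic" control over the full time range.
synthetic = controls @ w

# Step 5: the estimated treatment lift is the post-intervention gap.
lift = treated[T0:] - synthetic[T0:]
print("Estimated average lift:", round(lift.mean(), 2))
```

With real data you would fit the weights on pre-intervention covariates (and/or outcomes) for actual control cities, then read the post-intervention gap as the estimated effect of the policy.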

Not too bad, right?

The Method: T-Tests for Synthetic Controls

Now, vanilla synthetic control frameworks can’t quantify the variation in the estimate, so we can’t determine statistical significance.

One common workaround is to run a permutation test, but here we outline an alternative method developed at MIT. The method impressively only requires the assumption that "treatment units are sufficiently similar to control units."

The method combats two main problems when estimating the uncertainty of synthetic controls. The first is that our weighted average framework may produce an inaccurate estimate, even when the distance between our treatment and control predictors is minimized. The second is more nuanced, but still important. When determining statistical significance, we often use estimators of long-run variance to define confidence intervals. Those long-run variance estimators are often inaccurate.

It’s going to get a bit technical, and we’ll keep it brief, so feel free to refer to the paper for the full details.

Problem 1 – Inaccurate Weight Vector

To solve the problem of potentially inaccurate weight vector estimates, we perform a K-fold cross-fitting process, outlined below…

Here, T0 and T1 are the sets of all time points in the pre- and post-intervention periods, respectively. K is a user-defined parameter that determines the number of folds, similar to cross-validation.

Note that this notation is very minimal but hopefully provides a general understanding. See the beginning of section 2.2 for more.
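
As a rough sketch of the cross-fitting idea (this illustrates the general mechanics only, not the paper’s exact algorithm: the folds are simply contiguous blocks of the pre-intervention period, and an unconstrained least-squares fit stands in for the Step 3 weight estimation):

```python
import numpy as np

rng = np.random.default_rng(0)
T0, K, J = 40, 5, 5                       # pre-period length, number of folds, control units
controls_pre = rng.normal(size=(T0, J))   # control outcomes in the pre-intervention period
treated_pre = controls_pre @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + rng.normal(0, 0.1, T0)

folds = np.array_split(np.arange(T0), K)  # K contiguous blocks of pre-period time points

for k, held_out in enumerate(folds):
    train = np.setdiff1d(np.arange(T0), held_out)
    # Re-estimate the weight vector without fold k, then check it on the held-out block.
    w_k, *_ = np.linalg.lstsq(controls_pre[train], treated_pre[train], rcond=None)
    gap = treated_pre[held_out] - controls_pre[held_out] @ w_k
    print(f"fold {k}: mean held-out gap = {gap.mean():.3f}")
```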

Just one more section to go…

Problem 2 – Inaccurate Long-Run Variance Estimators

To solve the problem of inaccurate long-run variance estimators, we develop a scale-free t-statistic.

Figure 5: scale-free t-statistic.

Despite the seemingly complex notation, this is simply a t-statistic. The value in the parentheses is the difference between our treatment mean and control mean. The denominator is the standard deviation determined through our K-fold cross-fitting process. And K is the number of folds.
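
Here’s an illustrative computation consistent with that description (the per-fold gap values are made up, and the use of K − 1 degrees of freedom for the p-value is an assumption on my part; the paper and the R package define the exact estimator and critical values):

```python
import numpy as np
from scipy import stats

# Per-fold estimates of the post-intervention gap (treated minus synthetic control),
# e.g. produced by a cross-fitting loop like the one sketched earlier. Values are made up.
fold_gaps = np.array([4.6, 5.3, 4.9, 5.8, 4.4])
K = len(fold_gaps)

att_hat = fold_gaps.mean()        # numerator: difference between treatment and control means
sd_hat = fold_gaps.std(ddof=1)    # denominator: spread of the estimate across folds

t_stat = np.sqrt(K) * att_hat / sd_hat
p_value = 2 * stats.t.sf(abs(t_stat), df=K - 1)  # two-sided p-value; df choice is illustrative
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```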

Summary

And there you have it, a summary of synthetic controls and how to run a t-test on this kind of data.

To recap, synthetic controls use the relationship between treatment and control groups observed prior to an intervention to forecast what the treatment group would have looked like (the synthetic control) after the intervention. Despite the method’s popularity in the research community, it struggles to support robust statistical inference.

So, we outlined a t-test that leverages k-fold cross-fitting as well as a scale-free t-statistic to combat common issues found when developing synthetic controls. Furthermore, this method impressively only requires the assumption that units in our control are similar to units in our treatment.

Implementation Notes

  • Theoretically the SC framework extends beyond linear relationships and a simple weighted average. However, the math becomes more complex, so there hasn’t been much development in the area. Here’s one notable exception.
  • When fitting the covariates, it’s important to assess the quality of the fit. A severely incorrect model can dramatically decrease the quality of our conclusions. Here’s a bias correction method for poorly fitting models.
  • To simplify the process without sacrificing fit, it’s common to apply PCA to the predictors in the treatment and control groups. A weight vector is then fit on each of the principal components (see the sketch after this list).
  • SCs rely on long pre-intervention periods for training. The longer the pre-intervention period, the better.
  • It’s often a good idea to standardize predictors when minimizing weights. This prevents predictors with different scales/units from skewing the minimization.
  • Difference-in-differences is a common alternative, but it requires more assumptions. If any of those assumptions are not supported, SCs and this t-test are a good default option.
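
As a quick illustration of the standardization and PCA notes above, using scikit-learn (the predictor matrix and its column scales are made up):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical predictors for 40 control units, on very different scales (e.g. GDP vs. house counts).
X = rng.normal(size=(40, 8)) * np.array([1, 10, 1000, 1, 1, 5, 1, 100])

# Standardize so no single predictor dominates the distance being minimized,
# then reduce to a handful of principal components before fitting the weight vector.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=3).fit_transform(X_scaled)
print(X_reduced.shape)  # (40, 3)
```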

P.S. there’s lots of really good research coming out. Check out arxiv.org to stay up-to-date.


Thanks for reading! I’ll be writing 32 more posts that bring academic research to the DS industry. Check out my comment for links to the main source for this post and some useful resources.

