
Of Counterfactuals and Hypotheticals…

Understanding the intuition behind Synthetic Control Method – Its uses, advantages and limitations.


Photo on MediaPRO

As humans, we constantly create hypothetical scenarios in our heads. We usually measure the impact of a decision by comparing it with an imaginary timeline in which the opposite decision was made. It was therefore not long before policymakers started to evaluate policy interventions by posing "What if?" scenarios. One technique that estimates the effect of a policy intervention by constructing such a counterfactual is the Synthetic Control Method (SCM). It was developed by Abadie and Gardeazabal, and an article in the Journal of Economic Perspectives described it as "arguably the most important innovation in the policy evaluation literature in the last 15 years".

To understand this better, let’s look at a case study:

1. Prop-99

In 1988, California adopted a tobacco-control programme named Proposition 99. By 1990, tobacco consumption among Californians had decreased significantly. A naive explanation would attribute this decrease entirely to the programme. But with interventions of such scale, there is rarely a single reason for the outcome. A more sceptical mind would highlight other factors:

  1. Smoking was already in decline in California even before the policy was enacted.
  2. Greater focus was given to the health education programme and other such activities, which could have helped in the steady decrease.

Thus, a simple before-and-after story will not help us fully understand the effect of ‘Prop-99’. What would help is an alternate scenario in which Prop-99 was never implemented, and synthetic control provides exactly that. A formal definition can be given as follows:

It is a statistical method to evaluate treatment effect in comparative case studies. It creates a synthetic version of treated units by weighting variables and observations in the control group.

Let’s break this down and apply it to our case:

1. First, a pool of potential candidates is chosen that can act as the control group (i.e. other states where no such tobacco control programme was enacted).

2. Out of those candidates, a suitable donor pool is formed by matching pre-intervention outcomes (i.e. cigarette sales) and other variables that directly affect them to the same variables for the target region (i.e. California) in the pre-intervention period.

3. The states in the donor pool are weighted and combined into a ‘synthetic state’ that closely resembles California’s tobacco consumption before the policy intervention and acts as a control for the target region after the enactment.
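The weighting step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the weights are chosen by least squares on pre-intervention outcomes only, constrained to be non-negative and to sum to one. The function name `fit_synthetic_weights` and all the numbers are made up for the example.

```python
import numpy as np
from scipy.optimize import minimize


def fit_synthetic_weights(donor_pre, treated_pre):
    """Find non-negative weights summing to 1 that best match the treated unit.

    donor_pre:   array of shape (n_periods, n_donors), pre-intervention outcomes
    treated_pre: array of shape (n_periods,), pre-intervention outcomes
    """
    n_donors = donor_pre.shape[1]

    def loss(w):
        # Squared distance between the treated unit and the weighted donors
        return np.sum((treated_pre - donor_pre @ w) ** 2)

    result = minimize(
        loss,
        x0=np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


# Made-up illustrative data: 4 donor states over 10 pre-intervention years
rng = np.random.default_rng(0)
years = np.arange(10)
donor_pre = np.column_stack([
    120 - 1.0 * years,
    130 - 2.0 * years,
    100 - 0.5 * years,
    140 - 3.0 * years,
]) + rng.normal(0, 0.5, size=(10, 4))

# The hypothetical "treated" state is built as a known mix of donors 0 and 1
treated_pre = donor_pre @ np.array([0.6, 0.4, 0.0, 0.0])

weights = fit_synthetic_weights(donor_pre, treated_pre)
synthetic_pre = donor_pre @ weights  # the synthetic state's pre-period path
print(np.round(weights, 3))
```

Once fitted, the same weights are applied to the donors' post-intervention outcomes to obtain the counterfactual path for the treated state.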

The final output can be represented in a graphical form like this:

[1] Trends in per-capita cigarette sales: California vs. the rest of the United States

Now, using this technique we can answer the sceptic’s questions asked before.

  1. As our donor pool consists only of states where no such policy intervention was enacted, and the synthetic state is a weighted average of those states, the counterfactual stays within the range of outcomes actually observed. This leaves very little room for extrapolation.
  2. As the synthetic state is a weighted sum of donor regions selected from a pool of candidates by a data-driven procedure, there is less room for bias in selecting the control state, which can be one of the drawbacks of the traditional difference-in-differences method.
  3. The validity of the model is reflected in how closely the synthetic state follows the outcome trend line of the treated state before the policy intervention.
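The third point can be made concrete with a small calculation. The sketch below uses invented numbers (not the real California data) and assumes the pre-intervention fit is summarised by the root mean squared prediction error (RMSPE), with the policy effect read off as the average gap after the intervention.

```python
import numpy as np

# Hypothetical per-capita sales (packs) for illustration only, not real data
treated = np.array([120.0, 118.0, 115.0, 112.0, 100.0, 92.0, 85.0])
synthetic = np.array([121.0, 117.0, 116.0, 111.0, 108.0, 105.0, 102.0])
n_pre = 4  # number of periods before the intervention

gap = treated - synthetic

# Root mean squared prediction error over the pre-intervention periods
pre_rmspe = np.sqrt(np.mean(gap[:n_pre] ** 2))

# Average post-intervention gap: the estimated effect of the policy
post_effect = np.mean(gap[n_pre:])

print(f"pre-intervention RMSPE: {pre_rmspe:.2f}")        # 1.00
print(f"average post-intervention gap: {post_effect:.2f}")  # -12.67
```

A small pre-intervention RMSPE relative to the post-intervention gap is what lends the estimate credibility; a large one suggests the synthetic state is a poor stand-in.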

2. When to use?

  1. From the above case, it is easy to see that SCM is usually applied at an aggregate level: countries, states, districts, etc.
  2. It is applied to regions for which no suitable comparison exists. For example, if A and B are two comparable units and only A has enacted some intervention, B's outcome can serve as the counterfactual of interest for A. In the absence of such a unit B, we use SCM.
  3. It suits settings with only a single treated case and a few control cases.

3. Limitations

It’s easy to see why we would prefer using this technique to create a model that can estimate the effect of policy intervention at an aggregate level. But it comes with its own limitations as well.

One of the main challenges of SCM is selecting a suitable donor pool in the absence of covariate information, especially for a problem which may require a lot of domain expertise. To address this, researchers at MIT developed Robust Synthetic Control (RSC), which can be interpreted as a more generalized approach to SCM.

The main advantages of using RSC over SCM are –

  1. It de-noises the data matrix via singular value thresholding, which makes it, as the name suggests, ‘robust’ in several respects.
  2. It automatically selects a suitable donor pool, so we only need data about the outcome variable in situations where not enough covariate information is available.
  3. Hence, domain expertise while using RSC almost becomes a luxury rather than a necessity.
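To give a feel for the de-noising idea, here is a minimal sketch of singular value thresholding with NumPy. It is a simplification under stated assumptions: the helper `hard_threshold_svd` is hypothetical, it simply keeps a fixed number of the largest singular values, and the data are simulated. How RSC actually chooses its threshold and fits the weights afterwards is not reproduced here.

```python
import numpy as np


def hard_threshold_svd(matrix, n_keep):
    """Keep only the n_keep largest singular values of matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Singular values come back sorted in descending order; zero out the tail
    s_kept = np.where(np.arange(len(s)) < n_keep, s, 0.0)
    return (u * s_kept) @ vt


rng = np.random.default_rng(0)
# Simulated low-rank "true" outcomes: 20 donor units x 30 periods, rank 2
true_signal = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 30))
noisy = true_signal + rng.normal(scale=0.3, size=true_signal.shape)

denoised = hard_threshold_svd(noisy, n_keep=2)

# Compare how far each matrix is from the noise-free signal
err_noisy = np.linalg.norm(noisy - true_signal)
err_denoised = np.linalg.norm(denoised - true_signal)
print(f"error before de-noising: {err_noisy:.2f}")
print(f"error after de-noising:  {err_denoised:.2f}")
```

The intuition is that the shared structure across units lives in a few large singular values, while noise is spread thinly across all of them, so dropping the small ones removes mostly noise.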

For further reading, I would recommend the following links:

  1. https://www.urban.org/research/publication/synthetic-control-method-tool-understand-state-policy
  2. http://peerunreviewed.blogspot.com/2019/11/a-short-tutorial-on-robust-synthetic.html

References –

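[1] Alberto Abadie, Alexis Diamond, Jens Hainmueller. Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association. June 1, 2010, 105(490):493–505. doi:10.1198/jasa.2009.ap08746.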

