Understanding Causal Inference with Synthetic Control method and implementing it in Python
Understanding Synthetic Control with an example

What is Synthetic Control?
Synthetic Control has been described as the "most important development in program evaluation in the last decade" (Athey and Imbens 2016). The synthetic control method is a statistical method used to evaluate the effect of an intervention in comparative case studies. It involves constructing a weighted combination of the groups used as controls, to which the treatment group is compared. This comparison is used to estimate what would have happened to the treatment group if it had not received the treatment. It is based on a simple yet powerful idea: we don’t need to find any single unit in the untreated group that is similar to the treated group. Instead, we can forge our own as a combination of multiple untreated units, creating what is effectively a synthetic control.
Unlike difference in differences approaches, this method can account for the effects of confounders changing over time, by weighting the control group to better match the treatment group before the intervention. Another advantage of the synthetic control method is that it allows researchers to systematically select comparison groups. It has been applied to the fields of political science, health policy, criminology, and economics.
In this article, we are going to focus on understanding the specifics of synthetic control and its implementation in Python with an example. Before I start, I want to acknowledge that this article is based on the content of Causal Inference for The Brave and True. This open-source book gave me a much deeper understanding of various Causal Inference methods.
Our example will consider the problem of estimating the effect of cigarette taxation on cigarette consumption. To give a bit of context, this is a question that has been debated for a long time in economics. One side of the argument says that taxes will increase the cost of cigarettes, which will lower demand for them. The other side argues that since cigarettes cause addiction, a change in their price won’t change demand by much. In economic terms, we would say that the demand for cigarettes is price-inelastic, and an increase in taxation is just a way to increase government income at the cost of smokers. To settle things, we will look at some US data regarding the matter.
Data Used
In 1988, California passed the famous Tobacco Tax and Health Protection Act, which became known as Proposition 99. "Its primary effect is to impose a 25-cent per pack state excise tax on the sale of tobacco cigarettes within California, with approximately equivalent excise taxes similarly imposed on the retail sale of other commercial tobacco products, such as cigars and chewing tobacco. Additional restrictions placed on the sale of tobacco include a ban on cigarette vending machines in public areas accessible by juveniles and a ban on the individual sale of single cigarettes. Revenue generated by the act was earmarked for various environmental and health care programs, and anti-tobacco advertisements."
To evaluate its effect, we can gather data on cigarette sales from multiple states and across a number of years. In our case, we got data from the years 1970 to 2000 from 39 states. Other states had similar Tobacco control programs and were dropped from the analysis. Here is what our data looks like.

We have `state` as the state index, where California is number 3. Our covariates are `retprice`, the cigarette retail price, and `cigsale`, the per-capita sales of cigarettes in packs. Our outcome variable of interest is `cigsale`. Finally, we have boolean helper variables to signal the state of California and the post-intervention period. If we plot the sales of cigarettes for California and the mean sales of cigarettes in other states across time, this is what we would get.

During the time for which we have data, people in California apparently bought fewer cigarettes than the national average. Also, there appears to be a downward trend in cigarette consumption after the 1980s. It looks like the decline accelerated in California after Proposition 99, compared to other states, but we can’t say that for sure. It is just a guess based on examining the plot.
To answer the question of whether Proposition 99 had an effect on cigarette consumption, we will use the pre-intervention period to build a Synthetic Control. We will combine the other states to build a fake state that closely resembles the trend of California. Then, we will see how this synthetic control behaves after the intervention.
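The comparison described above can be sketched in a few lines of pandas. The real data set is not reproduced here, so the snippet below builds a small toy panel instead: the five states, the noise levels, the helper column names `california` and `after_treatment`, and the cutoff `year > 1988` are all assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real panel: 5 hypothetical states, 1970-2000.
rng = np.random.default_rng(0)
years = np.arange(1970, 2001)
rows = []
for state in range(1, 6):
    sales = 120 - 1.5 * (years - 1970) + rng.normal(0, 2, len(years))
    if state == 3:  # state 3 plays the role of California in this toy data
        sales = sales - 10 - 1.0 * np.clip(years - 1988, 0, None)
    rows.append(pd.DataFrame({"state": state, "year": years, "cigsale": sales}))
df = pd.concat(rows, ignore_index=True)

# Boolean helper columns (names are assumptions, not taken from the real file).
df["california"] = df["state"] == 3
df["after_treatment"] = df["year"] > 1988

# One row per year, one column per group: California vs. the mean of the others.
comparison = (
    df.groupby(["year", "california"])["cigsale"].mean()
      .unstack("california")
      .rename(columns={True: "California", False: "Other states"})
)
print(comparison.tail())
```

With matplotlib installed, calling `comparison.plot()` would draw the two lines against the year.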
Mathematical Notations
Suppose that we have $J+1$ units. Without loss of generality, assume that unit 1 is the unit affected by an intervention. In our case, California is the unit affected by the intervention, Proposition 99. Units $j=2,\dots,J+1$ are a collection of untreated units, or states, that we will refer to as the "donor pool". Also assume that the data we have span $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. For each unit $j$ and period $t$, define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention.

Then, the effect for the treated unit $j=1$ at time $t$, for $t>T_0$, is defined as

$$\tau_{1t} = Y^I_{1t} - Y^N_{1t}$$
Since unit $j=1$ is the treated one, $Y^I_{1t}$ is factual but $Y^N_{1t}$ is not. The challenge then becomes how to estimate $Y^N_{1t}$. Notice how the treatment effect is defined for each period, which means it can change over time. It doesn’t need to be instantaneous. It can accumulate or dissipate. To put it in a picture, the problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of unit $j=1$ if it had not been treated.

To estimate $Y^N_{1t}$, we remember that a combination of units in the donor pool may approximate the characteristics of the treated unit much better than any untreated unit alone. Thus, a synthetic control is defined as a weighted average of the units in the donor pool. Given the weights $W=(w_2,\dots,w_{J+1})$, the synthetic control estimate of $Y^N_{1t}$ is

$$\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$$
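As a small numeric illustration of a synthetic control as a weighted average of donor units, the snippet below uses three made-up donor units over four periods. The outcomes and the weights are invented for this example and do not come from the cigarette data.

```python
import numpy as np

# Hypothetical outcomes for 3 donor units over 4 periods (rows = periods).
donors = np.array([
    [100.0, 120.0, 90.0],
    [ 98.0, 118.0, 88.0],
    [ 95.0, 115.0, 86.0],
    [ 93.0, 111.0, 83.0],
])

# Illustrative weights w_2, w_3, w_4 (made up, not estimated).
weights = np.array([0.5, 0.3, 0.2])

# Each period's synthetic outcome is the weighted sum of the donors' outcomes.
synthetic = donors @ weights
print(synthetic)
```

Each entry of `synthetic` is the estimated no-intervention outcome for the treated unit in that period.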
Visual Explanation
As we know, linear regression is also a way of getting the prediction as a weighted average of the variables. In this case, a regression can be represented as the following matrix multiplication

In the synthetic control case, we don’t have lots of units, but we do have lots of time periods. So what we do is flip the input matrix around. Then, the units become the "variables" and we represent the outcome as a weighted average of the units, like in the following matrix multiplication.

If we have more than one feature per time period, we can pile up the features like this. The important thing is to make it so that the regression is trying to "predict" the treated unit 1 by using the other units. This way, we can choose the weights in some optimal way to achieve this proximity we want. We can even scale features differently to give different importance to them.

Implementation
To estimate the treatment effect with synthetic control, we will try to build a "fake unit" that resembles the treated unit before the intervention period. Then, we will see how this "fake unit" behaves after the intervention. The difference between the synthetic control and the unit that it mimics is the treatment effect.
To do this with linear regression, we will find the weights using OLS. We will minimize the squared distance between the weighted average of the units in the donor pool and the treated unit for the pre-intervention period.
To do so, the first thing we need is to convert the units (in our case, the states) into the columns and the time into the rows. Since we have 2 features, `cigsale` and `retprice`, we will pile them on top of each other as we did in the picture above. We will build a synthetic control that looks a lot like California in the pre-intervention period and see how it would behave in the post-intervention period. For this reason, it is important that we select only the pre-intervention period. Here, the features seem to be on a similar scale, so we are not normalizing them. If features are on different scales, one in the thousands and another in the decimals, the bigger feature will dominate when minimizing the difference. To avoid this, it’s important to scale them first.

Now, we can define our Y variable as the state of California and the X as the other states.
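The reshaping and the split into Y and X might look roughly like the sketch below. It runs on a toy panel with four hypothetical states, because the real file is not reproduced here; the column names follow the article, while the random data and the `after_treatment` cutoff are assumptions.

```python
import numpy as np
import pandas as pd

# Toy panel: 4 hypothetical states, 1970-2000, with the article's column names.
rng = np.random.default_rng(1)
years = np.arange(1970, 2001)
n_states = 4
df = pd.DataFrame({
    "state": np.repeat(np.arange(1, n_states + 1), len(years)),
    "year": np.tile(years, n_states),
})
df["cigsale"] = 120 - 1.5 * (df["year"] - 1970) + rng.normal(0, 2, len(df))
df["retprice"] = 40 + 3.0 * (df["year"] - 1970) + rng.normal(0, 1, len(df))
df["after_treatment"] = df["year"] > 1988  # assumed cutoff

# Keep only pre-intervention rows, then put states in columns and
# (feature, year) pairs in rows, so the two features are stacked vertically.
features = ["cigsale", "retprice"]
pre = df.loc[~df["after_treatment"], ["state", "year"] + features]
inverted = pre.pivot(index="state", columns="year").T

# Treated unit (state 3 stands in for California) vs. donor pool.
y = inverted[3].to_numpy()
X = inverted.drop(columns=3).to_numpy()
print(inverted.shape, X.shape, y.shape)
```

With 19 pre-intervention years and 2 features, `inverted` has 38 rows and one column per state.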
Then, rather than plain OLS, we run a Lasso regression. We use Lasso (L1-regularized) regression because we don’t want the model to overfit the pre-intervention data. Ridge regression can also be used for this. The regression will return the set of weights that minimize the squared difference between the treated unit and the weighted combination of the units in the donor pool, subject to the regularization penalty.
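A minimal sketch of this fitting step with scikit-learn’s `Lasso` is shown below. The donor matrix `X` and treated series `y` are simulated, and the settings `alpha=1.0` and `fit_intercept=False` are assumptions chosen for the illustration, not necessarily the values used for the results in this article.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated pre-intervention data: 19 periods, 6 hypothetical donor units.
rng = np.random.default_rng(2)
n_periods, n_donors = 19, 6
X = 100 + rng.normal(0, 5, size=(n_periods, n_donors)).cumsum(axis=0)

# The toy treated unit is built from donors 0 and 2 plus a little noise.
true_w = np.array([0.6, 0.0, 0.4, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(0, 0.5, n_periods)

# No intercept, so the prediction is purely a weighted sum of donor units.
model = Lasso(alpha=1.0, fit_intercept=False, max_iter=100_000)
model.fit(X, y)
weights = model.coef_

# Root mean squared error of the pre-intervention fit.
pre_rmse = np.sqrt(np.mean((X @ weights - y) ** 2))
print(weights.round(3), round(pre_rmse, 3))
```

A larger `alpha` pushes more weights to exactly zero, at the cost of a looser pre-intervention fit.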

These weights show us how to build the synthetic control. We will multiply the outcome of state 1 by 0.566, of state 3 by 0.317, of state 4 by 0.158, and so on. We can achieve this with a dot product between the matrix from the states in the pool and the weights.
Now that we have our synthetic control, we can plot it with the outcome variable of the State of California.

With the synthetic control at hand, we can estimate the treatment effect as the gap between the treated unit’s outcome and the synthetic control’s outcome.
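Computing that gap is a simple subtraction once both series share the same year index. The numbers below are made up to keep the example short; they are not the article’s estimates.

```python
import numpy as np
import pandas as pd

years = np.arange(1986, 1992)

# Hypothetical per-capita pack sales: observed treated unit vs. its synthetic control.
observed = pd.Series([90.0, 88.0, 86.0, 80.0, 75.0, 70.0], index=years)
synthetic = pd.Series([90.5, 87.5, 86.2, 84.0, 82.0, 80.0], index=years)

# Gap for every year; before the intervention it should hover around zero.
effect = observed - synthetic

# Estimated treatment effect: the gap in the post-intervention years only.
post_effect = effect.loc[effect.index > 1988]
print(post_effect)
```

A negative value means the treated unit sold fewer packs than its synthetic counterpart in that year.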


By the year 2000, it looks like Proposition 99 has reduced the sales of cigarettes by about 25 packs per capita. Now we will figure out whether this is statistically significant.
Inference
Here, we will use the idea behind Fisher’s Exact Test. The intuition is very simple. We permute the treated and control units exhaustively. Since we only have one treated unit, this means that, for each unit, we pretend it is the treated one while the others are the controls.

In the end, we will have one synthetic control and one set of effect estimates for each state. What this does is pretend that the treatment actually happened in another state, not California, and see what the estimated effect would have been for this treatment that didn’t happen. Then, we check whether the estimated effect for California is sufficiently larger than the effects of these fake treatments. The idea is that for states that weren’t actually treated, once we pretend they were, we won’t be able to find any significant treatment effect.
The function that does this for a single state returns a data frame with one column for the state, one for the year, one for the outcome `cigsale`, and one for the synthetic outcome for that state.
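A function that builds a synthetic control for an arbitrary state could be sketched as follows. This is an illustration under assumptions: the name `synthetic_control`, its `treatment_year` parameter, and the toy panel are hypothetical, and plain least squares via `np.linalg.lstsq` is swapped in for the Lasso fit to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

def synthetic_control(state, data, treatment_year=1988):
    """Fit donor weights on pre-treatment years; return observed vs. synthetic sales."""
    wide = data.pivot(index="year", columns="state", values="cigsale")
    pre = wide.index <= treatment_year
    donors = wide.drop(columns=state)
    # Least squares without intercept (stand-in for the Lasso fit).
    weights, *_ = np.linalg.lstsq(
        donors.loc[pre].to_numpy(), wide.loc[pre, state].to_numpy(), rcond=None
    )
    return pd.DataFrame({
        "state": state,
        "year": wide.index.to_numpy(),
        "cigsale": wide[state].to_numpy(),
        "synthetic": donors.to_numpy() @ weights,
    })

# Toy panel: 8 hypothetical states sharing a downward trend plus noise.
rng = np.random.default_rng(3)
years = np.arange(1970, 2001)
n_states = 8
panel = pd.DataFrame({
    "state": np.repeat(np.arange(1, n_states + 1), len(years)),
    "year": np.tile(years, n_states),
})
trend = np.tile(120 - 1.5 * (years - 1970), n_states)
panel["cigsale"] = trend + rng.normal(0, 2, len(panel))

# Pretend each state in turn is the treated one.
placebos = pd.concat(
    [synthetic_control(s, panel) for s in panel["state"].unique()],
    ignore_index=True,
)
print(placebos.head())
```

Because every state is fitted independently, the loop can be parallelized if the donor pool is large.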
Here is the result when we apply it to the first state.

With the synthetic control for all the states, we can estimate the gap between the synthetic and the true state for all states. For California, this is the treatment effect. For the other states, this is like a placebo effect, where we estimate the synthetic control treatment effect where the treatment didn’t actually happen. If we plot all the placebo effects along with the California treatment effect, we get the following figure

Two aspects of this figure stand out. First, we can see that the variance after the intervention is higher than the variance before the intervention. This is expected since the synthetic control is designed to minimize the difference in the pre-intervention period. Another interesting aspect is that there are some units we can’t fit very well even in the pre-intervention period. This is also to be expected.
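Since those units are so poorly fit, it is a good idea to remove them from the analysis. One way to do it objectively is to set a threshold for the pre-intervention error, such as the mean squared error over the $T_0$ pre-intervention periods,

$$MSE_j = \frac{1}{T_0}\sum_{t=1}^{T_0}\left(Y_{jt} - \hat{Y}^{Synth}_{jt}\right)^2$$

and remove those units with high errors. If we proceed like this and plot the same figure, this is what we get.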
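The filtering step can be sketched as below, assuming the placebo results are stored in a long data frame with the columns `state`, `year`, `cigsale`, and `synthetic`. The three states, their values, and the threshold of 80 are illustrative choices only.

```python
import pandas as pd

# Hypothetical placebo results for 3 states over 1987-1990.
placebos = pd.DataFrame({
    "state": [1] * 4 + [2] * 4 + [3] * 4,
    "year": [1987, 1988, 1989, 1990] * 3,
    "cigsale": [100, 98, 96, 94, 120, 118, 116, 114, 90, 88, 80, 75],
    "synthetic": [101, 97, 95, 95, 100, 99, 98, 97, 91, 88, 86, 85],
})

# Mean squared pre-intervention error per state.
pre = placebos[placebos["year"] <= 1988]
mse = ((pre["cigsale"] - pre["synthetic"]) ** 2).groupby(pre["state"]).mean()

# Keep only the states whose pre-intervention fit is below the threshold.
threshold = 80  # illustrative cutoff
kept = mse[mse < threshold].index
filtered = placebos[placebos["state"].isin(kept)]
print(mse)
```

In this toy data, state 2 is poorly matched before the intervention and is the only one dropped.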

With the noise removed, we can see how extreme the effect in the state of California is. This image shows us that if we pretend the treatment had happened to any other state, we would almost never get an effect as extreme as the one we got with California.
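One way to turn this visual impression into a number, in the spirit of a permutation test, is to compute the share of units whose estimated effect is at least as extreme as California’s. In the sketch below, the California value is the year-2000 estimate reported in this article, while the placebo effects are invented for illustration.

```python
import numpy as np

# Year-2000 effect for California as reported in this article.
california_effect = -31.419

# Hypothetical year-2000 placebo effects for 9 other states (made up).
placebo_effects = np.array([-5.2, 3.1, 8.7, -12.0, 0.4, -1.9, 6.3, -33.0, 2.2])

# One-sided p-value: share of all units (placebos plus California itself)
# whose effect is at least as negative as California's.
all_effects = np.append(placebo_effects, california_effect)
p_value = np.mean(all_effects <= california_effect)
print(p_value)
```

With only a few dozen donor states, the smallest attainable p-value is one over the number of units, so the resolution of this test is limited by the size of the donor pool.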

The treatment effect for California in the year 2000 is -31.419, meaning that the intervention reduced the consumption of cigarettes by about 31 packs per capita.
Conclusion
Using pre-period data from other states, we built a lasso regression model that assigned fixed weights to each control state and arrived at a weighted average that closely resembled California smoking activity before Proposition 99 was introduced.
After this, we used the resulting lasso regression model to synthesize what California would have looked like in the post-period, too (absent treatment). The difference between the actual cigarette sales and the synthesized outcome was our treatment effect.
We also saw how we could use the idea behind Fisher’s Exact Test to do inference with synthetic control. We pretended that each non-treated unit was actually treated and computed its effect. These were the placebo effects: the effects we would observe even without treatment. We used those effects to visualize and check the treatment effect on California cigarette sales.
References
Causal Inference for The Brave and True by Matheus Facure