Causal inference 101: difference-in-differences

Published in

Towards Data Science

5 min read · Apr 24, 2018

Case study: who pays for mandated childbirth coverage?

In today’s Public Finance III lecture @ Stanford, Professor Petra Persson introduces one of the most widely used causal inference techniques: difference-in-differences (diff-in-diff). To make our discussion less dry, she motivates the need for this cool technique in the context of mandated benefits.

The Question

When the government mandates that employers provide benefits, who is really footing the bill? Is it the employer? Or is it the employee, who pays for it indirectly in the form of a pay cut?

In this lecture, Professor Persson answers this question quantitatively, using a cool technique called “difference-in-differences-in-differences”.

To make our discussion more tractable, we focus on one case study: mandated health coverage of childbirth. This analysis was first conducted in 1994 by Jonathan Gruber, an MIT professor who serves as the director of the Health Care Program at the National Bureau of Economic Research (NBER). To date, The Incidence of Mandated Benefits remains one of the most influential papers in healthcare economics. As of April 24, 2018, it had been cited by 1,148 other academic papers.

Timeline: Mandated Health Care Coverage of Childbirth

Understanding the timeline is important for identifying the causal effect:

Before 1978: there was limited health care coverage for childbirth.
1975–1979: a subset of states passed laws mandating health care coverage of childbirth.
Starting in 1978: federal legislation mandated health care coverage of childbirth in all states.

Ask Data: did women pay for the benefit indirectly via a pay cut?

First Attempt

The first step is to compare young, married women’s wages, before and after the mandate:

Numbers in cells: log hourly wages. Numbers in parentheses: standard deviation

We see there was a 3.4% fall in the real wages of women after the mandate. The effect is statistically significant.

Can we conclude that women paid indirectly via a 3.4% pay cut?

Not yet! For the 3.4% to be the true effect, we need to make the assumption that the mandate was the only thing that could have affected wages during this period. But this is likely not the case.
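A side note on units: the table reports log hourly wages, so the 3.4% figure is presumably a difference of -0.034 in log wages read as an approximate percent change. The minimal sketch below shows how close that approximation is. The function name `log_diff_to_pct` and the value -0.034 are assumptions made for illustration, not taken from the paper.

```python
import math


def log_diff_to_pct(log_diff: float) -> float:
    """Convert a difference in log wages into an exact percent change."""
    # exp(d) - 1 is the exact proportional change implied by a log difference d.
    return math.expm1(log_diff) * 100.0


# Assumed for illustration: the 3.4% fall expressed as a log difference.
log_diff = -0.034
pct_change = log_diff_to_pct(log_diff)

print(f"{log_diff:+.3f} log points is about {pct_change:+.2f}%")
```

For small changes like this one, the exact figure is only slightly smaller in magnitude than the log difference, which is why log differences are commonly quoted as percentages.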

Second Try

What if during this transition, the nation as a whole slipped into a recession? Then all young, married women’s real wages would have been cut anyway.

To address this concern, we look at the change in wages for young, married women who live in states that have not yet passed the mandate. The key assumption is that states that have not yet passed the mandate provide a good counterfactual: they help us understand what would have happened to young, married women’s wages had their states not passed the mandate. In academic terms, this is referred to as the “parallel trend assumption”. You can’t test this assumption directly, but you can gauge how plausible it is by plotting the pre-trend, that is, whether young, married women’s real wages moved in parallel in states with and without the mandate before it took effect.

Source: Columbia University Population Health Methods, and Gruber (1994)

In states that have not yet passed the mandate, young, married women’s real wages actually increased by 2.8%! So, relative to that benchmark, the change in young, married women’s real wages in states that passed the mandate is -3.4% - 2.8% = -6.2%. This -6.2% estimate is the “diff-in-diff” estimate.
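The arithmetic above can be written as a tiny helper. This is only a sketch of the subtraction using the two changes quoted in the text. The function and variable names are made up for illustration.

```python
def diff_in_diff(treated_change: float, control_change: float) -> float:
    """Change in the treated group minus change in the comparison group."""
    return treated_change - control_change


# Before-vs-after changes in real wages quoted in the text, in percent.
mandate_states = -3.4       # young, married women in states with the mandate
non_mandate_states = 2.8    # young, married women in states without it yet

did_estimate = diff_in_diff(mandate_states, non_mandate_states)
print(f"diff-in-diff estimate: {did_estimate:.1f}%")
```

The comparison group's change stands in for what would have happened without the mandate, so subtracting it removes anything that hit both groups of states equally.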

But we’re not done yet.

Third Time’s A Charm

But what if the parallel trend assumption doesn’t hold? For example, it is conceivable that some X-factor went into effect in the mandate states during the same period as the mandate on childbirth coverage. Then the 6.2% change in real wages might be attributable to this X-factor, not the childbirth coverage mandate!

This seems like an impossible critique to address. But we’re not doomed.

Triple-diff to the Rescue!

To check this possibility, let’s repeat the diff-in-diff analysis on a population that’s unaffected by the childbirth coverage mandate: men.

The idea is that such X-factors would affect everyone in those states during this time period, but the childbirth coverage mandate would only affect young, married women. If the diff-in-diff estimate for men were also -6.2%, then we should attribute all changes in real wages to these X-factors, not the childbirth mandate.

Luckily, the diff-in-diff estimate for men is -0.8%, and statistically insignificant. But for completeness’s sake, we will still compute the triple-diff estimate: (-6.2%) - (-0.8%) = -5.4%!
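The triple-diff step is one more subtraction on top of the two diff-in-diff estimates. A minimal sketch using the figures quoted in the text follows. The function and variable names are hypothetical.

```python
def triple_diff(did_affected: float, did_unaffected: float) -> float:
    """Diff-in-diff for the affected group minus diff-in-diff for the unaffected group."""
    return did_affected - did_unaffected


# Diff-in-diff estimates quoted in the text, in percent.
did_women = -6.2   # young, married women (affected by the mandate)
did_men = -0.8     # men (not affected by the mandate)

ddd_estimate = triple_diff(did_women, did_men)
print(f"triple-diff estimate: {ddd_estimate:.1f}%")
```

Whatever moved wages for everyone in the mandate states shows up in both diff-in-diff numbers and cancels out, leaving the part specific to young, married women.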

Conclusion & Takeaway Lesson

On average, young, married women’s real wages dropped by 5.4% in response to the childbirth coverage mandate.
Is this a beneficial change or not? Only the individual can answer that. For some, it might not be a big drop. But if you’re a young woman who doesn’t plan on having many kids, you might not be so pleased.
Is this result bulletproof? Certainly not. But the bar for any alternative explanation is now extremely high. Any alternative explanation would need to identify an X-factor that makes young, married women’s wages drop by 5.4% right after the childbirth mandate passes in a state, but not in any other time period or for any other population.
Takeaway lesson: when trying to identify the causal effect of an event, it’s not enough to simply compare the before and after. By clever de-meaning, you protect yourself from grossly under- or over-estimating the causal effect. Remember, correlation != causation!

Appendix: Regression Specification

If you’d like to implement this regression on your own, here’s the regression specification. Do not hesitate to leave me a message if you find any bugs in my reasoning!
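For reference, a triple-difference regression is usually written in the following textbook form. The notation here is an assumption and may differ from the specification originally shown: W is the log hourly wage of individual i in state j at time t, X is a vector of individual controls, \tau_t is a dummy for the period after the mandate, \delta_j is a dummy for states that passed the mandate, and TREAT_i is a dummy for belonging to the affected group (young, married women).

```latex
\begin{align*}
W_{ijt} = {} & \alpha + \beta_1 X_{ijt} + \beta_2 \tau_t + \beta_3 \delta_j
               + \beta_4 \mathrm{TREAT}_i \\
             & + \beta_5 (\delta_j \times \tau_t)
               + \beta_6 (\tau_t \times \mathrm{TREAT}_i)
               + \beta_7 (\delta_j \times \mathrm{TREAT}_i) \\
             & + \beta_8 (\delta_j \times \tau_t \times \mathrm{TREAT}_i)
               + \varepsilon_{ijt}
\end{align*}
```

Under this setup, the coefficient on the three-way interaction, \beta_8, is the triple-diff estimate. The two-way interactions absorb state-specific shocks over time, group-specific changes over time, and fixed differences between groups across states.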