
What Is Causal Inference?

A beginner's guide to causal inference methods: randomized controlled trials, difference-in-differences, synthetic control, and A/B testing

Photo by Delano Ramdas on Unsplash

This article is intended for beginners who want a comprehensive introduction to causality and causal inference methods, explained with minimal math.


When it comes to causality, we simply can’t avoid this classic statement: "Correlation does not imply causation." And a classic example is that just because ice cream sales and drowning incidents are correlated, one does not cause the other. You’ve probably heard many such examples illustrating the difference between the two. While these examples are often straightforward, the distinction can become blurred in actual analyses.

Without a clear understanding of how causality is measured, it is easy to make incorrect causal inferences. In this regard, one question I often encounter is, "Yes, we know that correlation does not mean causation, but what about a regression analysis?" The short answer is that linear regression, by default, does not provide any causal statements unless we undertake appropriate steps. This is where causal inference methods come into play.

Causal Inference is a scientific process that measures the cause-and-effect relationships between variables.

Social Science & Medical Research

In social science and medical research, causal inference is widely adopted due to the nature of their studies. Researchers aim to identify the underlying factors that trigger an outcome because understanding these factors can inform policy-makers in developing effective policies. Examples of causal questions in these fields include:

  • Does the new economic policy lead to a positive change in employment outcomes? What is the impact of the job training program on employment rates?
  • Can the change in patients’ health be attributed to the new drug? What is the true effect of the drug?

Business

In the business world, causal inference is not as popular as in social science, except among large organizations and technology companies. (A/B testing is an exception!) However, understanding causality can benefit business decision-making.

Businesses can benefit from a balanced approach that includes understanding past data and trends (descriptive analytics), forecasting future trends (AI/predictive analytics), and understanding why things work or don’t work (causal analysis).

An obvious example is predicting customer churn. If we can’t establish why customers leave, predicting how many of them will churn is of limited use on its own. Descriptive and predictive analyses of churn can suggest ways to improve retention, but causal inference helps us determine the true effectiveness of those improvements.

  • What is the impact of the new customer retention initiative? How many customers are we able to prevent from leaving due to this initiative?

However, not every business opts for causal analysis because running causal inference requires a robust experimental setup, which can be costly and often not operationally or technologically feasible. This brings me to another topic: the measurement of causality.


How is Causality Measured?

The core principle of causal inference is experimentation. In an experimental study, we typically compare the results of two groups of participants: the treatment group and the control group. The treatment group receives the intervention, while the control group does not. In an ideal scenario, the two groups should match exactly in their characteristics (e.g., age, gender), except for their exposure to the intervention. It is also crucial to ensure there is no contamination (spillover effects) between the groups. Only then, if there is a change in the outcome of the treatment group, can we confirm that the observed difference is caused solely by the intervention.

In this article, I will discuss the most common causal inference methods in the industry. Additionally, I will briefly touch on the different variations of regression analysis to demonstrate why a simple regression alone is not sufficient to infer causal effects. By exploring these variations, readers can hopefully see the distinctions and understand why certain conditions are necessary to conduct a causal analysis.

Randomized Controlled Trials (RCTs)

The gold standard in causal inference is the Randomized Controlled Trial (RCT), a process that measures the effectiveness of an intervention by controlling variables through an experimental study.

The key concept in RCTs is randomization, which reduces systematic differences or bias by randomly assigning participants to treatment and control groups. For instance, if you allow people to choose whether to use a new health app, those who opt in might already be more health-conscious or motivated, skewing the results. In contrast, an RCT gives every targeted participant an equal chance of ending up in either group through randomization.
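Randomization itself is simple to implement in code. Below is a minimal sketch of random assignment using NumPy, with hypothetical participant IDs and group sizes:

```python
# A minimal sketch of random assignment (hypothetical participant IDs)
import numpy as np

rng = np.random.default_rng(7)        # seeded for reproducibility
participant_ids = np.arange(1000)     # 1,000 hypothetical participants
shuffled = rng.permutation(participant_ids)

treatment_ids = shuffled[:500]        # half randomly assigned to treatment
control_ids = shuffled[500:]          # the other half form the control group
```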

Example

As an example, consider launching a training program aimed at improving student academic performance. Students are randomly selected to enroll in the program, creating two groups: those who receive the program (treatment) and those who do not (control). The students in the control group should mirror those in the treatment group in terms of age, geography, and educational background. In technical terms, all the observable confounding variables are controlled – the only difference left between the two groups is the exposure to the program.

Randomized Controlled Trials (Image by Author)

To quantify the effect, we take the average score of each group and compare the two means using a statistical test (an independent-samples t-test). The question is: "How confident are we that the difference between the two groups is not due to chance?" This is RCT analysis in its simplest form, where the RCT is conducted at a single point in time (cross-sectional). Some RCTs are conducted as longitudinal studies, meaning the data is collected repeatedly over time from the same subjects.
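As an illustration, here is a minimal sketch of that comparison in Python with SciPy, using simulated (hypothetical) test scores:

```python
# A minimal sketch: comparing treatment and control test scores
# with an independent-samples t-test (simulated data)
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=70, scale=10, size=200)    # control group scores
treatment = rng.normal(loc=73, scale=10, size=200)  # treatment group scores

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"Mean difference: {treatment.mean() - control.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # small p -> unlikely due to chance
```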

Basic regression models for RCT can be written as follows:

RCT regression equations (Image by Author)
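In plain notation, the two models take roughly this form:

Cross-sectional: Yᵢ = β₀ + β₁Treatmentᵢ + εᵢ

Longitudinal: Yᵢ₁ = β₀ + β₁Treatmentᵢ + β₂Yᵢ₀ + εᵢ₁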

Here, the main difference between the cross-sectional and longitudinal models is that in longitudinal RCTs we need to adjust for baseline differences via Yᵢ₀ (the outcome variable measured at baseline). "Treatmentᵢ" is a dummy variable (Treatment = 1, Control = 0). These equations do not include control variables such as age and gender, which you can add if necessary.

In this context, the coefficient β₁ represents the causal effect we are looking for. Its magnitude and statistical significance give us insight into how effective the training program is.

Limitations

Nevertheless, as rigorous and robust as RCTs are, the randomization process is challenging to implement in many cases. For instance, in a business context, estimating the effectiveness of a product through an RCT can be difficult because we can’t control which customers will buy the product. Randomization can also raise ethical concerns and involve financial and operational constraints.

Difference-in-Differences (DiD)

Difference-in-Differences (DiD) is an alternative, quasi-experimental method used by researchers and data scientists when randomization is not feasible. It estimates causal relationships by leveraging a double comparison approach.

The DiD method compares the treatment and control groups both before and after the intervention, helping control for pre-existing differences between the groups prior to exposure.

Example

To illustrate, suppose you want to measure the effectiveness of a product designed to improve farm productivity. In the pre-treatment period, neither the treatment group nor the control group has access to the product. In the post-treatment period, the treatment group applies the product to their farms, while the control group does not. The change in the outcome gap between the two groups from the pre-period to the post-period can then be attributed to the product’s impact.

Difference-in-Differences Method – Parallel Trends Assumption (Image by Author)

This method relies on a strong assumption called the Parallel Trends Assumption. It states that, in the absence of the product, the difference in farming outcomes between the two groups would have remained constant over time. For this assumption to be credible, both groups need to follow parallel trends prior to the intervention.

The following is a simple regression model used for DiD:

DiD Regression (Image by Author)
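Written out, the model is roughly:

Yᵢₜ = β₀ + β₁Treatmentᵢ + β₂Postₜ + β₃(Treatmentᵢ × Postₜ) + εᵢₜ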

The Treatment and Post variables are dummy variables, where (Treatment = 1, Control = 0) and (Post-treatment = 1, Pre-treatment = 0).

Here, the coefficient β₃ (double-differenced) of the interaction term between treatment and the time period represents the causal effect.
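To make this concrete, here is a minimal sketch of a DiD regression with statsmodels, using simulated (hypothetical) farm-productivity data in which the true effect is set to 5:

```python
# A minimal sketch of a DiD regression with statsmodels (simulated data)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),  # 1 = farm uses the product
    "post": rng.integers(0, 2, n),       # 1 = post-treatment period
})
df["productivity"] = (
    50
    + 3 * df["treatment"]                # pre-existing group difference
    + 2 * df["post"]                     # common time trend
    + 5 * df["treatment"] * df["post"]   # the true causal effect (beta_3 = 5)
    + rng.normal(0, 4, n)
)

# 'treatment * post' expands to treatment + post + treatment:post
model = smf.ols("productivity ~ treatment * post", data=df).fit()
print(model.params["treatment:post"])    # estimate of beta_3, the DiD effect
```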

Limitations

DiD is a powerful technique, but if the Parallel Trends Assumption is violated, the results can be biased.

To enhance the robustness of DiD estimates, we can use extensions like:

  • Sensitivity Analysis: Testing the sensitivity of results to different model specifications.
  • Matching Techniques: Combining DiD with matching methods to ensure more comparable groups.

Synthetic Control

Synthetic Control (SC) is another common method that has been gaining popularity in recent years. This method is suitable:

  • When there is only a single treated unit (or a small number of treated units). This means the intervention is applied on a large scale at an aggregated level, such as a region, state, or country, so it is not feasible to find an exact comparison group that matches the treated unit. Say, for example, a new environmental policy is implemented in California to reduce air pollution. Since the policy is applied across the entire state, it is difficult and costly to find another state that exactly matches California in all relevant pre-intervention characteristics.
  • When random assignment is not feasible but data are available for multiple periods.

As the name suggests, Synthetic Control involves creating a "synthetic" control group by combining multiple control units in such a way that their weighted combination closely matches the characteristics of the treated unit before the intervention. The synthetic control is then used to project the hypothetical post-intervention trend (the counterfactual), which is compared against the treated unit’s actual trend.

Example

Consider a scenario where we want to measure the impact of a marketing campaign in region A on our revenue. The campaign has already been run, so it is not a matter of testing which campaign would be better. Instead, we want to estimate what would have happened to our sales if we hadn’t run this campaign.

Synthetic Control Example (Image by Author)

Like RCT and DiD, the standard SC method is also based on a linear regression model to predict the hypothetical post-intervention data. In SC, however, the data is treated as a time series.

Typical steps involved are:

  1. Identify Similar Units (Donor Pool): Let’s say we choose 3 regions (Region B, Region C, Region D) that are very similar to Region A in terms of seasonality and sociodemographic factors.
  2. Weighting Matrix: The next step is a dual weighting process. Since SC relies on a weighted average, the weights must be non-negative and sum to 1. (Don’t worry! You don’t need to find the weights manually; software will handle the hard bit. A sketch of this step follows the list.)

    • Unit weights: This step involves finding the optimal weights for the control units (e.g., 0.3 Region B, 0.2 Region C, 0.5 Region D) so that their weighted combination best matches the treated unit (e.g., Region A) in terms of the outcome variable before the intervention.
    • Feature weights: This step involves finding the optimal weights for the covariates (features) so that the synthetic control constructed in the first step also closely matches the treated unit in terms of these covariates.
  3. Regression Model: Use regression to train the model and predict the hypothetical trend for the synthetic control group.
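To give a feel for the unit-weighting step, here is a minimal sketch using SciPy’s constrained optimizer on simulated (hypothetical) monthly revenue for Region A and three donor regions:

```python
# A minimal sketch of finding synthetic-control unit weights with SciPy:
# non-negative weights summing to 1 that make the donors' weighted
# pre-intervention revenue match Region A's (simulated data)
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T_pre = 24                                     # months of pre-intervention data
donors = rng.normal(100, 10, size=(T_pre, 3))  # Regions B, C, D
true_w = np.array([0.3, 0.2, 0.5])             # weights we hope to recover
region_a = donors @ true_w + rng.normal(0, 1, T_pre)

def loss(w):
    # squared pre-intervention fit error between Region A
    # and the weighted donor combination
    return np.sum((region_a - donors @ w) ** 2)

result = minimize(
    loss,
    x0=np.full(3, 1 / 3),                      # start from equal weights
    bounds=[(0, 1)] * 3,                       # weights must be non-negative
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},  # sum to 1
)
print("Unit weights:", result.x.round(2))      # approximately [0.3, 0.2, 0.5]
```

The post-intervention counterfactual is then simply the weighted combination of the donors’ post-intervention outcomes using these fitted weights.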

A/B Testing

Lastly, A/B testing, which is not often seen in causal inference discussions, deserves attention as well!

If you are a digital marketer or a UX designer, you are probably familiar with A/B testing (aka split testing). Technically, A/B testing is a form of RCT in a digital context and it is used to validate whether a product or a marketing/ad campaign improves certain KPIs. For example, A/B testing can be used when you want to compare two versions of an ad creative to see which one drives better ROI.

A/B Testing Example (Image by Author)

Main differences between A/B testing and RCT

A/B testing (online RCT) and offline RCTs share the same theoretical foundations. However, their implementation and practical considerations differ considerably.

  • In A/B testing, outcome data can be gathered almost immediately after users interact with different product variations, whereas RCTs can take months or even years to produce data (e.g., agricultural products, economic policies).
  • Randomization in A/B testing is typically less resource-intensive compared to RCTs, which are logistically complex and time-consuming.
  • A/B testing offers flexibility and quick analysis results through various tools and software (e.g. Facebook Ads), whereas RCTs are more rigid in design.
  • A/B testing is focused on the aspect of "experimenting". That’s why A/B tests are often run iteratively until we find the "best" outcome (e.g., A vs. B: which is better? B vs. C: which is better?).

Thanks to the availability of tools that make A/B testing much easier, professionals from various fields can now run A/B tests with ease, without needing a deep understanding of the underlying theoretical concepts. As a result, the term "A/B testing" has lost some of its scientific rigor.

Key concepts of RCT are applied in A/B testing

  • Hypothesis: A clear hypothesis is essential in both RCTs and A/B testing. It helps track your process and measure outcomes effectively (e.g., "Changing the CTA text copy will increase conversion rates").
  • Control Variables: Similar to RCTs, A/B testing requires controlling variables to isolate the impact of the tested element (intervention). That’s why it is crucial to test only one element at a time (e.g., color, placement, or copy) to determine what actually contributes to changes in the outcome. Testing multiple elements simultaneously can obscure the results.
  • Sample Size and Statistical Significance: Both RCTs and A/B tests require sufficient sample sizes and statistical rigor. Statistically significant results ensure that observed effects are not due to random noise or external factors, like seasonality.
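For conversion-style metrics, the significance check often boils down to comparing two proportions. Here is a minimal sketch with statsmodels, using hypothetical conversion counts:

```python
# A minimal sketch of analyzing a two-variant A/B test on conversion
# rates with a two-proportion z-test (hypothetical numbers)
from statsmodels.stats.proportion import proportions_ztest

conversions = [480, 530]    # variant A, variant B
visitors = [10_000, 10_000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"Rates: A = {conversions[0] / visitors[0]:.2%}, "
      f"B = {conversions[1] / visitors[1]:.2%}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 -> significant
```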

Understanding the core concepts of RCTs can help us interpret results from A/B tests meaningfully and design more effective A/B tests.


Conclusion

In this article, we examined popular techniques used in the industry for causal inference. We discussed overarching concepts and the use cases of different methodologies (with minimal statistical complexity!). While this article did not provide detailed, step-by-step instructions for each analysis method (which I plan to cover in future articles), I hope it serves as a comprehensive guide to understanding causal inference techniques, especially for those who are just getting started in this field!



Thank you for reading!

Disclaimer: While I’ve tried my best to unpack the theories accurately, I may still make some mistakes. Please feel free to let me know if you spot any errors.

