A Quick Comparison of Causal-Inference Estimates

There are many ways to estimate the magnitude of a causal effect, and here we use a minimally complex causal structure to check if those estimates agree.

Rumen Iliev
Towards Data Science


1. Introduction

As an experimental behavioral scientist, I have always thought that understanding the causal direction of statistical relationships is at the heart of empirical science. I was trained in classical experimental design, where the researcher is assumed to have full control over the environment, and where the main worry is how to position different experimental conditions in time or space (e.g. a Latin square design). Once you leave the safety of controlled lab experiments, however, inferring causality becomes a major problem which easily jeopardizes the internal validity of your conclusions. Luckily, in the last few decades there has been tremendous progress in research on statistical causality, both in theory and in methods, and causal inference is now becoming a rather common tool in the toolbox of a data scientist. To catch up with current methods I did a quick review, and I was somewhat surprised by the plethora of ways to estimate causal effects. In this project I will list the most common methods I found in the literature, apply them to a simplified causal problem, and compare the resulting estimates. This comparison is intended as a brief high-level overview, not a tutorial on causal inference. For a short introduction to the topic I recommend Pearl et al. (2016); for in-depth coverage an interested reader can check Pearl (2009), Morgan and Winship (2015) or Prof. Jason Roy’s online class (Roy, 2020).

2. An Example Causal Model

When using statistical methods to infer causality, we are typically interested in the magnitude of the effect of a cause X on an outcome Y. When we can only observe those variables, or when there are challenges with randomization (e.g. selection bias), we will typically need to account for a broader set of variables. In Figure 1 I present a causal graph for a hypothetical example. The example includes the three main types of additional variables which help us get an unbiased estimate: back-door, front-door and instrumental variables.

Figure 1. A hypothetical graphical causal model of cause X influencing outcome Y in the presence of other variables. The causal effect of X on Y can be estimated if we measure any of these three sets: {X, Y, BD}, {X, Y, IV}, or {X, Y, FD}.

Suppose that X is a binary variable indicating whether a person exercises at least weekly (x = 1 if exercising; x = 0 otherwise) and Y is life expectancy measured on a continuous scale. The effect of X on Y is fully mediated by a variable FD (satisfying the front-door criterion), which in our example might be body mass index. Further, both Y and X are influenced by a variable BD (satisfying the back-door criterion), which in our case could be some set of genetic factors that do not affect FD directly. Last, X is also influenced by IV (an instrumental variable), which for our illustration could be proximity to a sports facility.

We can instantiate this causal graph with the following model:

IV = U

BD = U

X = { 0, if IV - BD + U <= 0
      1, if IV - BD + U > 0 }

FD = X + U

Y = 65 + FD + BD + U

Model 1: Structural Causal Model instantiating the graphical model in Figure 1. The U components are independent, normally distributed error terms (or inputs from exogenous variables) with mean 0 and SD = 2.
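As a sanity check, Model 1 can be transcribed into R almost line by line. The sketch below is only illustrative: the sample size and seed are arbitrary choices, and the treatment threshold is placed at zero exactly as written above (the data-generating function used later in the post instead splits the latent index at its median, which is approximately equivalent here because the index is centered at zero).

```r
# Direct transcription of Model 1 (a sketch; n and the seed are arbitrary)
set.seed(1)
n  <- 10000
iv <- rnorm(n, sd = 2)                              # IV = U
bd <- rnorm(n, sd = 2)                              # BD = U
x  <- ifelse(iv - bd + rnorm(n, sd = 2) > 0, 1, 0)  # threshold at zero
fd <- x + rnorm(n, sd = 2)                          # FD = X + U
y  <- 65 + fd + bd + rnorm(n, sd = 2)               # Y = 65 + FD + BD + U
```

Note that in this model the true causal effect of X on Y is exactly 1: a one-unit change in X shifts FD by one unit, which in turn shifts Y by one unit.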

For the purpose of the current simulation, I randomly generated 10,000 instances from Model 1 and in the next part, I will estimate the causal effect of X on Y using different statistical approaches.

Here is the relevant code:

generate_df <- function(n = 1000, path_weights) {
  # Generate a data frame following the causal graph in Figure 1
  a <- path_weights
  sd_noise <- 2
  iv <- rnorm(n, sd = sd_noise)
  bd <- rnorm(n, sd = sd_noise)
  x <- rnorm(n) + a[1]*iv + a[2]*bd   # latent index for the treatment
  x <- ifelse(x <= median(x), 0, 1)   # median split (the index is centered at zero)
  fd <- rnorm(n, sd = sd_noise) + a[3]*x
  y <- 65 + rnorm(n, sd = sd_noise) + a[4]*fd + a[5]*bd
  df <- data.frame(
    id = seq_len(n), x = x, y = y, iv = iv, fd = fd, bd = bd)
  return(df)
}
path_weights <- c(1, -1, 1, 1, 1)
df <- generate_df(n = 10000, path_weights)

3. Naive estimate (Observing X and Y)

A naive estimate of the causal effect of X on Y could simply be obtained from the regression coefficient of X predicting Y. In this example the naive estimate is b_naive = -0.97, surprisingly suggesting that exercise shortens life expectancy by almost one year.

naive_fit <- lm(y~x, data = df)
summary(naive_fit)

A naive estimate is useful when a researcher is convinced that there are no BD variables which need to be accounted for (e.g. when X is randomly assigned or as-if randomly assigned). With most observational studies and quasi-experimental designs, however, naive estimates are often not very useful.

4. Back-door variable adjustment (Observing X, Y and BD)

If in addition to X and Y you can also measure BD, you can compute an unbiased estimate of the causal effect of X on Y, avoiding the problems that naive estimates have. Here BD is just a single variable, but it could be a set of variables which satisfy the back-door criterion. When reviewing the literature I found the following five methods to estimate causal effects while adjusting for back-door variables:

4.1 Covariates

One way to estimate the causal effect of X on Y is to run a regression model, predicting Y from X and including BD as a covariate. For most researchers who have training in linear regression but not in causal inferences, this is often the most intuitive approach.

covariates_fit <- lm(y~x+bd, data = df)
summary(covariates_fit)

Applying this method to the data from our simulation, we find that the causal effect of X on Y is b_covariates = 1.01. Based on this analysis, if you exercise you will live about one year longer. Notice that this estimate is not only different from the naive estimate; the two actually have opposite signs and lead to conflicting conclusions (see Simpson’s paradox).

4.2 Direct Matching

The covariates adjustment from above can also be accomplished by directly matching treatment and control participants on their BD scores. The main idea is that the matching procedure will remove the influence of BD on the causal estimate by only comparing control and treatment subjects who are already similar on their BD scores:

greedymatch <- Matching::Match(Tr = df$x, M = 1, X = df[, "bd"])
matched <- df[unlist(greedymatch[c("index.treated", "index.control")]), ]
t.test(matched$y[matched$x == 1], matched$y[matched$x == 0], paired = TRUE)

When using this method we get b_direct_match = 1.05, which is very close to what we observed when using the covariates method.

4.3 Propensity Score Matching

If you have multiple BD variables to account for, it might be very challenging to find good matches (the curse of dimensionality). Instead, you can use propensity score matching, where you first compute each participant’s probability of being in the treatment group (the propensity score; see Austin, 2011), and then match participants based on those probabilities.

ps_fit <- glm(x ~ bd, data = df, family = "binomial")
df$ps_score <- ps_fit$fitted.values
logit <- stats::qlogis  # rename the function for clarity
greedymatch <- Matching::Match(Tr = df$x, M = 1, X = logit(df[, "ps_score"]), caliper = .2)
matched <- df[unlist(greedymatch[c("index.treated", "index.control")]), ]
t.test(matched$y[matched$x == 1], matched$y[matched$x == 0], paired = TRUE)

The causal effect is estimated to be b_ps_match = 1.05 which is virtually the same as the previous adjusting methods.

4.4 Inverse Probability of Treatment Weighting

You can also use weighted regression, where the weights are based on the probability of treatment. Control participants with high propensity scores and treatment participants with low propensity scores receive higher weights, adjusting for the treatment/control imbalance due to the BD variable(s).

df$weight <- ifelse(df$x == 1, 1/df$ps_score, 1/(1-df$ps_score))
iptw_fit <- lm(y ~ x, data = df, weights = weight)
summary(iptw_fit)

The estimate for the simulated data was b_iptw = 0.92, very close to the previous estimates.

4.5 Doubly-Robust Estimates

This is a more advanced method which allows more room for misspecification: to get an unbiased estimate of the causal effect, it is enough to correctly specify either the propensity score (exposure) model or the outcome regression model.

dr_fit <- drgee::drgee(oformula = y ~ bd,
                       eformula = x ~ bd,
                       iaformula = ~ bd,
                       olink = "identity", elink = "identity",
                       estimation.method = "dr",
                       data = df)
summary(dr_fit)

The causal effect is b_doubly_robust = 1.01, again very similar to the other adjustment methods.

5. Instrumental variable (Observing X, Y and IV)

What if you know that BD exists, but you cannot measure it? If you can measure the variable IV instead, you can still estimate the causal effect of X on Y. This approach is known as the instrumental variable method: the effect of the instrument IV on Y is mediated by X, which allows us to recover the effect of X on Y. A common way to run this analysis is two-stage regression, where in the first stage we regress X on the instrument IV, and in the second stage we regress the outcome Y on the fitted values from the first stage.

stage1_fit <- stats::lm(x ~ iv, data = df)
df$stage1_predict <- predict(stage1_fit, type = "response")
stage2_fit <- stats::lm(y ~ stage1_predict, data = df)
summary(stage2_fit)

In our example, the instrumental variable method estimates the causal effect of X on Y to be b_iv = 0.96, very close to the BD adjustment methods.

6. Front-door adjustment (Observing X, Y and FD)

Even if we can measure neither IV nor BD, it is still possible to compute an unbiased estimate of the causal effect of X on Y. The front-door adjustment achieves this by estimating the effect of X on FD and the effect of FD on Y, and combining the two.

x_fd_fit <- lm(fd ~ x, data = df)
fd_y_fit <- lm(y ~ fd + x, data = df)  # x is included to block the back-door path
x_fd_fit$coefficients[2] * fd_y_fit$coefficients[2]

Using front-door adjustment we estimate the causal effect of X on Y to be b_fd = 1.03.

7. Cross-method agreement

While I understand why some of the methods should return equivalent or very close estimates, I still find it both striking and somewhat perplexing that the causal effect of X on Y can be estimated in so many ways. To examine the agreement of the different methods I ran a series of simulations based on the causal graph from Figure 1. I used the same types of relations as the ones outlined in Model 1, but for each simulation I randomly drew the regression coefficients, with absolute values ranging from 0.3 to 3. For reference, at the weaker end (coefficients set to 0.3) FD and BD together explained 8% of the variance in Y, and at the stronger end (coefficients set to 3) they explained 68% (based on R²).

Figure 2. Correlations between the causal effect estimates from the eight causal inference methods discussed here.

I ran 2000 simulations, with 2000 rows each. For each simulation I computed eight different estimates of the causal effect of X on Y, using the methods listed above. Figure 2 depicts the agreement between the different methods. As can be seen in the figure, there is substantial agreement between the methods, with Pearson’s correlations well above 0.9. The naive estimate is also positively correlated with the other methods, yet it often underestimates or overestimates the true causal effect.
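A minimal sketch of such a simulation might look as follows. For brevity this hypothetical version uses only four of the eight estimators (naive, covariate adjustment, two-stage instrumental variable, and front-door) and far fewer runs than the full study; the function and variable names are mine, not from the original analysis.

```r
# Sketch of the agreement simulation: random path weights, four estimators
set.seed(42)
one_run <- function(n = 2000) {
  a  <- runif(5, 0.3, 3) * sample(c(-1, 1), 5, replace = TRUE)  # random signed weights
  iv <- rnorm(n, sd = 2)
  bd <- rnorm(n, sd = 2)
  x_latent <- a[1] * iv + a[2] * bd + rnorm(n)
  x  <- ifelse(x_latent > median(x_latent), 1, 0)
  fd <- a[3] * x + rnorm(n, sd = 2)
  y  <- 65 + a[4] * fd + a[5] * bd + rnorm(n, sd = 2)
  c(naive      = unname(coef(lm(y ~ x))["x"]),
    covariates = unname(coef(lm(y ~ x + bd))["x"]),
    iv_2stage  = unname(coef(lm(y ~ fitted(lm(x ~ iv))))[2]),
    front_door = unname(coef(lm(fd ~ x))["x"] *
                        coef(lm(y ~ fd + x))["fd"]))
}
estimates <- t(replicate(200, one_run()))  # one row of estimates per simulation
round(cor(estimates), 2)                   # pairwise agreement between estimators
```

The correlation matrix produced at the end is the same kind of summary shown in Figure 2: rows of per-run estimates, correlated across methods.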

8. Caveats and further observations

The goal of the current project was to provide a quick overarching example of the main methods for estimating causal effects and to demonstrate that those methods largely agree in their results. In its brevity, however, this example brushed over many important details, some of which deserve explicit mention:

  • Here I treated causal effects as a unitary concept, yet there are different types of causal effects. Instrumental variables, for example, estimate the Local Average Treatment Effect, while the other methods, as applied here, estimate the (Marginal) Average Treatment Effect. Those two effects are not necessarily equal.
  • While the causal graph in Figure 1 and the subsequent model are based on various theoretical principles and assumptions, I only provided external references and did not discuss them here.
  • The different methods here largely agreed in their estimates, but this is partially due to the simple model I used to generate the data. For example, I did not include heterogeneous treatment effects, unbalanced groups or skewed distributions, all of which would have decreased the consensus across methods.
  • Matching methods are sensitive to the particular algorithms used and to the strength of the relationship between X and BD.
  • While experimenting with the magnitude of the causal relations, I noticed that weaker causal effects are associated with a substantial drop in agreement across methods. Instrumental variable estimates were particularly affected by weak statistical relationships (see Bound, Jaeger, & Baker, 1995).
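The last point about weak instruments can be illustrated with a small simulation sketch. This is a hypothetical setup of my own: the only thing that changes between conditions is the strength of the IV → X path, while the true causal effect of X on Y stays fixed at 1.

```r
# Sketch: sampling spread of the two-stage IV estimate, strong vs weak instrument
set.seed(7)
iv_estimate <- function(a_iv, n = 2000) {
  iv <- rnorm(n, sd = 2)
  bd <- rnorm(n, sd = 2)
  x  <- ifelse(a_iv * iv - bd + rnorm(n) > 0, 1, 0)  # a_iv sets instrument strength
  fd <- x + rnorm(n, sd = 2)
  y  <- 65 + fd + bd + rnorm(n, sd = 2)
  coef(lm(y ~ fitted(lm(x ~ iv))))[2]                # the true effect is 1
}
strong <- replicate(200, iv_estimate(a_iv = 1))
weak   <- replicate(200, iv_estimate(a_iv = 0.05))
c(sd_strong = sd(strong), sd_weak = sd(weak))        # the weak instrument is far noisier
```

The spread of the estimates under the weak instrument is many times larger, which is exactly the instability discussed by Bound, Jaeger, & Baker (1995).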

9. Conclusion

I found catching up with the field of causal inference to be both challenging and exciting. To a novice, the field can seem fragmented, inconsistent and often focused on abstract theories rather than on applications. Yet beyond these initial impressions, it was fascinating to discover that both theories and methods gradually converge and provide researchers with a plethora of creative tools to use in their search for causality. Putting the most common of these tools together helped me to see the convergence of theories and methods, and I hope it might be of help to other fellow researchers and data scientists.

References

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424.

Bellemare, M. F., & Bloem, J. R. (2019). The Paper of How: Estimating Treatment Effects Using the Front-Door Criterion.

Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90(430), 443–450.

Morgan, S. L., & Winship, C. (2015). Counterfactuals and causal inference. Cambridge University Press.

Pearl, J. (1993). Bayesian analysis in expert systems: comment: graphical models, causality and intervention. Statistical Science, 8(3), 266–269.

Pearl, J. (2009). Causality. Cambridge University Press.

Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal inference in statistics: A primer. John Wiley & Sons.

Roy, J. (retrieved 2020, April 23). A Crash Course in Causality: Inferring Causal Effects from Observational Data. Coursera. https://www.coursera.org/learn/crash-course-in-causality
