Confounders made simple

Jaime Sevilla
Towards Data Science
Aug 23, 2019 · 10 min read


ABSTRACT: Not all covariates of treatment and outcome variables in an observational study should be adjusted for. By default, one should doubt studies which blindly adjust for many confounders without justifying their choice on causal grounds.

DISCLAIMER: My knowledge of causal inference is limited enough that I could be saying things that are very wrong. Reach out to me on Twitter @jsevillamol if you find a mistake!

The problem of confounders

Suppose that you want to determine the causal effect of a treatment on an outcome. The first order of business is to determine whether there is a statistical correlation between them.

Though it remains challenging, we have good statistical tools for determining networks of statistical association between complex sets of variables.

However, correlation is not causation — a correlation might be caused by a confounder, a causal antecedent of both treatment and outcome.

For example, the treatment might be smoking, the outcome might be respiratory disease, and a plausible confounder is age: people who are older smoke more often AND are more prone to respiratory disease.

We can illustrate this situation with a causal diagram:

A causal diagram for a smoking study

We say that there is an unblocked backdoor path from the treatment to the outcome via age, i.e. smoking <= age => respiratory disease.
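To make this concrete, here is a minimal simulation (my own illustration with arbitrary made-up coefficients, not data from any real study) in which age drives both smoking and disease, and a clear correlation appears even though smoking has zero causal effect:

```python
# Toy data: age causes both smoking and disease; smoking does NOT cause disease.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
age = rng.normal(size=n)
smoking = 0.5 * age + rng.normal(size=n)   # age => smoking
disease = 0.5 * age + rng.normal(size=n)   # age => disease, no smoking effect

# A clear correlation (~0.2) despite a causal effect of exactly zero.
print(np.corrcoef(smoking, disease)[0, 1])
```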

Ideally we would want to run a randomized controlled trial (RCT) that randomly assigns the treatment, so we can sever the backdoor path.

A randomized controlled trial (RCT) of a smoking study

But this is not always possible; for instance, the treatment might be unethical, or we may want to draw conclusions from historical data. What should we do in those situations?

How not to adjust for confounders

An alternative way of blocking the spurious influence of the confounder is to adjust for it, for example through stratification. In the smoking example, we might divide our data into youngsters and olduns, study the correlation between smoking and disease within each group, and then report the weighted correlation as an estimate of the causal effect.

This works well if we are confident that the covariate is indeed a confounder, that is, a causal ancestor of both the treatment and the outcome — since within each studied group the confounder variable is fixed, it can no longer exert a spurious influence on the treatment and outcome, and we will be able to make assertions about the true causal effect of the treatment.
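On the same simulated smoking data as before, stratified adjustment could look something like the sketch below (a hedged illustration of the idea; the binning scheme and all numbers are assumptions of mine, not the method of any particular study):

```python
# Stratify by age, estimate the smoking-disease slope within each stratum,
# then average the within-stratum estimates weighted by stratum size.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
age = rng.normal(size=n)
smoking = 0.5 * age + rng.normal(size=n)
disease = 0.5 * age + rng.normal(size=n)   # true effect of smoking is zero

strata = np.digitize(age, np.quantile(age, np.linspace(0.1, 0.9, 9)))
estimates, weights = [], []
for s in np.unique(strata):
    mask = strata == s
    estimates.append(np.polyfit(smoking[mask], disease[mask], 1)[0])
    weights.append(mask.sum())

# Much closer to the true zero than the naive slope of ~0.2;
# what remains is within-bin (residual) confounding.
print(np.average(estimates, weights=weights))
```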

So whenever researchers identify a variable that correlates with both treatment and outcome, they tend to adjust for it.

But that is not the only possible causal relationship between the three variables!

Possible causal relations between treatment X, outcome Y and covariate Z: confounder (X <= Z => Y), mediator (X => Z => Y) and collider (X => Z <= Y)

It could happen that the covariate mediates the interaction between treatment and outcome. That is, X => Z and Z => Y.

For example, we could be studying the effect of GMO crops on consumer health, and we find out that GMOs are less likely to be infected with a pathogen. In that case, the presence of a pathogen would be a mediator between the GMOs and consumer health.

Note that the mediator does not have to be the sole mechanism explaining the effect — the GMO might also change the dietary profile of the crop independently of the effect it has on pathogens.

In this case, adjusting for the covariate Z will reduce the apparent effect of the treatment X on the outcome Y, and our report will be misleading (unless we were specifically trying to measure in isolation the part of the treatment’s effect not mediated by the covariate).
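A quick simulation makes the point (the data-generating process below is invented purely for illustration): the “GMO” treatment improves health both directly and by reducing pathogens, and adjusting for the pathogen mediator hides most of the total effect.

```python
# X = gmo, Z = pathogen (mediator), Y = health.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
gmo = rng.normal(size=n)
pathogen = -0.8 * gmo + rng.normal(size=n)                  # X => Z
health = -0.5 * pathogen + 0.2 * gmo + rng.normal(size=n)   # Z => Y and X => Y

total = np.polyfit(gmo, health, 1)[0]   # ~0.6: direct 0.2 + mediated 0.8*0.5
adjusted = np.linalg.lstsq(
    np.column_stack([gmo, pathogen, np.ones(n)]), health, rcond=None,
)[0][0]                                 # ~0.2: only the direct part survives
print(total, adjusted)
```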

The third possibility is that the covariate is a collider of treatment and outcome. That is, both X and Y cause Z. For example, it could be that both artificial intelligence researchers and chess aficionados like to read about developments in automated chess playing.

Adjusting for a collider will increase the apparent strength of the effect of the treatment on the outcome.

In the previous example, if we surveyed the people who have read an automatic chess playing article, we may find that chess aficionados are less likely to be AI researchers and vice versa — but that would not be surprising, since we are filtering out of our sample the people who are neither AI researchers nor chess aficionados.
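Here is a toy version of that survey (trait frequencies and the selection rule are made up): two independent traits become strongly negatively correlated once we condition on their common consequence.

```python
# Reading the article is a collider: both traits cause it.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
ai_researcher = rng.random(n) < 0.05
chess_fan = rng.random(n) < 0.05
reads_article = ai_researcher | chess_fan

population = np.corrcoef(ai_researcher, chess_fan)[0, 1]
among_readers = np.corrcoef(ai_researcher[reads_article],
                            chess_fan[reads_article])[0, 1]
print(population, among_readers)  # ~0 overall, strongly negative among readers
```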

So beware adjusting for mediators and colliders!

Now, how do we distinguish the cases where a covariate is a confounder from the cases where it is a mediator or a collider?

Short answer: we cannot, at least not just from observing the data. We need to rely on domain-specific knowledge of the underlying causal relationships.

When multiple covariates are involved, the story gets more complicated. We’d need to map out the whole causal graph between all the covariates, the treatment and the outcome, and justify our causal mapping on scientific grounds.

Then we can use the rules of the do-calculus and principles such as the backdoor criterion to find a set of covariates that, once adjusted for, blocks the spurious correlation between treatment and outcome, so we can estimate the true causal effect.
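For small diagrams we can check the backdoor criterion by brute force. The sketch below is my own illustration built on networkx (the helper names are mine, and the path enumeration is exponential, so this is only suitable for toy graphs):

```python
import networkx as nx

def backdoor_paths(g, x, y):
    """Undirected paths from x to y whose first edge points INTO x."""
    skeleton = g.to_undirected()
    for path in nx.all_simple_paths(skeleton, x, y):
        if g.has_edge(path[1], x):
            yield path

def is_blocked(g, path, adjust):
    """Apply the d-separation blocking rules along a single path."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        if g.has_edge(left, mid) and g.has_edge(right, mid):
            # mid is a collider: it blocks the path unless we adjust
            # for it or for one of its descendants
            if not ((nx.descendants(g, mid) | {mid}) & adjust):
                return True
        elif mid in adjust:
            # a chain or fork is blocked by adjusting for the middle node
            return True
    return False

def satisfies_backdoor(g, x, y, adjust):
    # no adjustment variable may descend from the treatment, and
    # every backdoor path from x to y must be blocked
    if adjust & nx.descendants(g, x):
        return False
    return all(is_blocked(g, p, adjust) for p in backdoor_paths(g, x, y))

g = nx.DiGraph([("age", "smoking"), ("age", "disease"),
                ("smoking", "disease")])
print(satisfies_backdoor(g, "smoking", "disease", set()))    # False
print(satisfies_backdoor(g, "smoking", "disease", {"age"}))  # True
```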

In general, I would expect that the more variables a study adjusts for, the more likely it is that it is introducing a spurious correlation via a collider or blocking a mediation path.

The problem of degrees of freedom

A separate strong reason to doubt studies that adjust for many variables in an unprincipled way is the extra degrees of freedom in how the study can be performed.

If you measure a relation between two variables in 1000 different ways and pick the one that shows the greatest correlation, you are likely to overestimate the effectiveness of the treatment.

Having a greater set of covariates allows you to adjust for any subset you please. For example, if you have access to 10 covariates you can adjust for any of 2^10 = 1024 possible subsets.
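A quick simulation (the setup is entirely invented) shows how bad this can get: even when there is no effect at all, scanning every one of the 2^10 adjustment sets and keeping the largest estimate manufactures an “effect” well beyond the sampling error of any single pre-committed analysis.

```python
# Null data: the outcome is independent of the treatment by construction.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 10
covariates = rng.normal(size=(n, k))
x = rng.normal(size=n)
y = rng.normal(size=n)

best = 0.0
all_subsets = itertools.chain.from_iterable(
    itertools.combinations(range(k), r) for r in range(k + 1))
for subset in all_subsets:
    design = np.column_stack([x, covariates[:, list(subset)], np.ones(n)])
    coef = np.linalg.lstsq(design, y, rcond=None)[0][0]  # coefficient on x
    best = max(best, abs(coef))

print(f"largest estimate across {2**k} adjustment sets: {best:.3f}")
```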

It does not have to be that a single research group is systematically trying all possible adjustment subsets and picking the best one (although notably some statistical methods do something quite similar — e.g. stepwise or best-subset methods of variable selection). It could be that different researchers are trying different subsets, and the mechanism that combines their results is biased.

For example, 100 research groups might try 100 different subsets. 95 of them correctly identify that there is no effect, but because of publication bias they do not make their results widely available, while the 5 groups that mistakenly identified a strong effect are the only ones that get published — creating the impression that every study performed found a strong effect where in fact there is none.

In summary, when you do not precommit to following a principled way of performing adjustment in your study, you are more likely to introduce a bias in your results.

A word of caution: you still need good controls

In this article we are focusing on the problem of choosing too many, unsuitable controls, because that is an intuition I see more people lack, even among those otherwise knowledgeable about applied statistics.

However, be mindful that you can make the opposite mistake — failing to adjust for relevant confounders — and end up concluding that chocolate consumption causes Nobel Prizes.

Especially with observations of complex phenomena, adjusting for only a few things virtually guarantees you are omitting things you should be adjusting for — and you may be either overstating or understating the effect.

A related challenge goes under the heading of ‘residual confounding’. Even if you identify a confounder and adjust for it, it will still influence the results to the extent that your measurement of it is imperfect — and naturally we measure most things inaccurately or by proxy.
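A sketch of residual confounding in the same simulation style as before (all coefficients assumed): adjusting for a noisy proxy of the confounder removes only part of the bias, while adjusting for the true confounder removes essentially all of it.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
confounder = rng.normal(size=n)
proxy = confounder + rng.normal(size=n)   # the confounder, measured badly
x = confounder + rng.normal(size=n)
y = confounder + rng.normal(size=n)       # no true effect of x on y

def coef_on_x(*controls):
    design = np.column_stack([x, *controls, np.ones(n)])
    return np.linalg.lstsq(design, y, rcond=None)[0][0]

print(coef_on_x())            # ~0.5: fully confounded
print(coef_on_x(proxy))       # ~0.33: better, but bias remains
print(coef_on_x(confounder))  # ~0.0: only the true confounder removes it
```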

So to recap in a sentence: controlling for confounders is key if you want to infer causal effects from observational data.

So what should we do?

As a litmus test, be more doubtful of observational studies that adjust for variables without justifying their choice of adjustment on causal grounds.

However, some studies do not do the necessary work to justify their choice of confounders, which leaves us in a much worse position to extract reliable conclusions from their work. What can we do in those cases?

First of all, we can examine each of the chosen covariates in isolation and think about how it causally relates to the treatment and the outcome.

For example, suppose that we are reviewing a study of the effect of the Non-Proliferation Treaty (NPT) (X) on the level of investment in nuclear weapons (Y), and we are wondering whether they should have adjusted for GDP (Z).

Well, it could be the case that countries with higher GDP are also more influential and shaped the treaty to be beneficial for them, so Z => X. And countries with higher GDP can invest more in nuclear weapons, so Z => Y. In this case GDP would be a confounder, and we should adjust for it.

But we could tell an equally compelling story arguing that countries that sign the treaty are likely to be perceived as more cooperative and get better trade deals, so X => Z. And countries that invest more in nuclear weapons have better security so they attract more investors, so Y => Z. Under this interpretation GDP is a collider, and we should not adjust for it.

Or we could combine the two previous scenarios to argue that X => Z and Z => Y, in which case GDP would be a mediator, and we should not adjust for it either.

In the absence of a compelling reason to reject the alternate explanations, we should not adjust for GDP.

However, imagine that the study instead adjusts for participation in other nuclear agreements. It seems contrived to argue that participation in other treaties caused participation in the NPT; both seem to be more directly caused by a country’s general predisposition to sign nuclear treaties.

In this case “predisposition towards treaties” is a confounder for the effect of the NPT on nuclear investment, but we cannot observe it directly. However, we can block much of its spurious influence by adjusting for “other nuclear treaties” as an observable proxy — bearing in mind that a proxy blocks the backdoor path only to the extent that it captures the underlying predisposition, so some residual confounding will remain.

What happens if the study is adjusting for both GDP and participation in other nuclear treaties?

By default we should doubt the causal validity of their conclusion.

We might use this information to make some predictions (for example, we could use the results from the study above to guess whether a state that was going to sign the treaty anyway will reduce its investment in its nuclear arsenal), but we cannot make treatment recommendations (for example, we cannot assert that lobbying a state actor into accepting the NPT is an effective way to get them to reduce their arsenal).

If we want to try to rescue their results, we can build a causal diagram of the relevant variables and consider whether their choice of confounders satisfies the relevant criteria.

If the adjustment variables they chose do not properly block spurious paths, or introduce new spurious effects via colliders, and we have access to the data, we might want to try our hand at rerunning the study with a better choice of adjustment variables.

But of course we might still identify key confounders that the authors did not include in the data set. In that case, I suggest paying attention to John Tukey’s words:

“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”

Conclusions

In this post we have explained the three types of causal relationships between a covariate and a treatment-outcome pair: confounders, mediators and colliders. We have seen that to deduce causal effects we should adjust for confounders, but not for mediators or colliders.

We have argued that the more variables an observational study adjusts for, the more likely it is that the authors have made a causal error, or that the additional degrees of freedom and publication bias have exaggerated the reported effect.

We have also cautioned the reader against making the opposite mistake — adjusting for confounders in a principled way is essential to transform observational data into causal information.

As a way of extracting information from previous studies, we have suggested critically examining their choice of adjustment covariates on causal criteria. If they adjust for unneeded variables, we have suggested rerunning the analysis when the data is available; if a key confounder is missing from the data, we should accept that sometimes we do not have enough information to properly answer the questions we care about.

I want to thank Luisa Rodriguez for giving me an excuse to think about confounders and Gregory Lewis for reading an early draft and pointing out mistakes. Some clever quotes throughout the article I plagiarized directly from Gregory’s commentary; all mistakes are solely my fault.

For further reading on causal reasoning I recommend The Book of Why and A Crash Course on Good and Bad Control by Judea Pearl, as well as Why Correlation Usually ≠ Causation by Gwern.

This article was written by Jaime Sevilla, summer fellow at the Future of Humanity Institute of Oxford. Find me on Twitter @jsevillamol.
