
Understanding causal inference is increasingly important in a world full of questionable "discoveries". The media speculates about the effectiveness of different medicines, diets and lifestyles. There are decent studies on many of these topics, but people often do not understand them, as they are written in dense language meant for other scientists. I hope this article helps you learn some of the basic concepts of causal inference, so that you can better understand scientific studies and not be misled by fake "discoveries" in the media.
I will split this article into two parts, as the topic is quite broad. This is the first part, where we discuss the main paradigms, metrics and fundamental questions of causal inference. In the second part, we will look at some more advanced concepts built on the ideas discussed here. You can find the second part here.
The whole discussion starts from a question: what is this science about? We are not talking about prediction, classification and the other problems popular among data scientists nowadays. We are talking about experiments and observational studies, where the main goal is to measure the effect of some treatment applied to a sample or a population. Think of a medical trial: participants in the first group take a new medicine (the treatment group), while participants in the second group take a placebo (the control group). Our goal is to understand and measure the effect of the new medicine. That said, as we will see later in this article, some concepts popular in data science, such as linear and logistic regression, are quite useful for tackling the fundamental problem of causal inference.
"Classical paradigm" and "Potential outcomes framework"
There are two paradigms you can use to analyse data from studies. The classical paradigm looks at the association between a potential cause (e.g. smoking) and an observed effect (e.g. lung cancer) and tries to determine whether that relationship is causal. The main question asked here is: "what would we need to know to conclude that smoking causes lung cancer?"
The potential outcomes framework looks at the same problem from a different point of view. This framework says that every unit has two (in the simplified case) potential outcomes: the potential outcome under treatment (Y(1)) and the potential outcome without treatment, i.e. under control (Y(0)). The main question here is: "what would the difference be if a person smoked vs. if that person did not smoke?"
The classical paradigm is mostly based on association, claiming that the stronger the association, the more likely the relationship is causal. The potential outcomes framework, in my opinion, is the preferable way of approaching causal inference, even though it introduces some difficulties when we are dealing with real-world observational studies. We will talk about those later.
That being said, the potential outcomes framework claims that all the potential outcomes already exist and we can only choose which one we want to observe. This is displayed in the following Science Table, introduced by Donald Rubin in Rubin (1978), "Bayesian Inference for Causal Effects: The Role of Randomization".

The value of such tables is also nicely described by D. Rubin himself: "The explicit representation of all potentially observable values leads to substantial notation, but once established, the notation permits important conclusions to be drawn almost immediately".
Basically, the Science Table describes the whole dataset collected in the following way:
- Unit: indexes of units or unit groups (used for bigger studies)
- Covariates: all the pre-treatment parameters we can observe before the treatment and can use to adjust our study design (in the picture above only one covariate is shown for simplicity)
- Treatment: indicates which units were treated (1) and which were in the control group (0)
- Potential outcomes: Y(1) is the vector of potential outcomes for every unit if they were treated, Y(0) – if they were in the control group. In practice, we never know both values
- Observed: the actual value we observe during the experiment. It equals one of the potential outcomes, depending on whether the unit was treated or under control.
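The layout of such a table can be sketched in a few lines of Python. The numbers below are purely illustrative, but they show how the "Observed" column is derived from the treatment indicator and the two potential outcomes:

```python
# A toy Science Table, following Rubin's layout: for every unit we list a
# covariate X, the treatment indicator W, and BOTH potential outcomes.
# (All values here are made up purely for illustration.)
science_table = [
    # unit, X,  W, Y(1), Y(0)
    (1, 25, 1, 7.1, 5.0),
    (2, 31, 0, 6.4, 6.0),
    (3, 48, 1, 8.2, 6.5),
    (4, 52, 0, 9.0, 7.3),
]

# The "Observed" column equals Y(1) for treated units and Y(0) for controls.
for unit, x, w, y1, y0 in science_table:
    observed = y1 if w == 1 else y0
    print(f"unit {unit}: X={x}, W={w}, observed Y={observed}")
```

In a real study we would only ever see the "observed" values, never the full table.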
Causal estimands and the fundamental problem of causal inference
Now that we know how to describe data gathered from a study, it's time to calculate some metrics. The goal of causal inference is to estimate treatment effects. Simply put, we want to know how big the effect of a treatment is on a population, sample or subgroup. This helps us understand fundamental causal relationships and make decisions based on that understanding.
For example, these estimates are used during clinical trials to find out whether a new drug is effective. Based on them, authorities decide whether to allow the drug onto the market.
These metrics are all quite similar; they differ only in the object on which they are measured (population/sample/subgroup). All of them are simple to calculate. For each unit, the causal effect of the treatment is:
τᵢ = Yᵢ(1) − Yᵢ(0) (Formula 1)
However, things get complicated because of the fundamental problem of causal inference, which the potential outcomes framework makes especially explicit: we never know BOTH potential outcomes for the same unit. We know only one of them – the observed one.
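The missing-data nature of this problem can be sketched directly. In the hypothetical observed dataset below, one of the two potential outcomes is always missing, so the individual effect is never computable:

```python
# In a real study we never see the full Science Table: for each unit one of
# the two potential outcomes is missing. A hypothetical observed dataset:
observed_data = [
    # unit, W, Y(1),  Y(0)
    (1, 1, 7.1, None),   # treated: Y(0) is the unobserved counterfactual
    (2, 0, None, 6.0),   # control: Y(1) is the unobserved counterfactual
]

# The individual effect Y(1) - Y(0) cannot be computed for any unit,
# because one of the two terms is always missing:
for unit, w, y1, y0 in observed_data:
    effect = (y1 - y0) if (y1 is not None and y0 is not None) else None
    print(f"unit {unit}: individual effect = {effect}")
```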
There are basically two possible solutions to the problem:
- The scientific solution. Here we exploit our prior knowledge of the setting to infer the other potential outcome (the one we do not observe). We can also adopt some assumptions that make causal inference legitimate, such as:
- Temporal stability – the value of Y(0) does not depend on when the unit is put in the control group. Basically, the unit is stable in time.
- Causal transience – Y(1) is not affected by the unit previously having been under control (Y(0)). This makes it possible to switch units between treatment and control groups.
- Unit homogeneity – Yᵢ(1) = Yᵢ′(1) and Yᵢ(0) = Yᵢ′(0) for any units i and i′ with the same covariates. In other words, the potential outcomes are the same for units with the same explanatory variables (covariates).
- The statistical solution. When the other potential outcome is not self-evident even with those assumptions, we use statistical methods based on one key technique: make use of the population. The same units can be under different treatments at different points in time, OR different units can be under different treatments at the same time. Using this, we can do the following tricks:
- Find a close substitute for a unit in the other treatment group – one that is equal to the unit we are comparing in terms of covariates – and use its outcome as the missing potential outcome. Then we can calculate individual treatment effects using Formula 1.
- Estimate an average causal effect that averages over units in the sample or population.
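The first trick, finding a close substitute, can be sketched as exact matching on a single covariate. The data and the exact-match rule below are illustrative; real studies match on many covariates, usually approximately:

```python
# A minimal sketch of the "close substitute" trick: for each treated unit,
# find a control unit with the same covariate value and use its outcome
# as the missing Y(0). Data and the matching rule are made up.
treated = [(25, 7.1), (48, 8.2)]                # (covariate X, observed Y)
controls = [(25, 5.0), (48, 6.5), (60, 7.0)]    # (covariate X, observed Y)

individual_effects = []
for x_t, y_t in treated:
    # exact match on the covariate; in practice matching is approximate
    match = next(y_c for x_c, y_c in controls if x_c == x_t)
    individual_effects.append(y_t - match)      # Formula 1: Y(1) - Y(0)

print(individual_effects)  # one matched-pair difference per treated unit
```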

Here are the main metrics to calculate the average treatment effect:
- Sample Average Treatment Effect (SATE):
SATE = (1/n) · Σᵢ [Yᵢ(1) − Yᵢ(0)], where the sum runs over the n units in the sample.
- Conditional Average Treatment Effect (CATE). Essentially the same as SATE, but conditioned on the covariates of the units. Conditioning on covariates X = x selects a subgroup of size n_x:
CATE(x) = (1/n_x) · Σᵢ [Yᵢ(1) − Yᵢ(0)], where the sum runs over the n_x units with covariates X = x.
- Population Average Treatment Effect (PATE). The same quantity, but for the whole population of size N:
PATE = (1/N) · Σᵢ [Yᵢ(1) − Yᵢ(0)], where the sum runs over all N units in the population.
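To make the definitions concrete, here is a tiny simulated sample where, unlike in a real study, we know both potential outcomes for every unit, so SATE and a conditional average can be computed directly. All values are made up:

```python
# Illustrating SATE and CATE on a simulated sample where (unlike in any
# real study) both potential outcomes are known for every unit.
sample = [
    # X ("group" covariate), Y(1), Y(0)
    ("young", 7.0, 5.0),
    ("young", 6.0, 5.0),
    ("old",   9.0, 6.0),
    ("old",   8.0, 7.0),
]

# SATE: average of Y(1) - Y(0) over all n units in the sample
sate = sum(y1 - y0 for _, y1, y0 in sample) / len(sample)

# CATE for X = "old": the same average restricted to the n_x matching units
old = [(y1, y0) for x, y1, y0 in sample if x == "old"]
cate_old = sum(y1 - y0 for y1, y0 in old) / len(old)

print(sate, cate_old)  # prints 1.75 2.0
```

The treatment helps the "old" subgroup more than the sample on average, which is exactly the kind of heterogeneity CATE is designed to reveal.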
And here it comes. One of the essential problems of causal inference is to calculate these average treatment effects in different settings, with different limitations and under different distributions of units – all while facing the main problem: we do not know both potential outcomes for the same units.
One way to estimate average treatment effects is to use regressions. It turns out that we can build a regression in such a way that the estimate of one of the coefficients equals the estimate of the treatment effect. Moreover, this is a convenient way to adjust for covariates when there are obvious groups of units (such as sex, age, geographical clusters, different occupations, etc.), mixing of which would introduce extra bias into the treatment effect estimation. We will discuss these techniques in the second part of the article.
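A minimal sketch of the regression idea, on simulated data with a known true effect of 2.0 (the data-generating numbers are arbitrary assumptions): with randomized treatment, the coefficient on the treatment indicator W in a regression of Y on W and the covariates estimates the average treatment effect.

```python
import numpy as np

# Simulate a randomized study where the true treatment effect is 2.0.
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                  # a covariate
w = rng.integers(0, 2, size=n)          # randomized treatment indicator
y = 1.0 + 2.0 * w + 0.5 * x + rng.normal(scale=0.1, size=n)

# Ordinary least squares for Y ~ 1 + W + X via numpy's lstsq.
design = np.column_stack([np.ones(n), w, x])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coefs[1])  # estimate of the treatment effect, close to 2.0
```

Because treatment was randomized, the coefficient on W recovers the effect; with observational data the same regression would need much more careful covariate adjustment, which is the subject of the second part.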
The last thing I would like to highlight in this part of the article is also a fundamental assumption that is necessary to calculate those effects and is useful in many other ways. It is called Stable Unit Treatment Value Assumption (SUTVA). It consists of two parts:
- An assumption of "no interference". Simply put, this means that the potential outcomes of any unit in a study are not affected by the treatments applied to other units in the study. An example of a situation where this assumption does not hold: one partner in a couple gets vaccinated, and that decision influences the decision of the other partner.
- An assumption of "no multiple versions of treatment". Basically, it says that no matter how a unit receives the treatment, the potential outcome Y(1) will be the same. An example of a situation where this assumption does not hold: when a person gets a drug as an injection instead of a pill, they will get a faster and stronger effect, as the drug goes straight into the bloodstream.
To sum up the first part of our dive into causal inference, I would like to encourage you to spend some time exploring this subject further by yourself. Causal inference lies at the foundation of all modern discoveries, including vaccines (and understanding the results of their clinical trials became so important in 2020). This science only seems trivial at first glance – assuming so was my own mistake when I started studying it. Causal inference can open your eyes to the world of scientific research and will help you design your own experiments well.
Continue exploring by reading Part 2 here!