Casual Causal Inference

What to expect from a causal inference business project: an executive’s guide II

Part II: The key project points you need to know

Aleix Ruiz de Villa
Towards Data Science
4 min read · Sep 12, 2019

--

This is the second part of the post “What to expect from a causal inference business project: an executive’s guide”. You will find the third part here.

Causal Modeling

Causal inference models how variables affect each other. Based on this model, it uses calculation tools to answer questions such as: what would have happened if I had done that instead of this? Can I estimate the effect of one variable on another?

Causal inference provides a broad-brush approach to get preliminary estimates of causal effects. If you want more definitive conclusions, you should, whenever possible, go for more precise and clearer measurements with A/B tests. These do not suffer from confounding, and you don’t need any modeling beyond statistical calculations.

To model the relationships between variables, you use a graph (see “Use causal graphs!”). This will be the foundation of your analysis. Such a graph is based on your domain knowledge: you are modeling the process that creates your data and how variables affect each other. The book “Inference and Intervention: Causal Models for Business Analysis” explains in detail how to build this graph, with many business examples.
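As a rough illustration, such a graph can be written down as a plain mapping from each variable to the variables it affects. This is only a sketch: the node name "visits" and the exact arrows are assumptions made for the example, not taken from the original analysis. The helper lists the variables that sit upstream of both the cause and the effect, which are the ones to worry about as confounders.

```python
# Hypothetical causal graph for the advertising example.
# Each key points to the variables it directly affects.
# The outcome name "visits" is an assumption for illustration.
causal_graph = {
    "topic": ["platform", "visits"],
    "platform": ["visits"],
    "visits": [],
}


def ancestors(graph, node, skip=None):
    """Return every variable with a directed path into `node`.

    Arrows leaving `skip` are ignored. Assumes the graph has no cycles.
    """
    parents = {
        src for src, targets in graph.items()
        if node in targets and src != skip
    }
    found = set(parents)
    for parent in parents:
        found |= ancestors(graph, parent, skip)
    return found


def common_causes(graph, cause, effect):
    """Variables upstream of `cause` that also reach `effect` without going through `cause`."""
    upstream_of_cause = ancestors(graph, cause)
    upstream_of_effect = ancestors(graph, effect, skip=cause)
    return sorted(upstream_of_cause & upstream_of_effect)


print(common_causes(causal_graph, "platform", "visits"))
```

Writing the graph down this explicitly forces every arrow to be stated and argued for, which is the point of the exercise.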

Adding more variables

The advertising example from the previous post is very simplified. We should expect to have a larger set of relevant variables. For instance, in most cases we will include seasonality: culture and sports news usually have different months of activity. And it may well be that you used different platforms at different moments in time.

It’s important that you include all the variables that you think may be relevant to your analysis and argue why they are relevant. This exercise takes some time. Here is a list of types of variables you may want to consider:

  • Affecting your causal variable
  • Affecting your effect variable
  • Contextual
  • Ones you can intervene on or influence
  • Ones you don’t have data for but that are important (unobserved, missing or similar)

Main risk

What would happen if we knew that the topic is relevant, but for some reason, we didn’t have this information? In this case, the topic is an unobserved variable, as shown in the following graph.

Unobserved topic

Causal inference tells us that in this case it is impossible to give a precise estimate of the effectiveness of the media platform. In practice, the weaker the effect of the confounder, the better our estimate will be. So we shouldn’t stop here: we should look for other variables closely related to the topic that would be enough to run the analysis. We are in fact approximating reality, as a physicist would do. The better the modeling process, the more realistic the results we will get.

The main risk in causal inference is missing relevant confounders in our analysis.
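To get a feeling for this risk, here is a small simulation sketch. All numbers are invented for illustration: the click probabilities, the true effect of 0.05 and the way topic drives the choice of platform are assumptions, not results from a real campaign. It compares the estimate you get when topic is ignored (as if it were unobserved) with the estimate you get when you compare platforms within each topic.

```python
import random

random.seed(0)
TRUE_EFFECT = 0.05  # assumed lift in click probability from platform B

rows = []
for _ in range(100_000):
    sports = random.random() < 0.5  # topic: True = sports, False = culture
    # Sports campaigns mostly ran on platform B: topic confounds the comparison.
    on_b = random.random() < (0.8 if sports else 0.2)
    base_rate = 0.30 if sports else 0.10
    click = random.random() < base_rate + TRUE_EFFECT * on_b
    rows.append((sports, on_b, click))


def click_rate(data):
    return sum(click for _, _, click in data) / len(data)


def platform_effect(data):
    """Difference in click rate between platform B and platform A."""
    shown_on_b = [r for r in data if r[1]]
    shown_on_a = [r for r in data if not r[1]]
    return click_rate(shown_on_b) - click_rate(shown_on_a)


# Topic ignored, as if it were unobserved.
naive = platform_effect(rows)

# Topic observed: compare platforms within each topic,
# then average the two results weighted by how common each topic is.
adjusted = 0.0
for value in (True, False):
    stratum = [r for r in rows if r[0] == value]
    adjusted += platform_effect(stratum) * len(stratum) / len(rows)

print(f"true effect: {TRUE_EFFECT:.3f}")
print(f"ignoring topic: {naive:.3f}")
print(f"adjusting for topic: {adjusted:.3f}")
```

Under these assumptions the estimate that ignores topic comes out several times larger than the true effect, while the adjusted one lands close to it. With a weaker link between topic and platform, the gap shrinks.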

Causal Estimates

The last part takes into account all the information represented in the graph to calculate causal estimates for each arrow we are interested in. We are not going to explain this part because it would require going into technical considerations. If you are interested in knowing more, you can read the previous posts from this blog starting from “Why do we need causality in data science”.

What can you expect from a causal inference analysis?

From this example, we can get an idea of what causal inference focuses on:

  • Assessing confounding.
  • Separating and estimating causal effects, assigning causal credit to each of the causes.
  • Deciding whether we can give an answer with our data, whether we need to include more variables in our analysis, or whether we cannot give a proper estimate.
  • Using graphs as a communication tool to make your objectives, risks and assumptions explicit.
  • Quantitative estimates of causal effects.

There is also a collateral benefit. The exercise of creating a graph describing your business process makes you pose many interesting questions about your business and clarify some concepts you didn’t know before.

What are the risks of not using causal inference? Causal inference decreases the chances that your analysis is totally wrong. How wrong? Well, as we saw with Simpson’s paradox, you can easily arrive at a conclusion that is the exact opposite of reality.
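As a reminder of how such a reversal can happen, here is a tiny worked example with made-up counts (they are not data from any real campaign). Platform A has the better click rate within each topic, yet platform B looks better once the topics are pooled, simply because B was mostly used for the topic that gets more clicks anyway.

```python
# Made-up counts for illustration: (clicks, impressions) per topic and platform.
counts = {
    ("culture", "A"): (90, 900),
    ("culture", "B"): (8, 100),
    ("sports", "A"): (32, 100),
    ("sports", "B"): (270, 900),
}

# Winner within each topic.
per_topic_winner = {}
for topic in ("culture", "sports"):
    clicks_a, shown_a = counts[(topic, "A")]
    clicks_b, shown_b = counts[(topic, "B")]
    per_topic_winner[topic] = "A" if clicks_a / shown_a > clicks_b / shown_b else "B"

# Winner when topics are pooled together.
overall_rate = {}
for platform in ("A", "B"):
    clicks = sum(c for (_, p), (c, _) in counts.items() if p == platform)
    shown = sum(n for (_, p), (_, n) in counts.items() if p == platform)
    overall_rate[platform] = clicks / shown
overall_winner = max(overall_rate, key=overall_rate.get)

print(per_topic_winner)  # A wins in both topics
print(overall_rate)      # but B has the higher pooled rate
print(overall_winner)
```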

Does causal inference guarantee that my analysis is right? No. As we saw, you can still miss confounders or model your causal relations incorrectly and reach false conclusions. If you can perform an A/B test, you should draw your definitive conclusions from it. If you want to perform many tests, you can still use causal inference to prioritize them. However, if you cannot perform such tests, causal inference is currently the best tool for estimating causal effects.

You can continue to the third part here.
