Casual Causal Inference

What to expect from a causal inference business project: an executive’s guide III

Part III: Where does causal inference stand in the current AI, Big Data, Data Science, Statistics, and Machine Learning scene?

Aleix Ruiz de Villa
Towards Data Science
7 min read · Sep 12, 2019


This is the third part of the post “What to expect from a causal inference business project: an executive’s guide”. You will find the second one here.

Most of these words have a fuzzy meaning, at least at a popular level. Let me first define what some of them will mean in this post.

Big data: All the computing infrastructure devoted to providing access and computation for querying data, preprocessing it, or training models on large data sets (ones that do not fit on your laptop).

One of the main ideas in big data technologies is that the more data you have, the better. A priori, that’s a fair assumption. However, in some cases, it mistakenly leads to thinking that

  • If you have a very large data set, the data speaks for itself and you don’t even need modeling.
  • The more detail or granularity, the better. For instance, tracking people’s positions every second or millisecond when actually only hourly data is relevant (depending on the application, of course); see the sketch after this list.
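As a toy illustration of the granularity point, assuming second-level location pings in a pandas DataFrame (all names and values here are invented), aggregating down to the hourly resolution the application needs is one line:

```python
import pandas as pd

# Hypothetical second-level location pings; columns and values are invented.
pings = pd.DataFrame(
    {"lat": [41.39, 41.39, 41.40], "lon": [2.17, 2.17, 2.18]},
    index=pd.to_datetime([
        "2019-09-12 10:00:01",
        "2019-09-12 10:00:02",
        "2019-09-12 11:30:00",
    ]),
)

# Keep only the hourly resolution that the application actually needs.
hourly = pings.resample("1H").mean()
print(hourly)
```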

Machine learning: This is the technical area of expertise devoted to predictive systems. It is a combination of statistics, computing, and optimization. Most of the applications you see in the media are in the area of supervised learning. In supervised learning, you have a historical database with many observations, where each observation has a context description and a response variable, the one you want to predict. For example, if you want to automatically read a car plate’s number, you will have a database with images of car plates (context description) and the actual number of each car plate (response variable). This data set is typically labeled by humans (you need a human to read each image and annotate its number). The objective is to build an algorithm or system that, for each new car plate image (not in your historical database), is capable of automatically telling you the number on it. The essence of supervised learning is learning by trying to copy the past.
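Here is a minimal sketch of that loop, using scikit-learn on made-up toy data rather than real car-plate images (every value below is invented):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for (context description, response variable) pairs:
# X holds the features, y the human-annotated labels.
X = [[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0]]
y = [1, 1, 0, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Learn from the labeled past...
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# ...and "copy" it on observations the model has never seen.
print(model.predict(X_test))
```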

[Comic omitted; source: smbc-comics]

Currently, there is increasing interest in reinforcement learning. Reinforcement learning is not concerned with recreating the past, but with learning how to take actions optimally. It is widely used to train machines to play games. A very popular case was DeepMind training a machine with reinforcement learning to win at Go. There is even a documentary about it!

Current reinforcement learning relies on machine learning techniques. Moreover, it is mostly used in tasks for which you have a simulator. For example, when learning to play a game, you have a computer that can simulate the game.

AI: This is probably the fuzziest word of them all. We will consider AI as a combination of machine learning, robotics, and similar techniques that try to replicate human behavior. You can see a lot of companies talking about AI; they are, in fact, applying machine learning to tasks related to images, sound, or text.

Data Science: This is an activity focused on applications, mostly in business, related to predictive analytics and getting insights from data. The core knowledge is a combination of statistics and machine learning, with lots of programming (and a close relation to big data).

What’s the difference between machine learning and traditional statistics?

Machine learning and statistics differ in aim and usage. Let me show this through an example. Again, imagine you work at the CausalNews newspaper. You have users paying a monthly subscription and want to increase your retention. Each customer chooses a content type (topic) and a price. You also know her age and recent activity (how often she reads your news on her mobile phone, whether she has contacted you complaining about problems accessing your website, …). You are interested in knowing, for each customer, the probability of staying for the next period.

Both a machine learning practitioner and a statistician will build a model (a formula, an algorithm, …) relating customer information to the probability of staying:

probability of staying = f(age, activity, content type, price)
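As a minimal sketch of what such a model could look like, here is a logistic regression on invented data; the example specifies no code, so every column name and value below is hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical CausalNews customer data; all names and values are invented.
customers = pd.DataFrame({
    "age": [23, 35, 52, 41, 29, 60],
    "activity": [30, 12, 5, 20, 25, 3],       # sessions in the last month
    "content_politics": [1, 0, 1, 0, 1, 0],   # content type, one-hot encoded
    "price": [9.9, 9.9, 14.9, 14.9, 9.9, 14.9],
    "stayed": [1, 1, 0, 1, 1, 0],             # response: stayed for the next period?
})

X = customers.drop(columns="stayed")
y = customers["stayed"]

# f(age, activity, content type, price) -> probability of staying
model = LogisticRegression(max_iter=1000).fit(X, y)

print(model.predict_proba(X)[:, 1])           # ML use: score every customer
print(dict(zip(X.columns, model.coef_[0])))   # statistician's use: inspect the effects
```

The last two lines anticipate the two uses described next: the machine learning practitioner reads the scores, the statistician reads the coefficients. (Treating coefficients as effects already requires causal assumptions, which is where the rest of this series comes in.)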

Although machine learning and statistics have different ways of validating their models, it is even possible that both build the same model! How are the machine learning practitioner and the statistician going to use the model?

Machine learning practitioner: Give me all your customers and I will evaluate each one’s probability of staying in the next period. You can start contacting the ones at the highest risk of cancelling their subscription.

Statistician: Younger people have more activity but are willing to pay less. Maybe you want to change your content a bit to attract older people. Moreover, price is not affecting your retention much, so it seems you can slightly increase it.

From this example you can see that machine learning is usually more operational, while statistics is more strategic. Machine learning focuses on the next period (the short term). Statistics is devoted to understanding the main factors affecting your customers’ retention, so that you can build medium- to long-term policies to improve it.

Machine learning is more operational while causal inference is more strategic

What’s the role of causality in this game?

In this example, causality has the same aim as statistics: helping you understand the key factors affecting your business. That’s what we call getting insights from data.

Statistics has been doing this job successfully for many decades. However, it also has some limitations. There is no explicit formalism for causality: causality has an intrinsic directionality that is not reflected in correlations, there is no definition of interventions, and so on. One of the main concerns of statistics is addressing uncertainty (a highly difficult problem in itself). Causal inference uses knowledge from statistics and goes one step further, focusing entirely on cause-effect relationships.
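To make the point about interventions concrete: in Pearl’s notation, the interventional quantity P(Y | do(X = x)) is in general different from the observational conditional P(Y | X = x). When a set of observed variables Z blocks all confounding paths (the backdoor criterion), the two are linked by the standard adjustment formula:

```latex
P\big(Y \mid \mathrm{do}(X = x)\big) \;=\; \sum_{z} P\big(Y \mid X = x,\, Z = z\big)\, P(Z = z)
```

Nothing in classical conditional probability alone distinguishes the two sides; the do(·) operator is exactly the extra formalism that statistics was missing.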

Machine learning vs causality cultures

There are some crucial points where the two disciplines differ. In machine learning, as in big data, the more data the better. In causality, this is far from enough: having more data only helps you get more precise estimates, but it does not help you get correct (unbiased) estimates! You can have infinite data and still miss major confounders, producing totally mistaken results.
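Here is a toy simulation of that claim (all numbers invented): a hidden confounder u drives both x and y, the true causal effect of x on y is zero, and the naive estimate stays wrong no matter how much data we add:

```python
import numpy as np

rng = np.random.default_rng(0)

for n in [1_000, 1_000_000]:
    u = rng.normal(size=n)           # unobserved confounder
    x = u + rng.normal(size=n)       # "treatment", driven by u
    y = 2 * u + rng.normal(size=n)   # outcome, driven only by u: true effect of x is 0

    # Naive regression slope of y on x: it converges to 1.0, not to the true 0,
    # and more data only makes the wrong answer more precise.
    naive_slope = np.cov(x, y)[0, 1] / np.var(x)
    print(f"n={n:>9,}  naive estimate={naive_slope:.3f}")
```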

Both causal inference and machine learning rely on models, so both consider that data hardly speaks for itself. However, in machine learning you generally don’t need any domain knowledge, while in causal inference domain knowledge is essential: human interaction is a must. That will probably make automating machine learning tasks much faster than automating causal inference ones.

Model validation is essentially different in each discipline. The oracle for most machine learning applications is cross-validation (evaluating your model on an unseen subset of the data to estimate its predictive accuracy). In causality, cross-validation is not enough: you need to test your assumptions with an A/B test.
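For reference, this is what the cross-validation oracle looks like in scikit-learn, on synthetic data; a good score here certifies predictive accuracy only, not causal validity:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: a high cross-validated score says the model predicts well
# on unseen rows, and nothing more.
X, y = make_classification(n_samples=500, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```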

Besides all this, causality is starting to help improve machine learning itself, for example in areas such as recommender systems or reinforcement learning. However, this is much more technical and a completely different story, so we will not talk about it here (yet).

Why now?

Well, that’s a difficult question. There has been a lot of effort from Judea Pearl and colleagues to make causal inference more understandable, by writing books accessible to a wider audience. In parallel, there have been many research advances in epidemiology using the potential outcomes framework (an alternative but equivalent way of dealing with causal inference).

On the other side, despite the impressive results of deep learning (currently the most popular machine learning technique), we are finding some limitations in the typical supervised learning approach. Yoshua Bengio, one of the pioneers of deep learning, who received a Turing Award for his work in this area, warned in this interview about the necessity of including causality in the current machine learning point of view: “I’m not saying I want to forget deep learning… But we need to be able to extend it to do things like reasoning, learning causality, and exploring the world in order to learn and acquire information.”

Gary Marcus recently wrote an article in the New York Times making similar points.

It is true that the journey from research to application takes some time. The most popular techniques used nowadays in machine learning started in the 1980s and 1990s; all that research, plus huge advances in computing, made them popular today. In the same sense, there has been research in causal inference for decades, and it seems to be flourishing now.

We are already making most of our decisions based on observational data. Let’s do it on a stronger basis!

Keep reading about causal inference

If you want to know more, you can read other posts from this blog and the references therein (https://towardsdatascience.com/why-do-we-need-causality-in-data-science-aec710da021e). There is also a very nice blog post by Uber that I would recommend, “Using Causal Inference to Improve the Uber User Experience”, as well as the books and courses:
