In my previous article, I kicked off the "Read with Me" book club to explore Judea Pearl’s "The Book of Why". I would like to thank everyone who has shown interest and signed up to join the club. I am hopeful that we can embark on a journey to deepen our understanding of Causality by reading and sharing insights together. After two weeks, as promised, I am sharing some key points I took from the first two chapters.
In these two chapters, Judea starts by explaining the Ladder of Causation and reviews the historical development of causal theory. Below, we take a deeper dive into each of the three rungs.
Rung 1: Association
In the late 1800s, from Galton to Pearson, researchers seeking to understand how humans inherit traits found correlation sufficient in a scientific sense. After all, "Data is all there is to science." To them, causation was merely a special case of correlation that could never be proven. Correlation, on the other hand, is powerful enough to explain why sons of tall fathers tend to be taller than the population average. Correlation-based forecasting models make predictions by identifying the variables most predictive of the target, even when the relationship makes no causal sense. For example, there is a strong correlation between a nation’s per capita chocolate consumption and its number of Nobel Prize winners. Obviously, eating more chocolate won’t raise your chances of winning a Nobel Prize; the country’s wealth, which drives both, is the more likely confounder here. We can find many examples like this one that carry no meaningful scientific information. When presented with such findings, Pearson dismissed them as mere "spurious" correlations.
Besides "spurious" correlation, a correlation found in a whole population can also vanish or even reverse within its subgroups. For example, skull length and breadth are negligibly correlated when measured separately in males and females, yet clearly correlated when the two groups are combined. Such discrepancies are common when confounders exist; in its most striking form, where the direction of the association flips, the phenomenon is known as Simpson’s Paradox.
Even with these shortcomings, correlation-based predictive models can achieve decent accuracy in static settings. Statisticians have developed many sophisticated models to extract insights from data, and with larger samples and more complex algorithms, we can build decent models to predict sales, customer churn and retention, and so on. However, when facing new situations for which no historical data exist, correlation-based models cannot make trustworthy predictions. To maintain performance, model developers must keep feeding the models more and more data in an attempt to cover every situation explicitly, an effort that always lags behind new settings. As Pearl wrote:
Lack of flexibility and adaptability is inevitable in any system that works at the first level of the Ladder of Causation.
Rung 2: Intervention
Moving beyond "seeing what is", we climb to "changing what is". What will happen to this product’s sales if I increase its price? Will this customer keep paying for a subscription if I stop giving them promotions? Correlation-based models can no longer answer Rung 2 questions. For example, when we use observational data to measure the past correlation between price and sales, it is likely biased by the macroeconomic situation, the sales and prices of other products, and so on. Rather than isolating the pure impact of price on sales, observational data reflect the combined impact of many factors. Some of these factors can be observed and quantified, but others are hard to quantify, which prevents us from recovering the pure impact.
The first researcher to reach Rung 2 was Sewall Wright, who drew the first path diagram illustrating the factors that determine coat color in guinea pigs. The diagram not only lists all the factors but also shows how to estimate the strength of each one. To answer Rung 2 questions, we need causal hypotheses, like Wright’s path diagram, in addition to the data collected at Rung 1. The following graph illustrates a causal diagram Wright proposed to estimate the birth weight of guinea pigs.
The benefit of a causal diagram is that it lets us isolate the impact of each factor. For example, what is the pure impact of the gestation period on birth weight? Even when some factors are unobserved or cannot be quantified, the diagram can point us to measured variables that still let us estimate the effects we care about. The concept of the instrumental variable, a variable that influences the treatment but affects the outcome only through that treatment, grew out of this line of work.
However, even though Wright proposed the causal diagram in the 1920s, the mainstream long refused to accept it. Judea compared the situation to Galileo’s claim that the Earth revolves around the Sun. The objection came from the norm that "data were to receive priority in all cases over opinions and interpretations." After all, data are objective, and opinions are subjective. However, as Judea wrote:
Where causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity.
Things only began to change around the 1960s, when economists and sociologists started using structural equation models (SEM). These models embedded causal relationships in systems of linear equations. Within these equations, some variables are endogenous, meaning their values are determined by other variables in the model, and the rest are exogenous, meaning they are determined outside the model. Modeling these relationships explicitly helps address confounding when drawing causal conclusions. The main drawback is that the equations themselves carry no direction: y = 2x can just as well be written x = y/2. So these models still cannot settle reverse-causality questions such as "Could y have caused x, rather than x causing y?"
Another step toward breaking data’s dominance was the emergence of Bayesian statistics, which provides an objective way of combining observed data with subjective prior knowledge. A Bayesian analysis starts with a subjective prior belief, such as "this coin is unfair," and keeps revising that belief as new evidence arrives, for example by tossing the coin many times. If heads come up roughly half the time, we shift our belief toward the coin being fair. Because it admits subjectivity, the theory of causality started from Bayesian probability and then took a long detour, a story told in the following chapters.
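The coin example maps neatly onto Beta-Binomial updating, which is my choice of model here rather than anything from the book. The prior Beta(8, 2) (a belief that the coin favors heads) and the toss counts are hypothetical; the update rule simply adds observed heads and tails to the prior’s pseudo-counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoinBelief:
    """Belief about P(heads), encoded as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, heads: int, tails: int) -> "CoinBelief":
        # Beta prior + binomial data -> Beta posterior: just add the counts.
        return CoinBelief(self.alpha + heads, self.beta + tails)


prior = CoinBelief(alpha=8, beta=2)  # assumed prior: "this coin favors heads"
after_100 = prior.update(heads=50, tails=50)
after_1000 = after_100.update(heads=450, tails=450)
print(f"prior mean:         {prior.mean():.3f}")       # 0.800
print(f"after 100 tosses:   {after_100.mean():.3f}")   # 0.527
print(f"after 1,000 tosses: {after_1000.mean():.3f}")  # 0.503
```

The subjective prior matters a lot after a few tosses and fades as evidence accumulates, which is exactly the revise-as-you-go behavior described above.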
Rung 3: Counterfactuals
Believe it or not, when our Homo sapiens ancestors planned a mammoth hunt, they already had a mental causal model of what determines the hunt’s success rate:
The mental model shows the different causes of success, and it is where imagination takes place. It let our ancestors experiment with different scenarios by making local alterations to the model. For example, if we add more hunters, how much does the success rate increase? If the mammoth is larger than usual, how many more hunters do we need to maintain the same success rate? This ability to imagine takes us to the top of the Ladder of Causation: Rung 3.
Yuval Harari, in his book "Sapiens: A Brief History of Humankind," characterizes the Cognitive Revolution by the newly manifested ability to depict imaginary creatures. The Lion Man of Stadel Cave, a statue discovered by archaeologists, shows Homo sapiens’ newly developed ability to imagine a nonexistent creature that is half man and half lion. The ability to imagine distinguishes humans from other creatures, just as the ability to reason counterfactually distinguishes human intelligence from animal and artificial intelligence.
Counterfactuals have a particularly problematic relationship with data, because data are facts and counterfactuals are imagined. Data cannot tell us what would happen in an imaginary world where some observed facts are flatly negated. Causal models, on the other hand, can guide us here by helping us reason about what would have happened. In the business world, these "would haves" are the most profitable questions: what would have happened to the product’s sales had I increased the price? Answering such questions correctly lets us make decisions with confidence.
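Answering a "would have" question needs more than a fitted curve: we must keep everything specific to what actually happened and change only the price. A standard way to do this in a structural causal model is the three-step procedure of abduction, action, and prediction. The sketch below applies it to a hypothetical linear sales model; `SalesModel`, its coefficients, and the observed numbers are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesModel:
    """Hypothetical structural equation: sales = base - slope * price + noise."""
    base: float
    slope: float

    def predict(self, price: float, noise: float = 0.0) -> float:
        return self.base - self.slope * price + noise

    def counterfactual(self, observed_price: float, observed_sales: float,
                       new_price: float) -> float:
        # 1. Abduction: recover the noise term (everything unmodeled about this
        #    particular week) from what actually happened.
        noise = observed_sales - self.predict(observed_price)
        # 2. Action: replace the observed price with the hypothetical one.
        # 3. Prediction: rerun the model with the same noise and the new price.
        return self.predict(new_price, noise)


model = SalesModel(base=500.0, slope=20.0)  # assumed coefficients
# Fact: at price 10 we sold 330 units, 30 more than the model alone predicts.
print(model.counterfactual(observed_price=10.0, observed_sales=330.0, new_price=12.0))
# prints 290.0
```

The key design choice is step 1: reusing the inferred noise carries this week’s specific circumstances into the imagined world. That is what separates the counterfactual answer (290 units) from a plain Rung 2 prediction of setting the price to 12 without this week’s noise (260 units).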
Data tell us what, and we find out why. Most of us have heard of the Garden of Eden from the Old Testament. According to the book of Genesis, when God finds Adam hiding in the garden, the following conversation takes place:
God: Have you eaten from the tree which I forbade you?
Adam: The woman you gave me for a companion, she gave me fruit from the tree and I ate.
God (to Eve): What is this you have done?
Eve: The serpent deceived me, and I ate.
God asked what, but they replied why. God asked for facts, but they replied with explanations. Their instinct was that naming the causes would alleviate the consequences of their actions. Data are objective facts, and humans explain them subjectively to draw conclusions and guide further actions. Finding the why behind the what is innate to human intelligence, and it is exactly what AI lacks. AI cannot reach human-level intelligence without causal structures: the compact representation of human subjective thought processes. Without them, AI models merely gain animal-like abilities, like the cat example I gave in my previous article.
That’s all I want to share from the first two chapters of "The Book of Why" by Judea Pearl. Chapters 1 and 2 lay the book’s foundation by introducing the Ladder of Causation and the history of the theory’s development. I expect the following chapters to offer more technical details and industrial applications. Don’t forget to subscribe to my email list to join the following discussions:
- Chapters 3 & 4: Causal Diagram: Confronting the Achilles’ Heel in Observational Data
- Chapters 5 & 6: Why Understanding the Data-Generation Process Is More Important Than the Data Itself
- Chapters 7 & 8: You Can’t Step in the Same River Twice
- Chapters 9 & 10: What Makes a Strong AI
- Bonus: How Is Causal Inference Different in Academia and Industry?
There are many more details than what I have shared here. As always, I highly encourage you to read, think, and share your main takeaways, either here or in your own blog post.
Thanks for reading. If you like this article, don’t forget to:
- Check out my recent articles about the 4Ds in data storytelling: making art out of science; continuous learning in data science; how I became a data scientist;
- Check out my other articles on topics like data science interview preparation and causal inference;
- Subscribe to my email list;
- Sign up for Medium membership;
- Or follow me on YouTube, where my most recent video is about my work day as a WFH Data Scientist.
References
The Book of Why by Judea Pearl
The Ladder of Causality Photos:
_[1] Robot Photo by Rock’n Roll Monkey on Unsplash;_
_[2] Cat Photo by Raoul Droog on Unsplash;_
_[3] Intervention Photo by British Library on Unsplash;_
_[4] Human Playing Chess Photo by JESHOOTS.COM on Unsplash;_