The future of Artificial Intelligence: from statistical learning to acting and thinking in an imagined space

Building human-thinking machines requires ditching statistics in favour of causality

Alberto Tamajo
Towards Data Science

--

Photo by IvelinRadkov on iStock

Although the field of AI has been booming in recent years, we are still far from developing a human-thinking machine. Machines cannot yet adapt to new and different settings with the ease that humans can, nor do they possess the gift of imagination, which has been essential to the evolution of humankind. These limitations stem from the learning paradigm currently dominating the field, which is based solely on correlation learning. In this article, we will first walk through the history of AI, looking at how the field has evolved over the years, and then argue that, once again, a revolution is needed. Specifically, if we really want to build a machine approaching human-level intelligence, we need to ditch the current statistical, data-driven learning paradigm in favour of a causal approach.

In the 1970s and early 1980s, many computer scientists believed that the manipulation of symbols and rules provided a priori by humans was sufficient for computer systems to exhibit intelligence and solve seemingly hard problems. This belief came to be known as the symbol-rule hypothesis.

However, despite some encouraging early progress in domains such as computer chess and theorem proving, it soon became apparent that rule-based systems could not solve many problems that seem trivially simple to humans. As Hans Moravec put it:

It is comparatively easy to make computers exhibit adult level performance […] and difficult or impossible to give them the skills of a one-year-old.

In addition, rule-based systems could not cope with uncertainty or contradictory data, both of which are ubiquitous in nature due to random and systematic errors. Because of these limitations and the lack of prospects, interest in AI declined, and the field entered a period known as the AI winter.

Eventually, a few years later and largely independently of classic AI, a new field known as Machine Learning started to emerge. Beginning with Rosenblatt’s early work on the perceptron, Machine Learning was built on the observation that the representations and rules of natural intelligent systems are acquired from experience, through processes of evolution and learning, rather than being hard-coded as the symbol-rule hypothesis assumed. Since then, Machine Learning, and especially Deep Learning, a subfield based on Artificial Neural Networks, has produced the most remarkable successes in the field of AI.

However, although these tremendous developments have caught many scientists by surprise and have led many to believe that the advent of Strong AI is near, we are still far from developing a machine approaching human-level intelligence, and perhaps we will never achieve it unless a significant shift in AI research occurs. The generalisation capabilities of current state-of-the-art AI systems remain poor, limiting their application to narrow and specific tasks, whereas humans can adapt to new and entirely different settings with ease.

Most strikingly, questions such as “What if I do …?”, “How …?”, “Why …?” and “What if I had done …?”, which humans find relatively easy to answer, remain prohibitive for computer systems. In Pearl’s terminology, these are questions from the second and third rungs of the Ladder of Causation, intervention and counterfactuals, while purely statistical learners are confined to the first rung, association. As a direct result, machines cannot reason about the possible effects of their actions on the external environment, nor can they choose among deliberate alterations of it to produce a desired outcome. Furthermore, they lack imagination and retrospection, as they cannot reflect on their past actions and envision alternative scenarios.

Perhaps these limitations come as no surprise; after all, current machine learning systems operate in a purely associational mode, and ultimately their success boils down to four main factors: (i) the assumption of independent and identically distributed data, (ii) massive amounts of data, (iii) high-capacity models and (iv) high-performance computing. Simply put, they try to fit a function to raw data by capturing statistical correlations, rather than reasoning about the complex net of causal relationships, and they do so by eating up enormous amounts of data and computational resources. For example, open umbrellas and rainy days are correlated phenomena, but only the latter has a direct causal link to the former. Thus, while seeing people with open umbrellas suggests that it is raining, closing umbrellas does not stop the rain. Although this might seem trivial to humans, machines do not yet have a clue about this kind of relationship, and as such, they would predict that closing umbrellas actually stops the rain. Therefore, like the prisoners in Plato’s Allegory of the Cave, machine learning programs learn to predict the movement of the shadows in the cave, but they fail to understand that those shadows are mere projections of three-dimensional objects.
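
A toy simulation makes this concrete (a minimal sketch; the world model and its probabilities are made up for illustration). A purely statistical learner correctly finds that open umbrellas predict rain, but intervening on the umbrellas, in Pearl’s do-notation, leaves the rain untouched:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: rain causes umbrellas, never the other way around.
n = 100_000
rain = rng.random(n) < 0.3                       # P(rain) = 0.3
umbrella = np.where(rain,
                    rng.random(n) < 0.9,         # P(umbrella | rain) = 0.9
                    rng.random(n) < 0.05)        # P(umbrella | no rain) = 0.05

# Observational quantity: P(rain | umbrella open) -- umbrellas "predict" rain.
print(f"P(rain | umbrella open)    ~ {rain[umbrella].mean():.2f}")   # ~0.88

# Interventional quantity: P(rain | do(umbrella closed)).
# Forcing every umbrella shut replaces the umbrella mechanism but leaves the
# rain mechanism untouched, so rain keeps its marginal probability.
print(f"P(rain | do(umbrella = 0)) ~ {rain.mean():.2f}")             # ~0.30
```

The two printed numbers differ precisely because conditioning (seeing) and intervening (doing) are different operations, a distinction that no amount of fitting to observational data can recover by itself.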

Even our ancestors initially lacked causal knowledge, but, as Yuval Harari argues in his book Sapiens, as soon as humans started to realise that certain things cause others and that tinkering with the former can change the latter, we began to evolve at a dramatically faster pace. This turning point came to be known as the Cognitive Revolution. All of these considerations suggest that we will not get any closer to the ambition of building a human-thinking machine capable of acting in an imagined space, in the sense of Konrad Lorenz, unless we equip it with a mental model grounded in causal knowledge.

To achieve this aim, Judea Pearl, one of the most prominent exponents of the new science of Causality, proposes to implant a “causal inference engine” in future AI systems. This engine receives a query, a set of causal assumptions and a collection of data as input, and generates an estimand and an estimate as output. The estimand can be thought of as a recipe for answering the query, and it is derived from the underlying causal model alone; the estimate is the actual answer obtained by applying that recipe to the input data. Thus, unlike in the traditional statistical approach, the role of data is relegated solely to the computation of the estimate, in profound contrast with Machine Learning, which is entirely data-driven.
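
As a minimal sketch of this estimand/estimate split (the toy model, variable names and probabilities are assumptions made up for illustration), consider a confounder Z that influences both a treatment X and an outcome Y. The causal model alone yields the backdoor-adjustment estimand P(y | do(x)) = Σz P(y | x, z) P(z); the data then serves only to turn that recipe into a number:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy causal model (an assumption, not something read off the data):
#   Z -> X, Z -> Y, X -> Y   (Z confounds the effect of X on Y)
n = 200_000
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)       # treatment depends on the confounder
y = rng.random(n) < 0.2 + 0.3 * x + 0.3 * z     # outcome depends on both

# Estimand (the "recipe", derived from the graph via backdoor adjustment):
#   P(y | do(x)) = sum_z P(y | x, z) * P(z)
def p_y_do_x(x_val: bool) -> float:
    return sum(y[(x == x_val) & (z == z_val)].mean() * (z == z_val).mean()
               for z_val in (False, True))

# Estimate (the number obtained by plugging the data into the recipe):
print(f"backdoor effect estimate:     {p_y_do_x(True) - p_y_do_x(False):.2f}")  # ~0.30
print(f"naive correlational contrast: {y[x].mean() - y[~x].mean():.2f}")        # ~0.48, confounded
```

Note the division of labour: change the causal graph and the estimand changes, while the very same data would yield a different estimate.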

The rationale behind this design is that raw data is inherently dumb. Although current research trends seem to hope that a data-centric approach will eventually lead us to the correct answers to causal questions, it can be proven that such questions cannot be answered from raw data alone. Causal reasoning requires assumptions about the underlying data-generating process, and the field of Causality has shown that these assumptions can be formalised by means of mathematical objects known as causal models.
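
Concretely, in the formulation used by Pearl and by the causal representation learning literature cited below, a causal model over variables X₁, …, Xₙ is a set of structural assignments

Xᵢ := fᵢ(PAᵢ, Uᵢ),   i = 1, …, n,

where PAᵢ are the parents of Xᵢ in a directed acyclic graph and the noise terms Uᵢ are jointly independent. An intervention do(Xⱼ = x) replaces the j-th assignment with the constant x while leaving all other mechanisms untouched; it is this extra structure, not the data alone, that licenses causal conclusions.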

At this point, we should point out that the field of Causality traditionally assumes the causal models to be given a priori by humans. In addition, the causal variables generated by the underlying causal models are assumed to be directly observable. However, these assumptions are generally unrealistic. In some fields, our knowledge is in such an embryonic state that we have no clue about how the world operates, and real-world observations are not usually structured into causal variable units: the objects in an image, for instance, first need to be extracted before we can reason causally about them. Therefore, just as Machine Learning went beyond the symbol-rule hypothesis by not requiring the symbols to be given a priori, the emerging field of Causality should strive to learn the causal models of real-world phenomena, and to discover their causal variables, automatically from real-world observations. After all, a future Strong-AI machine equipped with a causal inference engine should be capable of hypothesising assumptions about the world and fine-tuning them as it acquires further experience.

These shortcomings can be addressed by drawing on the advances in Machine Learning. Discovering causal variables in unstructured raw data and subsequently learning the underlying causal models are both data-centric operations, and Machine Learning excels at exactly this. Furthermore, modern Machine Learning techniques can help us overcome the curse of dimensionality in the statistical estimation step of the causal inference engine. All of this leads to the conclusion that if we really want to build a machine approaching human intelligence, then we need to merge Causality and Machine Learning into a single field: Causal Machine Learning.
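
To illustrate the estimation point (a minimal sketch; the toy data, dimensions and model choice are assumptions for illustration only), once the confounder is high-dimensional, stratifying on it exactly, as in the earlier backdoor example, becomes infeasible, but a learned outcome model can stand in for the conditional probabilities, a technique known as the plug-in g-formula:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# A 20-dimensional confounder: exact stratification over Z is hopeless,
# but a fitted model of P(y | x, z) keeps the backdoor recipe usable.
n, d = 50_000, 20
Z = rng.normal(size=(n, d))
x = (rng.random(n) < 1 / (1 + np.exp(-Z[:, 0]))).astype(int)                   # treatment depends on Z
y = (rng.random(n) < 1 / (1 + np.exp(-(0.5 * x + Z[:, 0] - 0.3)))).astype(int) # outcome depends on both

# Fit an outcome model, then average its predictions over the observed Z
# (plug-in g-formula): mean_Z P(y | x = c, Z) estimates P(y | do(x = c)).
model = LogisticRegression(max_iter=1000).fit(np.column_stack([x, Z]), y)

def p_y_do(c: int) -> float:
    return model.predict_proba(np.column_stack([np.full(n, c), Z]))[:, 1].mean()

print(f"estimated causal effect: {p_y_do(1) - p_y_do(0):.2f}")
print(f"naive contrast:          {y[x == 1].mean() - y[x == 0].mean():.2f}")   # confounded
```

Here a linear-logistic model happens to suffice because the toy data is generated that way; in practice, this estimation slot is exactly where high-capacity learners earn their keep.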

To summarise, there is still a long journey ahead of us before we can build a human-thinking machine, and a shift in current AI research trends is needed to get there. Much as the field of AI went beyond the symbol-rule hypothesis by embracing Machine Learning, it is now necessary to ditch the purely statistical, data-driven learning paradigm in favour of a causal approach. Yet the tools of the emerging field of Causality are, on their own, insufficient to give machines the gift of causal thinking. This is why, although the two fields arose and developed separately, Causality and Machine Learning need to be merged into a new and promising field: Causal Machine Learning. Perhaps, just as we humans evolved at a dramatically faster rate once we started asking ourselves causal questions, once we discover how to pair Causality with Machine Learning successfully, the Singularity will be just around the corner.

Suggested readings:

  • Pearl, J., & Mackenzie, D. (2019). The Book of Why. Penguin Books.
  • Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., & Bengio, Y. (2021). Towards Causal Representation Learning. arXiv preprint arXiv:2102.11107.
  • Schölkopf, B., & von Kügelgen, J. (2022). From Statistical to Causal Learning. arXiv preprint arXiv:2204.00607.
  • Schölkopf, B. (2022). Causality for Machine Learning. In Probabilistic and Causal Inference: The Works of Judea Pearl (pp. 765–804).

--

I am a computer science student. I am eager for knowledge and feel the need to inquire into the nature of things to understand the world around us.