
Want to know how the first truly successful AI system is being revived?

And how not to mess it up…

Photo by Danny Lines on Unsplash

Summary

  1. AI as a research field started in 1956 by bringing together different approaches to the study of ‘thinking machines’. Symbolic AI was the paradigm that reigned during the first decades of development.
  2. Small, incremental improvements in symbolic AI have resulted in great successes in this field.
  3. Despite its fall in popularity, symbolic AI is not gone for good nor has it been deemed useless by researchers.
  4. As we go deeper into research on the reigning paradigm (connectionism, and deep learning in particular), we are starting to run into problems that neuro-symbolic AI may help solve.

What’s up?

A while ago I was reading a book about different approaches to AI. I had heard of, read about, or worked with all of them, but the book gave me a full picture of the landscape. I now understand the differences and similarities between the various schools of thought that populate the AI research field.

But knowledge is never complete.

I always have an itch to scratch after I read a book, like a kid who is constantly asking ‘why’ questions. In particular, I was extremely curious about the combination of different AI paradigms. While scratching my itch, a.k.a. losing myself in YouTube talks and the reference links of Wikipedia articles, I came across a talk by David Cox on neuro-symbolic AI.

Neuro-symbolic AI is a combination of two AI paradigms: connectionism and symbolism. Connectionism is extremely popular at the moment. It is mostly known for the successes of machine learning and deep learning. But it has its limitations and we might be reaching some of them. That is where old and quiet symbolic AI comes to the rescue with its wit and wisdom.

A research field is born.

It goes without saying that we didn’t get here in the blink of an eye. Contrary to what is sometimes said, AI is not a new field, and millions of pages have already been written to explain its concepts and ideas. But how did it all start? What influence does the past have on present developments? Well, it is hard to tell for sure (as is often the case), but one thing we are certain about is that the term ‘Artificial Intelligence’ was introduced at a workshop held in 1956, which is considered the birthplace of the field of AI.

At the time, the 29-year-old Marvin Minsky had just finished his Ph.D. dissertation at Princeton and was lecturing at Harvard. The Ph.D. was in mathematics and related to brain modeling with neural networks. He felt the field was lacking coordination, connection, and common principles. John McCarthy, also 29 years old, also with a Ph.D. in mathematics from Princeton, probably felt the same way, since the two agreed to organize a summer workshop where researchers would work together on their projects and bring some clarity to a rather dispersed field: the field of "thinking machines". It encompassed disciplines with old and now virtually obsolete names such as cybernetics and automata theory. Since McCarthy and Minsky did not want to favor any specific topic, they opted to name the workshop the ‘Dartmouth Summer Research Project on Artificial Intelligence’.

They were both junior scholars (and probably had little influence), so they paired up with Nathaniel Rochester and Claude Shannon. In case you don’t know (I surely didn’t), Rochester designed the IBM 701 and helped in the development of LISP, a programming language that was the basis of many of the first AI systems. Claude Shannon is known as the ‘father’ of information theory.

So, Minsky from Harvard, McCarthy from Dartmouth, Shannon from Bell Telephone Laboratories, and Rochester from IBM wrote a proposal requesting funding for their workshop. The proposal started like this:

Source: http://raysolomonoff.com/dartmouth/boxa/dart564props.pdf

The proposal goes on to list some of the problems the field was facing, such as neural nets and programming a computer to use language. To tackle such topics, the best and brightest were needed. Among the attendees of the workshop there were indeed some brilliant minds, such as John Nash (you know, the mathematician from A Beautiful Mind) and Warren McCulloch, who by then had already proposed, together with Walter Pitts, the first architecture for artificial neural networks. Moreover, Allen Newell and Herbert A. Simon were present during the first two weeks.

There was a particular project which Newell and Simon had been working on: The Logic Theorist. This system was being developed to use logic and reason to prove mathematical theorems. We’ll see later how they were doing that.

Allen Newell and Herbert A. Simon, together with Minsky and McCarthy, are now known as the "fathers of AI".

Fast forward to 2006 and we are at AI@50. This conference was held to "celebrate the Dartmouth Research Project, which occurred in 1956; to assess how far AI has progressed; and to project where AI is going or should be going." There, McCarthy shared that, even though the project was not successful in terms of collaboration (most attendees kept to their own research agendas), relevant research developments were carried out. Those ‘relevant research developments’ were Allen Newell and Herbert Simon’s Information Processing Language (IPL) and The Logic Theorist.

Crawling before walking.

Early in his career, Allen Newell made it his personal mission to understand and emulate human decision-making. He had a long-standing interest in figuring out how humans solve problems and was determined to create a system that could mimic that. Instead of aiming for a flawless system, Newell wanted something that got things right when a human would get them right and made mistakes when a human would make mistakes. Anything that was not human-like thinking was of little interest to him. Eventually, he realized that much of human thinking is done through the manipulation of symbols. If I say "I am now writing on a computer because I read a book that sparked my interest", you immediately envision different entities (person, computer, book). These have their own properties (a person has an age, a nationality, interests, etc.). And they have relationships with each other (the person reads a book, the person operates a computer).

Herbert Simon was also fond of the idea of using symbol systems, that is, machines that operate on symbols, to tackle the problems of decision-making. Once the two met in 1954, a 40-year academic collaboration began. Herbert Simon was then a professor of industrial administration at the Carnegie Institute of Technology, and Newell even moved to Pittsburgh so he could start his Ph.D. under Simon’s supervision.

One of their early projects was a chess-playing machine that would make its decisions using heuristics. In short, heuristics are mental shortcuts that allow us to make decisions that lead to fast and reasonable outcomes; rules of thumb and educated guesses are examples. This work laid the groundwork for further development of chess-playing computers such as Deep Thought, the predecessor of Deep Blue, the system that beat the world champion, Kasparov, in 1997. The first was designed by Carnegie Mellon University students and the second was created by some of those same students and sponsored by IBM.

Heuristics were also used during the development of the Logic Theorist. Its details deserve an article of their own. But since it is said to be the first truly successful AI program, and it is fascinating to see how deeply it influenced the field, I couldn’t skip it.

I did my best to keep the explanation simple, but it might still get a bit dense. So if you don’t feel like it, you can always skip to the next section.

Let us start with its goal as stated in the original report:

Source: http://shelf1.library.cmu.edu/IMLS/MindModels/logictheorymachine.pdf

Basically, axioms are assumptions, that is, statements we believe to be true and usually use as starting points for further reasoning. A quick look at the Oxford dictionary tells us that an inference is:

  1. something that you can find out indirectly from what you already know.
  2. the act or process of forming an opinion, based on what you already know.

So, an inference is either the conclusion itself or the process of reaching the conclusion.

Now consider the following three sentences:

  1. The sun is a star.
  2. Earth orbits around the sun.
  3. Then, the earth orbits around a star.

The first two sentences are the axioms and can be used to infer the third sentence, the theorem. If this reminds you of high-school philosophy, it is because we are entering the realm of logic, a discipline of philosophy (and mathematics).
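For the notation-minded, here is one way to write that little argument down formally. The predicate names are my own invention for illustration; the Logic Theorist itself worked on the propositional logic of Principia Mathematica, so take this purely as a sketch of the idea:

```latex
% Axioms (assumed true):
%   A1: the sun is a star
%   A2: the earth orbits the sun
% Theorem to infer: the earth orbits some star
A1:\ \mathit{Star}(\mathit{sun}) \qquad
A2:\ \mathit{Orbits}(\mathit{earth}, \mathit{sun})
\ \vdash\
\exists x\, \bigl( \mathit{Star}(x) \land \mathit{Orbits}(\mathit{earth}, x) \bigr)
```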

Some rules of logic. Don’t worry about them too much. Source: https://math.stackexchange.com/questions/768757/propositional-logic-proof-using-i-p-or-c-p-or-rules-of-inference

What the Logic Theorist does to prove theorems is explore a kind of tree whose root is the theorem itself (the conclusion). The first branches are statements from which the root could be inferred. New branches are created every time we infer something new from the information on the previous branches. Eventually, we reach premises that we know to be true and thus have a proof (assuming we got there following the rules of logic).

In other words, we start by figuring out what is needed to make the third sentence true (namely, the first and second sentences) and then go to our bag of axioms to check whether they are there. In the field’s lingo, the bag of axioms is called the Knowledge Base. The job of ‘figuring out what is needed’ is carried out by the Inference Engine, which applies rules of logic to the Knowledge Base and infers new knowledge (which can then be added to the Knowledge Base).
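To make the Knowledge Base / Inference Engine split a bit more concrete, here is a minimal backward-chaining sketch in Python. It is a toy of my own making, not the Logic Theorist’s actual IPL code: the facts, the rule, and the `prove` function are all invented for illustration.

```python
# Toy Knowledge Base: axioms we accept as true, plus rules of the form
# (conclusion, [premises]) meaning "conclusion holds if all premises hold".
AXIOMS = {"sun_is_a_star", "earth_orbits_sun"}
RULES = [
    ("earth_orbits_a_star", ["sun_is_a_star", "earth_orbits_sun"]),
]

def prove(goal, depth=0, max_depth=10):
    """Toy Inference Engine: work backwards from the goal towards the axioms."""
    if depth > max_depth:   # guard against runaway recursion
        return False
    if goal in AXIOMS:      # the goal is itself an axiom, so we are done
        return True
    # Otherwise, branch: find a rule that concludes the goal and try to
    # prove each of its premises in turn (a branch of the proof tree).
    for conclusion, premises in RULES:
        if conclusion == goal and all(prove(p, depth + 1) for p in premises):
            return True
    return False            # no branch led back to the axioms

print(prove("earth_orbits_a_star"))  # True
print(prove("earth_orbits_mars"))    # False
```

The point is only the division of labour: the data structures play the role of the Knowledge Base, and `prove` plays the role of the Inference Engine.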

The Knowledge Base and the Inference Engine are the two basic components of the Logic Theorist, but there is more. Each time we make an inference we generate multiple new premises; that is, our tree branches out. Not all of the branches will lead us to our goal (the axioms we know to be true), and we don’t know in advance which one will. We could search through all the branches, but this would lead to an exponential explosion of possibilities and become prohibitive in terms of computing time and resources.

So, we must limit our search. But to which branches?

To solve this, Simon and Newell used heuristics. To better understand how heuristics were applied, let’s imagine we are an ant at the root of a tree, hoping to find a sugary apple on a top branch. As we climb, three branches appear and we have to choose which one to take. We have heard that, for a branch to bear an apple, it should be thick, so we opt for the thickest of the three.

After climbing for a little while, we get to a new set of branches. Now their thickness is similar. Damn it. What do we do?

Well, sugary apples need water. Some of the branches look dry but others look hydrated, so off we go to one of those hydro homies. Thickness and hydration were the shortcuts we used to increase the likelihood of choosing a path that would get us to our goal. It is definitely not optimal, but it is probably good enough. Ants might not go through this thinking process, but we humans do. So, if you got the idea, and won’t judge me for my lack of knowledge of how ants search for food, I am happy.
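If you prefer to see the idea as code, here is a minimal best-first search sketch in Python. The tree, the scores, and the `heuristic` function are made up for the ant-and-apple analogy; they are not how the Logic Theorist actually ranked its subgoals.

```python
import heapq

# A made-up tree: branch -> (thickness, hydration, child branches).
TREE = {
    "trunk":        (1.0, 1.0, ["thick_branch", "thin_branch"]),
    "thick_branch": (0.9, 0.2, ["dry_twig", "wet_twig"]),
    "thin_branch":  (0.3, 0.8, []),
    "dry_twig":     (0.5, 0.1, []),
    "wet_twig":     (0.5, 0.9, ["apple"]),  # the goal hangs off this one
    "apple":        (0.1, 1.0, []),
}

def heuristic(branch):
    """Score a branch by the cues the ant cares about: thickness + hydration."""
    thickness, hydration, _ = TREE[branch]
    return thickness + hydration

def best_first_search(start, goal):
    """Always expand the most promising branch first instead of all of them."""
    frontier = [(-heuristic(start), start)]  # max-heap via negated scores
    visited = set()
    while frontier:
        _, branch = heapq.heappop(frontier)
        if branch == goal:
            return True
        if branch in visited:
            continue
        visited.add(branch)
        for child in TREE[branch][2]:
            heapq.heappush(frontier, (-heuristic(child), child))
    return False

print(best_first_search("trunk", "apple"))  # True
```

The heuristic does not guarantee the shortest route to the apple; it only makes us spend our limited climbing time on the branches that look most promising, which is exactly the not-optimal-but-good-enough trade-off described above.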

Now, we must remember that all of this was cutting-edge research at the time: there were no software packages they could use, nor open-source programming languages. That’s not surprising, since they didn’t even have the internet back then. So the creation of the Logic Theorist went hand in hand with a collaboration with graduate students, such as Edward A. Feigenbaum, to develop a programming language that could handle the processing of lists of symbols. Information Processing Language (IPL) was its name. Later, John McCarthy, one of the organizers of the Dartmouth workshop, used IPL as an inspiration for designing LISP, which became the standard language of the AI community once it was developed.

Walking (and tripping) before running.

Even though it is considered the first successful AI program, the Logic Theorist was not flawless. It proved many mathematical theorems, but those live in a world where the rules are very well defined, which is quite different from what we face in the real world. Moreover, Herbert A. Simon predicted:

  • in 1958 that "within ten years a digital computer will be the world’s chess champion".
  • in 1965 that "machines will be capable, within twenty years, of doing any work a man can do."

Deep Blue beat Kasparov in the late 90s, that is, forty years later. And today it still seems like we have a long way to go before machines can do all the work we do. So, expectations were high and were not met. The critics’ voices grew louder, and in the second half of the 70s the first AI winter took place.

Luckily for us, scientists and engineers kept working. Regardless of its flaws, the Logic Theorist laid fertile ground for the development of what we now know as expert systems, which flourished in the early 80s. From infectious-disease diagnosis to the identification of unknown organic molecules, their applications seemed endless.

Can you guess who coined the term?

The term ‘expert systems’ was introduced in 1965 by none other than Feigenbaum (a guy who had never seen a computer before the age of 19 and who was awarded the Turing Award in 1994). The basic components of most expert systems are the Knowledge Base and the Inference Engine we learned about earlier. To develop them, the programming language of choice was LISP (no big surprise there either).

From the 50s to the 80s, symbolic AI was the dominant paradigm, and expert systems became its most popular form. It received more funding than research based on connectionist approaches such as neural networks. In the early 80s, the LISP machine market in the US alone would reach half a billion dollars. However, Minsky and others warned that there was (again) too much hype surrounding the field. Developers overpromised, businesses were overoptimistic, and the press, as usual, fuelled the vicious cycle. By the end of the 80s and early 90s, excitement gave way to disappointment, symbolic AI and its expert systems lost their status, and the second AI winter arrived.

Rising up.

Despite the drop in interest in the field of AI, developments kept happening in all its forms (e.g. Deep Blue). In 2012, neural networks and deep learning took the world by storm by achieving tremendous accuracy on computer vision problems. Since then, connectionism has been hailed as the way forward.

Now, there is an effort by IBM and MIT to combine it with symbolic AI. The goal is to tackle problems deep learning has a hard time with, such as learning from less data and offering more transparency. We will need less data because systems will be able to reason and make use of the knowledge they already have. That means that if you want to detect something in an image, there will be no need to show the system an example of every single variation for it to recognize it. Just like a kid who can look at a pink elephant and immediately see it is an elephant; neural networks might have a hard time doing so. Transparency is built into the very essence of symbolic algorithms because, as we saw in the case of the ant looking for an apple, they follow logical steps that are recorded and easily retraced.

I am going to spare your time by keeping my explanation short and direct you to two great summaries of neuro-symbolic AI in case you want to know more about it. Yes, I could have written one myself, but as hard as it is for me to say, I probably would not do a better job. This article by Inside IBM Research’s Katia and David Cox’s lecture are the best resources I have found. Luckily, this topic is not making a lot of headlines yet, so I managed to digest much of what is out there.


AI’s evolution came with much effort and we should take that into consideration when talking about what AI-based systems can deliver. Overpromises and false expectations bring nothing but trouble.

Will we be able to keep the hype in check while we work on getting the combination of connectionism and symbolism to deliver the results we need, and mark the beginning of a new chapter?

