You might have heard of Google’s Knowledge Graph, which provides direct answers to users’ queries on the search engine.
Well, knowledge graphs have a long history in computer science. As early as the ’70s, researchers tried to create a universal database able to answer any possible question. But this approach quickly reached its limits, both in scalability and in understanding capabilities.
Today, these data structures have found a second life in enterprise databases, market intelligence, and recommendation algorithms. Let’s look at the benefits and limitations of these knowledge systems.
The history of expert systems
In the 1970s, some AI researchers doubted that computational systems could form a global understanding of the world. So they decided to make them work on narrow, well-defined problems instead.
By feeding algorithms with knowledge about specific areas of expertise, they thought these programs could make better decisions than human specialists. What they called computer-based expert systems had three parts.
First, a knowledge base filled with "if-then" rules about the subject at hand (e.g. if an animal has feathers, it is a bird). Second, a working memory that holds the assertions and information about the current case ("this animal has fur"). Third, an inference engine that reasons about the case by applying the different rules and weighing them against each other ("this animal has no feathers, therefore it is not a bird").
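To make the three parts concrete, here is a minimal sketch of such a system in Python. The rules, facts, and the forward-chaining loop are all invented for illustration; real expert-system shells were far richer than this.

```python
# Toy expert system: a knowledge base of if-then rules, a working memory of
# facts, and a forward-chaining inference loop that fires rules until no new
# conclusion can be derived. All rules here are illustrative examples.

# Knowledge base: each rule is (set of required facts, conclusion)
rules = [
    ({"has feathers"}, "is a bird"),
    ({"is a bird", "can fly"}, "can migrate"),
    ({"has fur"}, "is a mammal"),
]

def infer(initial_facts):
    """Forward chaining: repeatedly fire rules whose conditions all hold."""
    facts = set(initial_facts)  # working memory for the current case
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"has fur"}))                   # the fur rule adds "is a mammal"
print(infer({"has feathers", "can fly"}))   # two rules chain: bird, then migrate
```

Note how the second call chains rules: "has feathers" yields "is a bird", which together with "can fly" yields "can migrate". This chaining is what let expert systems reach non-obvious conclusions from simple rules.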
Researchers built expert systems with tens of thousands of rules and were greatly surprised by the results. In some cases, they found that these systems could almost beat human experts in their judgment.
For example, a team from Stanford University tried to create an expert system that would give accurate diagnoses of blood diseases. This system, called MYCIN, was fed with the knowledge of real doctors who had dedicated their careers to the subject. As cases became increasingly complex, the algorithm also included probability factors to assess the uncertainty of its decisions.
Thanks to this sophisticated knowledge base, MYCIN could provide diagnoses equal or even superior to those of human doctors on the same cases. And this paved the way for many other expert systems dealing with similarly narrow subjects.
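MYCIN’s way of handling uncertainty was the "certainty factor", a number between -1 and 1 attached to each conclusion. When two rules supported the same conclusion with positive certainty, MYCIN combined them so that evidence accumulates but never exceeds full certainty. A simplified sketch of that combination rule (the numbers are made up for illustration):

```python
# MYCIN-style combination of two positive certainty factors (CFs):
# evidence accumulates, but the combined CF stays below 1.0.

def combine_positive(cf1, cf2):
    """Combine two positive certainty factors supporting the same conclusion."""
    return cf1 + cf2 * (1 - cf1)

# Two independent rules each suggest the same diagnosis with moderate certainty:
cf = combine_positive(0.6, 0.4)
print(cf)  # 0.76 — stronger than either rule alone, still short of certainty
```

The formula is symmetric and order-independent, which matters: the diagnosis should not depend on which rule happened to fire first.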
But despite all these claims, these systems showed real limitations, depending on the nature of the subject studied. One of them is that it is difficult to extract all the explicit rules behind a complex topic.
The temptation of a universal base of information
Spurred by the successes of symbolic AI, researchers proposed increasingly ambitious projects in the ’80s.
One of them, Douglas Lenat, envisioned a computing system that would encompass enough knowledge to build an understanding of our world. He called it the Cyc Project.
To achieve this, Lenat and his team sought to describe every explicit rule that underlies our understanding of the environment (for example, that a rock thrown in a pond sinks, or that an airplane needs thrust to fly).
But before encoding rules, the researchers first had to teach the system some basic concepts to make reasoning possible (what is a thing, what are its relations with other things…). They had to frame an ontology: the foundations for common understanding. On this structural basis, the team, with enormous effort, built an extremely large knowledge map over ten years.
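The backbone of such an ontology is an "is-a" hierarchy: whatever is stated about a general concept applies to everything beneath it. A toy illustration (the concept names are invented, not Cyc’s actual vocabulary):

```python
# Toy "is-a" hierarchy: each concept points to the broader concept above it.
# Facts attached to a broad concept implicitly apply to everything below it.

isa = {
    "airplane": "vehicle",
    "vehicle": "physical-object",
    "rock": "physical-object",
    "physical-object": "thing",
}

def ancestors(concept):
    """Walk up the is-a chain to collect every broader concept."""
    chain = []
    while concept in isa:
        concept = isa[concept]
        chain.append(concept)
    return chain

print(ancestors("airplane"))  # ['vehicle', 'physical-object', 'thing']
```

This inheritance is what spares the knowledge engineer from restating "occupies space" or "can be thrown" for every single object; one assertion at the right level covers the whole subtree.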
Fascinated by the project, Vaughan Pratt, a pioneer in computer science, decided to critically assess the intelligence of the system. In several demos, Pratt tested Cyc’s understanding with cognitive exercises. One involved analyzing a picture of a person relaxing. Cyc’s response was to display a picture of people carrying a surfboard, associating surfing with the beach and a relaxing environment, which showed good reasoning. Yet, when inspecting the system’s chain of thought, Pratt also realized that it relied on inferences that were not logically necessary (such as that humans have two feet).
That was not the only flaw of Cyc. It struggled as well with more complex general-knowledge questions. When asked whether bread is a beverage, the system gave no intelligible answer. Similarly, although Cyc knew many causes of death, it did not know about death by starvation. The demo thus ended on a pessimistic note: Cyc seemed to always stumble on knowledge gaps that eroded its global coherence.
Nevertheless, Douglas Lenat kept working on the project, bringing new ways to build a knowledge base. And he might still be onto something, as knowledge systems are now finding new and interesting applications.
Using knowledge graphs to feed data understanding
Since the experiments of the ’80s, intelligent knowledge systems have found more concrete commercial applications.
The most iconic example is the way Google has applied knowledge graphs to its services. In 2012, the company launched a system that maps the structure of knowledge on the Internet to feed its search engine. Based on user intent, Google’s algorithm can relate a word to a concept and untangle the different meanings behind a single word.
For example, people typing a query like "Taj Mahal" may be looking for information about the Indian monument, news about the artist of the same name, or directions to the Indian restaurant on the corner. It all depends on the intent of the user typing it.
To deliver the right response, Google is thus building a knowledge system that can understand information more deeply than a literal keyword-query match. It can extract general rules out of a text and reason about the facts logically. For example, when you search for the "green monster in Star Wars", Google can figure out you’re looking for Jabba the Hutt from links stating that he is "a large, slug-like alien". It can thus associate the concept "alien" with the query "green monster", which is reasonably accurate.
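A knowledge graph typically stores facts as (subject, relation, object) triples, so a query can be answered by matching attributes rather than literal keywords. Here is a hypothetical mini-graph sketching the "green monster" example; the entities, relations, and matching function are all invented for illustration and bear no resemblance to Google’s actual system.

```python
# Toy knowledge graph as (subject, relation, object) triples.
# A query becomes a set of attribute constraints instead of a keyword match.

triples = [
    ("Jabba the Hutt", "is-a", "alien"),
    ("Jabba the Hutt", "has-color", "green"),
    ("Jabba the Hutt", "appears-in", "Star Wars"),
    ("Luke Skywalker", "is-a", "human"),
    ("Luke Skywalker", "appears-in", "Star Wars"),
]

def find(**constraints):
    """Return subjects satisfying every (relation, object) constraint."""
    candidates = {s for s, _, _ in triples}
    for key, value in constraints.items():
        relation = key.replace("_", "-")  # find(is_a=...) matches "is-a"
        candidates &= {s for s, r, o in triples if r == relation and o == value}
    return candidates

# "green monster in Star Wars" → mapped to graph constraints, not keywords:
print(find(is_a="alien", has_color="green", appears_in="Star Wars"))
```

The point of the sketch: "monster" never appears in the graph, yet the query succeeds once it is translated into concepts ("alien", "green") that the graph does contain.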
But beyond search, knowledge graphs have also been used to feed other AI models. For example, in the health sector, research companies gather huge volumes of data without any labeling or structuring. Left as is, this data represents a huge missed opportunity to gain better insights into drug development and treatment possibilities.
By automatically annotating data and incorporating human insights, natural language processing models give meaning to research data, drawing inferences between diseases and treatments. Based on carefully structured knowledge, such models can connect diseases or known health conditions to scientific studies with accurate, well-founded, and transparent reasoning.
Such models are also increasingly used in investment and business development decisions. Large companies need market intelligence that is not just based on meaningless data scraping. They want to capture information that is relevant to their specific business problem.
They also want to be able to monitor all the news and comments about their company to assess its reputation. Based on a knowledge graph, intelligent data-processing software can help discover new insights and gather deep feedback about the company.
In other words, the collaboration between human knowledge and data intelligence has never been so intense. And this is just the beginning of a long and fruitful conversation between humans and machines!