
Why You Should Pay More Attention to Knowledge Graphs?

Short answer: to model and solve complex real-world problems. Want to find out how? Keep reading!

Photo by Pietro Jeng on Unsplash

In this data age, we generate tons of data from many different sources, and the construction of knowledge graphs has grown significantly in recent years. Knowledge graphs are now a popular topic in domains such as medicine, telecommunications and even social media.

The idea is simple: we aggregate all the data collected from a domain into a single data structure, a graph. We prefer a graph because it can capture complex real-life scenarios. These graphs are usually huge, and it is not uncommon for them to contain millions or even billions of entities.

The term "Knowledge Graph" was popularized by Google, which announced its own Knowledge Graph in 2012, and it quickly caught on. Broadly speaking, we can call any sizeable graph that holds knowledge or important information a knowledge graph.

Examples: Some very popular knowledge graphs include:

  1. Google Knowledge Graph with 500 billion facts and 5 billion entities.
  2. Amazon Product Graph.
  3. Open knowledge graphs such as DBpedia and Wikidata.

Knowledge Graph Construction

Photo by Danist Soh on Unsplash

So what are the building blocks of a Knowledge Graph? Let’s understand how these huge graphs are constructed.

Knowledge graphs are usually represented as directed graphs, so each edge of the graph, together with the two nodes it connects, can be described as a triple.

A triple, as the name suggests, is a tuple made of three elements: the source node, the relation and the target node.

Nomenclature:

The elements of triples can be referred to using different names such as:

  1. (s,p,o): Subject, Predicate, Object
  2. (h,r,t): Head, Relation, Tail
  3. (s,r,t): Source, Relation, Target

Let me elaborate on this with a small example.

Graph Example (Source: Author)

In the graph above, we have 4 nodes and 3 edges; in other words, there are 3 triples. One of those triples is:

("Eren", "wants", "Freedom")

Here, Eren is the source node, wants is the relation and Freedom is the target node. The relationship is directed: it points from Eren to Freedom, not the other way around. As I mentioned before, a knowledge graph consists of many such triples.
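To make the idea concrete, here is a minimal Python sketch of how a handful of triples could be stored and queried. Only the first triple comes from the example above; the other two, along with the `Triple` class and the `outgoing` adjacency list, are hypothetical names I made up for illustration, since the remaining edges of the figure are not spelled out in the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    source: str
    relation: str
    target: str


# The first triple is the one discussed above. The other two are
# hypothetical placeholders, used only to get 4 nodes and 3 edges.
triples = [
    Triple("Eren", "wants", "Freedom"),
    Triple("Eren", "friend_of", "Armin"),  # hypothetical
    Triple("Armin", "reads", "Books"),     # hypothetical
]

# Nodes are all distinct sources and targets; each triple is one directed edge.
nodes = {t.source for t in triples} | {t.target for t in triples}

# Adjacency list: source node -> list of (relation, target) pairs.
outgoing = defaultdict(list)
for t in triples:
    outgoing[t.source].append((t.relation, t.target))

print(len(nodes), "nodes,", len(triples), "triples")
print(outgoing["Eren"])
```

Because the edges are directed, "Freedom" never appears as a key in `outgoing`: it is only ever a target. A real knowledge graph would normally live in a dedicated graph database or triple store, but the underlying shape of the data is the same.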


Importance of Knowledge Graphs

We now understand what knowledge graphs are and what they are made of. So what are these kinds of graphs used for?

Knowledge graphs can be used in several different ways. Each domain in which a graph is constructed has its own use cases.

Photo by Alina Grubnyak on Unsplash

For instance, in the biomedical field, we have the example of a Protein-Protein Interaction (PPI) network, which is a graph composed of proteins found in the human body and their interactions with each other. These graphs are used to identify important protein complexes and to study how protein interactions behave under different drug treatments or disease infections. In simple terms, PPI networks can help us discover new drugs for existing diseases or for new ones such as Covid-19.

Google’s knowledge graph is about optimizing the search engine and delivering relevant information to users. If you searched for the keyword "interstellar" on Google, you would receive information about the movie Interstellar along with recommendations for similar titles, such as other space sci-fi movies. You would also see some articles on interstellar space in a scientific context, or simply a dictionary entry explaining the word. Google wants to understand the context of what the user is searching for and show the most relevant information at the top.

Orange is working on its own knowledge graph flavour called Thing in the future. Here, the data is collected from physical entities in the real world, such as traffic lights, geolocation data, building data and city data. Several IoT-connected devices are employed for this purpose, and the data holds the structural and semantic description of the environments. The main idea is to map real-world entities into a digital world and build use cases that are helpful for the end user, such as information retrieval.

Acknowledgement: I did an internship at Orange Labs in Cesson-Sévigné on improving the data quality of the Thing in the future knowledge graph using state-of-the-art machine learning techniques.

I have mentioned three examples, but there are several other possibilities, such as movie recommendations (on sites like Netflix) and new friend suggestions (on sites like Facebook and Instagram).


Some Issues

Photo by UX Indonesia on Unsplash

It all sounds good, so let’s just build knowledge graphs and reap their benefits, right?

It is not so simple. Due to the sheer size of a knowledge graph, some problems arise when working with it. Some of those issues are listed below:

  1. Missing data or bad quality data: While we construct the graph, the information is injected into the database in the form of triples. Often, some information is lost, either due to data corruption or other factors. The process of fixing this issue is termed knowledge graph refinement or knowledge graph completion.
  2. Non-Euclidean nature of the graph: One advantage of using a graph data structure is that we can capture complex interactions from the real world. However, most machine learning algorithms expect fixed-size numerical inputs, so we cannot feed a graph to them directly in order to build a predictive model. Graph structures have been studied extensively, and we now have methods such as knowledge graph embeddings, which generate a numerical representation of the entities in a low-dimensional space. It is these embeddings, and not the graph itself, that are then used to build a machine learning model.
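To give a feel for what such a numerical representation looks like, here is a minimal sketch in plain Python. It uses a translation-style scoring rule in the spirit of TransE, which is one well-known embedding method that I picked for illustration; it is not named elsewhere in this post. The dimension `DIM`, the helper names and the second triple are all assumptions, and the vectors are random rather than trained.

```python
import math
import random

# Toy graph: only the first triple appears in this post,
# the second one is a hypothetical placeholder.
triples = [
    ("Eren", "wants", "Freedom"),
    ("Eren", "friend_of", "Armin"),  # hypothetical
]

DIM = 8  # embedding size, chosen arbitrarily for illustration
rng = random.Random(0)


def random_vector(dim):
    """Return an untrained vector with values in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


entities = sorted({s for s, _, _ in triples} | {t for _, _, t in triples})
relations = sorted({r for _, r, _ in triples})

# Every entity and every relation gets its own low-dimensional vector.
entity_vec = {e: random_vector(DIM) for e in entities}
relation_vec = {r: random_vector(DIM) for r in relations}


def transe_distance(source, relation, target):
    """Euclidean distance between (source + relation) and target.

    In a translation-style model, a small distance means the triple
    is considered plausible.
    """
    s, r, t = entity_vec[source], relation_vec[relation], entity_vec[target]
    return math.sqrt(sum((si + ri - ti) ** 2 for si, ri, ti in zip(s, r, t)))


print(transe_distance("Eren", "wants", "Freedom"))
```

With random vectors the distance is meaningless. Training would adjust the vectors so that triples present in the graph get small distances and corrupted triples get large ones. The same score can then be used to rank candidate triples, which is one way to approach the knowledge graph completion problem from the first item.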

We also have Graph Neural Networks (GNNs), which are neural network methods that can be applied to graph data. Even here, we need numerical representations of the nodes in the graph to build a GNN model.

I have already written two posts on graph neural networks, in which I explain how to build GNN models using the PyTorch Geometric Python library. I recommend checking them out if you are interested in graph-based analysis.
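As a rough illustration of why node features are required, here is a small sketch of a single message-passing round in plain Python. It is a deliberately simplified stand-in for a GNN layer: real layers add learned weights and a non-linearity on top of the aggregation. The graph, the feature values and the `mean_aggregate` helper are all hypothetical.

```python
# Hypothetical toy graph: directed edges as (source, target) pairs.
edges = [("A", "B"), ("C", "B"), ("B", "D")]

# Every node needs a numeric feature vector before any aggregation can run.
features = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [2.0, 2.0],
    "D": [0.0, 0.0],
}


def mean_aggregate(features, edges):
    """One message-passing round: average each node's own vector
    with the vectors of the nodes that point to it."""
    incoming = {node: [vec] for node, vec in features.items()}
    for source, target in edges:
        incoming[target].append(features[source])
    return {
        node: [sum(column) / len(vectors) for column in zip(*vectors)]
        for node, vectors in incoming.items()
    }


updated = mean_aggregate(features, edges)
print(updated["B"])  # [1.0, 1.0]
```

Node B ends up with the average of its own vector and those of A and C, so its new representation now carries information about its neighbourhood. Without the initial numeric vectors in `features`, there would be nothing to aggregate.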

A Beginner’s Guide to Graph Neural Networks Using PyTorch Geometric – Part 1

A Beginner’s Guide to Graph Neural Networks Using PyTorch Geometric – Part 2


I will write more articles on knowledge graphs and their embedding techniques. Stay tuned!

If you have reached this part of the post, thank you for reading and for your attention. I hope you found the post informative. If you have any questions, feel free to reach me on LinkedIn, Twitter or GitHub.

