
Knowledge Graph Embedding – A Simplified Version!

An explanation of what knowledge graph embeddings actually are and how to calculate them.


Photo by Nastya Dulhiier on Unsplash

Graphs are one of my favorite data structures to work with. They enable us to represent complex real-world networks like a rapid transit system (e.g., the Paris metro or the New York subway), regional or worldwide air traffic, or something relatable like a person's social network. They are very flexible and easily understood by humans. But to make computers "understand" and "learn" from them, we need an extra step called vectorization. This explanation might be oversimplified, but it is good enough to follow along with the rest of this post.


What makes a Knowledge Graph special?

Photo by Uriel SC on Unsplash

To understand how a knowledge graph differs from other graphs, let's use an analogy. Imagine a video game with different levels, where each level gets more difficult as you progress.

Level 1: A simple undirected graph, like a friend group at a university where the friends are nodes and the friendships are edges. Here we just have nodes and edges, nothing too fancy.

Level 2: Adding a sense of direction to the previous level, we obtain directed graphs. A simple example is a city-wide bus network: consider the bus stops as nodes and the routes between them as edges. Each bus moves in a specific direction from one stop to another, which adds directional information to the graph.

Level 3: We take a directed graph and add multiple flavors to the nodes and edges. To picture this, imagine social networks on the internet. The flavors on the nodes are the platforms the users belong to, such as Twitter, Facebook, or YouTube. The flavors on the edges are the types of interactions between users, i.e., follows (Twitter), friendships or follows (Facebook), and subscriptions (YouTube). The directed nature of the graph comes into play because a follow can be one-directional. For example, you can follow Elon Musk on Twitter, but he might not follow you back, hence the directional nature of the edge.

Level 4: Consider the graph in the previous level, but instead of talking about nodes and edges, you talk in the language of triples. A triple is the building block of a knowledge graph: a tuple made of three elements, namely the source node (head), the relation, and the target node (tail).

Note: The source node and target node are sometimes termed entities.

Be aware that the term "Knowledge Graph" is used a bit vaguely. There is no fixed definition, and broadly speaking, we can call any sizeable graph holding some knowledge or important information a knowledge graph.

The key takeaway here is that we treat elements of a knowledge graph as triples.
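To make this concrete, here is a minimal sketch of how a knowledge graph can be stored as a list of (head, relation, tail) triples. The facts below are made up purely for illustration:

```python
# A tiny knowledge graph expressed as (head, relation, tail) triples.
# Entity and relation names are illustrative only.
triples = [
    ("Paris", "is_capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Elon_Musk", "founded", "SpaceX"),
]

for head, relation, tail in triples:
    print(f"{head} --[{relation}]--> {tail}")
```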


Knowledge Graph Embedding Methods

Photo by Pixabay from Pexels

Recap: Vectorization or embedding (the numerical representation of the entities and relations of a graph) is necessary to use graphs as input to machine learning algorithms.

Since we treat the knowledge graphs differently from others, we need different techniques to learn their numerical representations (or embeddings).

There are several ways to generate knowledge graph embeddings (KGE), which we can broadly divide into three categories:

  1. Translation-based methods: Here, distance-based functions (in Euclidean space) are used to generate the embeddings. We can build a simple algorithm that makes the combination of the head and relation vectors approximately equal to the tail vector, represented as h + r ≈ t. This algorithm is called TransE (a small sketch of its scoring function follows this list). Other versions of the same algorithm exist with small modifications, such as TransH, TransR, TransD, TranSparse, and TransM.

  2. Factorization-based methods: This family is based on tensor factorization, and the first algorithm proposed using this technique was RESCAL. A three-way tensor of shape n x n x m is defined, where n is the number of entities and m is the number of relations. The tensor holds the value 1 if a relation exists between two entities and 0 otherwise. The embeddings are calculated by factorizing this tensor, which is usually computationally expensive for large graphs. This problem is addressed by algorithms like DistMult, HolE, ComplEx, and QuatE, which build on RESCAL's idea.

  3. Neural network-based methods: Neural networks are popular in many domains now, and it is not surprising that they are also employed to learn knowledge graph embeddings. Semantic Matching Energy is an algorithm that defines an energy function, computed by a neural network, to assign a value to a triple. A Neural Tensor Network also uses an energy function, but replaces the standard linear layer of a neural network with a bilinear tensor layer. Convolutional models like ConvE reshape the numerical representations of entities and relations into an "image" and then apply convolution filters to extract features, thus learning the final embeddings. We can also find GAN-inspired models like KBGAN and Transformer-based models like HittER for calculating knowledge graph embeddings.
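As a rough sketch of the translation-based idea from point 1, the snippet below scores a triple the TransE way: the closer h + r is to t, the higher (less negative) the score. The vectors here are random placeholders rather than trained embeddings; in practice they would be learned by minimizing a loss over the training triples.

```python
import numpy as np

dim = 50
rng = np.random.default_rng(42)
h = rng.normal(size=dim)   # head entity embedding (random placeholder)
r = rng.normal(size=dim)   # relation embedding (random placeholder)
t = rng.normal(size=dim)   # tail entity embedding (random placeholder)

def transe_score(h, r, t, norm=1):
    """Negative distance between (h + r) and t; higher means more plausible."""
    return -np.linalg.norm(h + r - t, ord=norm)

print(transe_score(h, r, t))
```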

To implement these algorithms, we have multiple Python libraries such as the ones below (a minimal usage sketch follows the list):

  1. LibKGE
  2. PyKEEN
  3. GraphVite
  4. AmpliGraph
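For example, here is a minimal sketch using PyKEEN's pipeline API (assuming `pip install pykeen`). The dataset choice and hyperparameters are illustrative, and the exact API may differ across versions, so check the PyKEEN documentation.

```python
from pykeen.pipeline import pipeline

# Train a TransE model on a small benchmark dataset shipped with PyKEEN.
result = pipeline(
    dataset="Nations",
    model="TransE",
    training_kwargs=dict(num_epochs=100),
)

model = result.model  # holds the learned entity and relation embeddings
print(result.metric_results.get_metric("hits@10"))
```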

Structure of a KGE algorithm

Photo by Pixabay from Pexels

There are some common underlying ideas used when building an algorithm to calculate knowledge graph embeddings. Some of these ideas are listed below (a small sketch tying them together follows the list):

  1. Negative Generation: This is the concept of generating negative or corrupted triples for a knowledge graph. Negative triples are triples that are not part of the original graph. They can be generated randomly or with strategies like Bernoulli negative sampling.

  2. Scoring Function: This is a function that takes a triple and outputs a score. If the score is high, the triple is considered valid; if the score is low, it is considered a negative triple. The scoring function is one of the most important building blocks of a KGE algorithm.

  3. Loss Function: Since the algorithm is modeled as an optimization problem, we use a loss function during training. This loss function uses the scores of positive and negative triples to compute the loss, and we minimize it with an optimizer. Some examples of loss functions used here are cross-entropy loss and pairwise margin-based hinge loss.
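The sketch below ties these three ideas together with a TransE-style score. The embeddings are random placeholders rather than trained parameters, the negative sampling is the simplest possible variant, and there is no training loop; it is only meant to show how the pieces fit.

```python
import numpy as np

rng = np.random.default_rng(0)

num_entities, num_relations, dim = 10, 3, 16
E = rng.normal(size=(num_entities, dim))   # entity embeddings (placeholders)
R = rng.normal(size=(num_relations, dim))  # relation embeddings (placeholders)

def score(h, r, t):
    # Scoring function: higher score = more plausible triple (negative L1 distance).
    return -np.abs(E[h] + R[r] - E[t]).sum()

def corrupt(triple):
    # Negative generation: replace the tail with a random entity.
    # (For simplicity we don't check that the result is absent from the graph.)
    h, r, t = triple
    return (h, r, rng.integers(num_entities))

def margin_loss(pos, neg, margin=1.0):
    # Pairwise margin-based hinge loss over one positive/negative pair.
    return max(0.0, margin - score(*pos) + score(*neg))

positive = (0, 1, 2)           # a (head, relation, tail) triple from the graph
negative = corrupt(positive)   # its corrupted counterpart
print(margin_loss(positive, negative))
```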


What’s next after generating embeddings?

Photo by Ann H from Pexels

It was fun learning about the KGE algorithms and applying them to find the embeddings. Now, what’s next? What are the uses of embeddings?

There are some graph downstream tasks that can be applied to knowledge graphs such as:

Knowledge graph completion: This is also known as link prediction, where our objective is to predict missing or potentially possible relations in a knowledge graph. It can also be termed knowledge graph augmentation. The task boils down to finding the entity that, together with a given relation and a given entity, best completes a fact. Simply put, it is the task of guessing the missing part in (?, r, t) or (h, r, ?), called head prediction or tail prediction respectively. We use a rank-based evaluation technique to measure the performance of our knowledge graph embeddings (a small sketch is shown below).
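Here is a rough sketch of rank-based evaluation for tail prediction (h, r, ?): score every candidate tail and see where the true one ranks. The embeddings are random placeholders; a real evaluation would use trained ones and average the ranks over a test set to get metrics such as MRR or Hits@10.

```python
import numpy as np

rng = np.random.default_rng(1)

num_entities, dim = 100, 16
E = rng.normal(size=(num_entities, dim))  # entity embeddings (placeholders)
r = rng.normal(size=dim)                  # relation embedding (placeholder)

h, true_t = 0, 42  # toy head and true tail ids

# Score every candidate tail with a TransE-style score, then rank the true tail.
scores = -np.linalg.norm(E[h] + r - E, ord=1, axis=1)
rank = int((scores > scores[true_t]).sum()) + 1
print(f"rank of the true tail: {rank}")
```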

Triple classification: This is the problem of identifying whether a given triple is valid or not, i.e., whether it is a positive or a negative triple. The output of this task is just a yes or a no. The scoring function is employed and a threshold is set to separate positive triples from negative ones, so this is essentially a binary classification problem.
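A tiny sketch of that threshold idea, where the score would come from a scoring function like the one above; the threshold value here is arbitrary and would normally be tuned on a validation set.

```python
def classify(score, threshold=-5.0):
    # True = positive (valid) triple, False = negative triple.
    return score > threshold

print(classify(-2.3))  # True
print(classify(-9.1))  # False
```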

Recommender systems are another area where knowledge graph embeddings can be used. The quality of the embeddings is important for the performance and accuracy of the tasks mentioned above, and the results of these tasks tell us whether we were able to generate good-quality embeddings.

Note: If any of the concepts explained here feel a bit heavy, I recommend checking out my other articles on graphs.


If you have reached this part of the post, thank you for reading and for your attention.

Want to Connect?
Reach me at LinkedIn, Twitter, GitHub or just Buy Me A Coffee!
