
In the previous article of our series on how to design recommender systems based on graphs, we introduced an emerging category of recommender systems known as knowledge graph-based recommender systems. These systems leverage the semantic structure of knowledge graphs and the capabilities of knowledge graph embedding (KGE) algorithms to provide users with more precise product recommendations.
What is a Knowledge Graph Embedding?
As mentioned in the previous article, knowledge graphs (KGs) are effective at representing structured data and at integrating data coming from different sources. However, the symbolic nature of knowledge graph triples usually makes KGs hard to manipulate in machine learning applications.
A knowledge graph embedding (KGE) is a representation of a KG element in a continuous vector space. The objective of learning these embeddings is to ease the manipulation of graph elements (entities and relations) for prediction tasks such as entity classification, link prediction, or recommendation.
Most of the proposed methods rely solely on the graph triples, with the goal of embedding KG entities and relations into a continuous vector space while preserving the inherent structure of the KG and simplifying the use of its elements. Once KG elements are represented as embeddings, a scoring function is used to measure the plausibility of a triple (e.g. ('George', 'isA', 'Person')).

Inspired by Word Embeddings
Embeddings became popular with the release of Word2vec [1] in 2013. Word2vec efficiently learns word embeddings by training a shallow neural network to predict the context of a word in the vocabulary, the context being defined by a sliding window of a given amplitude, with the key idea of preserving the semantics of the words. Two architectures are proposed, as shown in the figure below: Continuous Bag-of-Words (CBOW) [2], a neural network whose input is the context words wₜ₋ᵢ, wₜ₋ᵢ₊₁, …, wₜ₊ᵢ₋₁, wₜ₊ᵢ and whose output is the target word wₜ to predict; and Skip-Gram [1], a two-layer neural network whose input is the target word wₜ and whose output is the context words.
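As a minimal illustration, the sketch below trains both architectures with the gensim library (version 4.x), which is not the original Word2vec implementation; the toy corpus, window size, and embedding dimension are arbitrary choices for the example.

```python
from gensim.models import Word2Vec

# A toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["george", "is", "a", "person"],
    ["george", "likes", "science", "fiction", "movies"],
    ["a", "person", "watches", "movies"],
]

# sg=1 selects Skip-Gram (predict the context from the target word);
# sg=0 selects CBOW (predict the target word from its context).
# `window` is the sliding-window amplitude, `vector_size` the embedding dimension.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# The learned embedding of a word and its nearest neighbours.
print(skipgram.wv["george"].shape)              # (50,)
print(skipgram.wv.most_similar("movies", topn=3))
```

On such a tiny corpus the neighbours are of course meaningless; the point is only to show how the two training modes are selected.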

Following the same logic, the authors of DeepWalk [3] and node2vec [4] generalized embeddings to graphs by using neural language models such as Word2vec to build graph embeddings. In DeepWalk, the authors extract sequences of nodes (which represent entities) from the graph by performing uniform random walks. Each sequence of nodes can be seen as a sentence, and CBOW or Skip-Gram is then applied to construct embeddings of these nodes.
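To make the recipe concrete, here is a minimal sketch of the DeepWalk idea: generate uniform random walks over a small graph and feed them to Word2vec as if they were sentences. The toy graph, number of walks, walk length, and hyper-parameters are illustrative assumptions, not the authors' original settings.

```python
import random
from gensim.models import Word2Vec

# Toy undirected graph given as an adjacency list.
graph = {
    "george": ["person", "matrix"],
    "person": ["george", "alice"],
    "alice": ["person", "matrix"],
    "matrix": ["george", "alice"],
}

def random_walk(graph, start, length):
    """Uniform random walk of a fixed length starting from `start`."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# Several walks per node play the role of the "sentences".
walks = [random_walk(graph, node, length=10) for node in graph for _ in range(20)]

# Skip-Gram over the node sequences yields one embedding per node.
model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1)
print(model.wv["george"][:5])
```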

Node2vec goes further by introducing a more sophisticated, biased random walk strategy that adapts more easily to a diversity of graph connectivity patterns, and it outperforms DeepWalk on node classification and link prediction tasks.
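The bias follows the second-order transition rule of the node2vec paper: a return parameter p and an in-out parameter q reweight each candidate next node according to its distance from the previously visited node. The sketch below implements one biased step; the toy graph and the values of p and q are arbitrary choices for illustration.

```python
import random

# Toy adjacency list (same shape as in the DeepWalk sketch above).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def node2vec_step(graph, prev, curr, p=1.0, q=2.0):
    """Pick the next node of a second-order (node2vec-style) biased walk."""
    neighbors = graph[curr]
    weights = []
    for x in neighbors:
        if x == prev:              # return to the previous node: weight 1/p
            weights.append(1.0 / p)
        elif x in graph[prev]:     # node at distance 1 from prev: weight 1
            weights.append(1.0)
        else:                      # node at distance 2 from prev: weight 1/q
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

print(node2vec_step(graph, prev="a", curr="b"))
```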
Leveraging the Properties of KGs
Considering only the graph structure to encode KG elements is nevertheless not sufficient, so other methods have emerged that also take the properties and entity types of the graph into account. In [5], the authors classify knowledge graph embedding algorithms into two main categories: translational distance models, which rely on a distance-based scoring function that measures the plausibility of a triple through distances in the vector space, typically after performing a translation operation; and semantic matching models, which rely on a similarity-based scoring function that measures the plausibility of a triple by matching the semantics of the latent representations of entities and relations.
Translational Distance Models
For the first category, TransE [6] is often mentioned as the most widely used translational distance model. TransE represents both entities and relations as vectors in the same space Rₘ. Given a triple (s, p, o), the relation is interpreted as a translation vector p so that the embedded entities s (subject) and o (object) can be connected by p with low error, i.e., s + p ≈ o when the triple (s, p, o) holds in the knowledge graph. In other terms, the goal is to minimize the scoring function represented below.
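That scoring function is simply the distance ‖s + p − o‖ (L1 or L2 norm), which is small for plausible triples. Here is a minimal numpy sketch of it; the embedding dimension and the random initialisation are purely illustrative, and in practice the embeddings are learned by minimising this distance over the training triples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50  # illustrative embedding dimension

# Randomly initialised embeddings for a triple (s, p, o).
s = rng.normal(size=dim)   # subject entity
p = rng.normal(size=dim)   # relation, interpreted as a translation
o = rng.normal(size=dim)   # object entity

def transe_score(s, p, o, norm=2):
    """TransE scoring function: ||s + p - o||, lower means more plausible."""
    return np.linalg.norm(s + p - o, ord=norm)

print(transe_score(s, p, o))
```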

TransH [7] introduces relation-specific hyperplanes, each relation p being represented on a hyperplane whose normal vector is wₚ. TransR [8] follows the same idea as TransH but, instead of projecting the entities onto a hyperplane, it creates a specific projection space per relation. We represent in the figure below the embedding spaces of the different translational distance models presented above.
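To show how TransH modifies the TransE score, the sketch below projects the entity embeddings onto the relation-specific hyperplane (defined by its unit normal vector wₚ) before applying the translation. Dimensions and values are again illustrative, not learned parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50

s, o = rng.normal(size=dim), rng.normal(size=dim)
d_p = rng.normal(size=dim)                 # relation-specific translation vector
w_p = rng.normal(size=dim)
w_p /= np.linalg.norm(w_p)                 # unit normal vector of the hyperplane

def project_onto_hyperplane(e, w):
    """Project an entity embedding onto the hyperplane with normal vector w."""
    return e - np.dot(w, e) * w

def transh_score(s, o, d_p, w_p):
    """TransH score: translation distance measured on the relation's hyperplane."""
    s_proj = project_onto_hyperplane(s, w_p)
    o_proj = project_onto_hyperplane(o, w_p)
    return np.linalg.norm(s_proj + d_p - o_proj)

print(transh_score(s, o, d_p, w_p))
```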

Semantic Matching Models
On the other hand, semantic matching models exploit similarity-based scoring functions. In [9], the authors proposed RESCAL, a model that associates each entity with a vector capturing its latent semantics, while each relation is represented as a matrix that models pairwise interactions between latent factors. The score of a triple (s, p, o) is given by a bilinear function, and the embeddings are learned through tensor factorization using an alternating least squares (ALS) optimization.
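Concretely, RESCAL scores a triple with the bilinear form sᵀ Mₚ o, where Mₚ is the relation's matrix. A minimal numpy sketch with random values and an illustrative dimension:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 50

s = rng.normal(size=dim)           # subject embedding
o = rng.normal(size=dim)           # object embedding
M_p = rng.normal(size=(dim, dim))  # relation matrix modelling pairwise interactions

def rescal_score(s, M_p, o):
    """RESCAL bilinear scoring function: s^T M_p o (higher means more plausible)."""
    return s @ M_p @ o

print(rescal_score(s, M_p, o))
```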

Other methods extending RESCAL have emerged. NTN [10] (Neural Tensor Network) is a neural network that learns representations using non-linear layers. ER-MLP (entity-relation multi-layer perceptron) associates each relation (as well as each entity) with a single vector: given a triple (s, p, o), the vector embeddings of s, p, and o are concatenated in the input layer and mapped to a non-linear hidden layer, whose output is used to score the triple.
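A rough sketch of the ER-MLP idea follows; the hidden-layer size, the tanh activation, and the random weights are assumptions made purely for illustration rather than the published architecture details.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, hidden = 50, 32

s, p, o = rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=dim)

# A single non-linear hidden layer followed by a linear output.
W1 = rng.normal(size=(hidden, 3 * dim))
w2 = rng.normal(size=hidden)

def er_mlp_score(s, p, o):
    """ER-MLP: concatenate the triple's embeddings and score them with an MLP."""
    x = np.concatenate([s, p, o])   # input layer: concatenated embeddings
    h = np.tanh(W1 @ x)             # non-linear hidden layer
    return w2 @ h                   # scalar plausibility score

print(er_mlp_score(s, p, o))
```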
Other methods followed, such as DistMult, which simplifies RESCAL by representing relations with diagonal matrices, thus reducing its complexity, and ComplEx, which extends DistMult by using complex numbers in place of real numbers. In recent years, numerous methods have emerged that aim to simplify the existing literature and improve the accuracy of existing algorithms on knowledge graph (KG) tasks such as link prediction. These methods employ a variety of techniques, including neural network-based models, factorization-based models, and random walk-based models, to name a few.
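The sketch below contrasts the two scoring functions: DistMult keeps only the diagonal of the relation matrix (an element-wise product), while ComplEx uses complex-valued embeddings and takes the real part of the trilinear product with the conjugate of the object. Values are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 50

# DistMult: relations are diagonal matrices, i.e. simple element-wise weights.
s, r, o = rng.normal(size=dim), rng.normal(size=dim), rng.normal(size=dim)

def distmult_score(s, r, o):
    """DistMult score: sum_i s_i * r_i * o_i."""
    return np.sum(s * r * o)

# ComplEx: the same idea with complex-valued embeddings and the conjugate of o.
s_c = rng.normal(size=dim) + 1j * rng.normal(size=dim)
r_c = rng.normal(size=dim) + 1j * rng.normal(size=dim)
o_c = rng.normal(size=dim) + 1j * rng.normal(size=dim)

def complex_score(s, r, o):
    """ComplEx score: Re(sum_i s_i * r_i * conj(o_i))."""
    return np.real(np.sum(s * r * np.conj(o)))

print(distmult_score(s, r, o), complex_score(s_c, r_c, o_c))
```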

Conclusion
In conclusion, knowledge graph embedding algorithms have become a powerful tool for representing and reasoning about complex structured data. These algorithms learn low-dimensional embeddings of the entities and relations of a knowledge graph, allowing for efficient similarity computation and inference.
In this blog post, we have discussed several embedding algorithms, including TransE, TransH, TransR, DistMult, and ComplEx, and highlighted their strengths and weaknesses. Overall, knowledge graph embedding algorithms have shown great promise in a wide range of applications, including question answering, recommender systems, and natural language processing. As the field continues to evolve, we can expect to see even more powerful and effective embedding algorithms that can handle increasingly large and complex knowledge graphs.
In the next blog post of this series, we will present concrete recommender system use cases where KGEs help improve recommendation accuracy.
References:
[1] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems – Volume 2, NIPS’13, page 3111–3119, Red Hook, NY, USA, 2013. Curran Associates Inc.
[2] Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In Yoshua Bengio and Yann LeCun, editors, 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2–4, 2013, Workshop Track Proceedings, 2013.
[3] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710, 2014.
[4] Aditya Grover and Jure Leskovec. Node2vec: Scalable feature learning for networks. In 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, New York, NY, USA, 2016. ACM.
[5] Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, 2017.
[6] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26, Lake Tahoe, Nevada, United States, 2013. Curran Associates, Inc.
[7] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI’14, pages 1112–1119, Québec City, Québec, Canada, 2014. AAAI Press.
[8] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, pages 2181–2187, Austin, Texas, 2015. AAAI Press.
[9] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, pages 809–816, Madison, WI, USA, 2011. Omnipress.
[10] Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. Reasoning with neural tensor networks for knowledge base completion. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26, Lake Tahoe, Nevada, United States, 2013. Curran Associates, Inc.