
Graph Neural Networks: A learning journey since 2008 – Part 1


Graph Neural Networks are gaining more and more success, but what are they, really? How do they work? Let's explore graphs together in this series! Today: Scarselli's seminal paper on the Graph Neural Network.

Picture by Dominik Schröder on Unsplash


Data can often be visualised and interpreted as graphs. Graphs help uncover hidden answers within complicated patterns, besides shedding light on the underlying relationships among data points. It is therefore not surprising that graph applications are ubiquitous, ranging from social media analysis [1–5] to neuroscience [6,7], page ranking [8–10], shortest path theory [11–14] and chemistry [15–19].

Since 2005 graph theory has come into close contact with machine learning through the new concept of Graph Neural Networks. The very first proposal was published in 2005 by Scarselli and Gori [20] and subsequently generalized in 2008 [21] in the paper "The Graph Neural Network Model". Here, the authors laid the mathematical foundations for the modern Graph Neural Network. Since then the literature has seen a surge of graph ML works [22–28], which has made the graph world evolve further, defining more precisely the key elements of these mathematical structures and how to link them with more refined machine learning algorithms.

I think that every paper about graph theory and ML is an adventure, where the authors establish their own nomenclature and mathematical development. Thus, I wanted to start this series of tutorials about graphs to deliver an easy explanation, with mathematical rigour and computational examples in Python. Today we start this journey with the seminal paper itself, "The Graph Neural Network Model" by Scarselli. Let's get it started!

The Graph Neural Network Model

Definitions

The graph neural network (GNN) originates from two ML techniques: recursive neural networks (RNN) [30–32] and Markov chains [33–35], where the underlying idea is to encode data as a graph, exploiting the relationships among nodes. In particular, the RNN approach is graph-focused: the user's objective is to label a given graph, after training on labelled graphs. The Markov chain approach is node-focused: each node is labelled and the user is looking for node-level predictions. The GNN model encapsulates the learning of these earlier models and is suitable for both graph-focused and node-focused applications.

Additionally, the GNN generalizes graph-based computations to process both symmetric and asymmetric graphs (fig.1). A graph is called symmetric if its properties are unaltered after modifying the nodes' positions – namely, you always obtain the same graph. On the contrary, in an asymmetric graph the order is very important, as different graphs can be obtained by shuffling the nodes' positions. In the asymmetric case, a pair of nodes defines an edge called an arc, and the graph is referred to as directed – namely, it has a specific direction.

Fig.1: Example of A) a symmetric graph and B) an asymmetric graph. In the former case, if you change the order of the nodes the graph does not change, while in the asymmetric case the graph has a specific order of the nodes. Image by the author
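To make the symmetric/asymmetric distinction concrete, here is a minimal sketch with networkx (my choice of library, not part of the original paper):

```python
import networkx as nx

# Symmetric (undirected) graph: the edge (1, 2) is the same as (2, 1)
g_sym = nx.Graph()
g_sym.add_edges_from([(1, 2), (2, 3), (3, 1)])
print(g_sym.has_edge(2, 1))  # True: the order of the pair does not matter

# Asymmetric (directed) graph: each pair of nodes defines an arc with a direction
g_asym = nx.DiGraph()
g_asym.add_edges_from([(1, 2), (2, 3), (3, 1)])
print(g_asym.has_edge(2, 1))  # False: only the arc 1 -> 2 exists
```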

Computational insights

As regards computational graphs, each node is enriched with some attributes, collected into a label. A label is a vector, with a predefined dimensionality, that encodes features describing each node (e.g. the average colour of a pixel in an image, the shape of an object, etc.) as well as the relationships among nodes, i.e. the edges (e.g. the distance between points of interest in image recognition). Given these feature annotations, the graph structure is remarkably different from a simple neural network: each node knows what is happening in its neighbourhood and can see the current state (or features) of the adjacent nodes (fig.2).

Fig.2: let's examine a piece of a graph network. All the nodes have their own labels (l1, l2, l3, l4) along with a state vector x (x1, x2, x3, x4) and edge information (l12, l13, l14, l15). Each node's state, for example the state x1 of node 1, is a function of the states of all the adjacent nodes, thanks to a transition function fw. Image by the author
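As a toy counterpart of fig.2, node labels, node states and edge labels can be stored as plain arrays; the dimensionalities below are arbitrary and only serve the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Node labels l_n: one feature vector per node (dimensionality chosen arbitrarily)
node_labels = {n: rng.normal(size=3) for n in [1, 2, 3, 4]}

# Node states x_n: initialised at random, later refined by the transition function f_w
node_states = {n: rng.normal(size=2) for n in [1, 2, 3, 4]}

# Edge labels l_(n,u): one feature vector per edge, e.g. a distance between two nodes
edge_labels = {(1, 2): rng.normal(size=1),
               (1, 3): rng.normal(size=1),
               (1, 4): rng.normal(size=1)}
```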

It is thus possible to define a graph as a mathematical object. Each node n has a defined state x_n, which contains the node's features as well as its neighbours' features. Additionally, each node has an associated local transition function f_w, which models the dependency of the state x_n on the context of node n, and a local output function g_w, which describes how the output of each node is produced.

Eq. 1: positional form for a graph network. The state x_n of node n depends on the node label l_n, the edge labels l_co[n], the neighbours' states x_ne[n] and the neighbour nodes' labels l_ne[n]. The output o_n is produced by a local output function g_w, computed from the node's state x_n and label l_n.
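Written out in the notation of the caption, the positional form (Eq. 1) reads:

x_n = f_w(l_n, l_co[n], x_ne[n], l_ne[n])
o_n = g_w(x_n, l_n)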

So far so good, but there are problems when dealing with f_w: there may be a variable number of arguments, depending on the neighbourhood size, and the set of neighbours may not be ordered. Therefore f_w should be invariant to a permutation of the nodes in the neighbourhood. To satisfy this mathematical and technical constraint it is possible to generalize f_w as a function h_w. This transition function is invariant to the nodes' positions and to the number of nodes in the graph, and it depends on the label of the arc between nodes n and u.

Eq. 2: nonpositional form of equation 1. In this case we can define the state x_n of node n through a general transition function h_w, which depends on the node label l_n, the arc label l_(n,u) between node n and node u, the state x_u of node u and the label l_u of node u.
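In Scarselli's paper the contributions of the neighbours u of node n are aggregated by a sum, so the nonpositional form (Eq. 2) can be written as:

x_n = Σ_{u ∈ ne[n]} h_w(l_n, l_(n,u), x_u, l_u)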

A requirement: Banach’s fixed point theorem

Once the main functions and the structure of a graph have been defined, we need a way to compute the state x_n. The core point of Scarselli's treatment of GNN computation is Banach's fixed-point theorem, also called the contraction mapping theorem. This theorem establishes the existence and uniqueness of a fixed point for a contraction map on a metric space. What does this mean?

  • A fixed point in mathematics is a point that a function F maps onto itself, namely a solution of F(x) = x
  • Then, consider the concept of distance, namely how far apart two points a and b are. Given a set of points X, a distance d can be computed between them. The set together with the distance function defines a metric space (X, d)
  • Now, in this metric space, we may have a function F that can be applied to the points a and b. If the distance between the images F(a) and F(b) is always smaller than the distance between a and b (by a fixed factor smaller than one), F is a contraction map
  • If F is a contraction map, then Banach proved that F has exactly one fixed-point solution, which can be reached by repeatedly applying F from any starting point

The consequence of this theorem is that it is possible to compute the nodes' states with an iterative scheme: the states are updated at each step until they converge, in order to find the right labels.
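As a toy illustration of a contraction map and of this iterative scheme (my own numpy example, not code from the paper):

```python
import numpy as np

# A toy contraction map on R^2: it shrinks distances by a factor of 0.5 < 1,
# so Banach's theorem guarantees a unique fixed point.
def F(x):
    return 0.5 * x + np.array([1.0, -2.0])

x = np.array([10.0, 10.0])  # arbitrary starting point
for _ in range(100):
    x_new = F(x)
    if np.linalg.norm(x_new - x) < 1e-9:  # the state has stopped moving
        break
    x = x_new

print(x)  # ~[2., -4.], the unique fixed point of F
```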

At this point, we are almost ready for the production stage. However, what functions should we use for f_w/h_w and g_w to satisfy Banach's fixed-point theorem? Surprise, surprise, here is the second important ingredient of the GNN: these functions can be simple multi-layer perceptrons (MLP). Indeed, MLPs can exploit the universal approximation theorem, so they can satisfy Banach's requirement. Initially, each unit is given a random state. The f_w/h_w functions then update those states until convergence is achieved, namely when the fixed-point solution is found. Then it is possible to proceed with the final MLP calculation through g_w, which, in turn, returns the output prediction for each node or for the graph itself.
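To make the whole pipeline concrete, here is a minimal numpy sketch of this forward pass, assuming tiny random one-layer "MLPs" standing in for h_w and g_w and a toy triangle graph; edge labels are omitted for brevity, the weights are untrained, and nothing here enforces the contraction constraint that the paper imposes during training:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, label_dim, out_dim = 2, 3, 1

# Toy graph: a triangle on three nodes, listed in both directions as (n, u) pairs
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2)]
labels = rng.normal(size=(3, label_dim))          # node labels l_n

# Random one-layer "MLPs" standing in for h_w (transition) and g_w (output)
W_h = 0.1 * rng.normal(size=(2 * label_dim + state_dim, state_dim))
W_g = rng.normal(size=(state_dim + label_dim, out_dim))

x = rng.normal(size=(3, state_dim))               # random initial states
for _ in range(200):
    x_new = np.zeros_like(x)
    for n, u in edges:                            # sum h_w over the neighbours u of n
        x_new[n] += np.tanh(np.concatenate([labels[n], labels[u], x[u]]) @ W_h)
    if np.abs(x_new - x).max() < 1e-6:            # fixed point reached
        x = x_new
        break
    x = x_new

o = np.concatenate([x, labels], axis=1) @ W_g     # output function g_w on the converged states
print(o)                                          # one prediction per node
```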

Now that we have seen in theory how a GNN works, stay tuned for the next post, where we will implement a GNN computationally!

Fig.3: the final view of the graph neural network (GNN). The original graph can be seen as a combination of steps through time, from time T to time T + steps, where each function receives a combination of inputs. In the final unfolded graph, each layer corresponds to a time instant and has a copy of all the units of the previous steps.

Please feel free to send me an email with questions or comments at [email protected], or reach out directly here on Medium.

BIBLIOGRAPHY

  1. Aisopos, Fotis, George Papadakis, and Theodora Varvarigou. "Sentiment analysis of social media content using n-gram graphs." Proceedings of the 3rd ACM SIGMM international workshop on Social media. 2011
  2. Campbell, William M., Charlie K. Dagli, and Clifford J. Weinstein. "Social network analysis with content and graphs." Lincoln Laboratory Journal 20.1 (2013): 61–81.
  3. Cogan, Peter, et al. "Reconstruction and analysis of twitter conversation graphs." Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research. 2012.
  4. Amati, Giambattista, et al. "Twitter temporal evolution analysis: comparing event and topic driven retweet graphs." IADIS International Journal on Computer Science & Information Systems 11.2 (2016).
  5. Rossi, Emanuele, et al. "Temporal graph networks for deep learning on dynamic graphs." arXiv preprint arXiv:2006.10637(2020).
  6. Bassett, Danielle S., Perry Zurn, and Joshua I. Gold. "On the nature and use of models in network neuroscience." Nature Reviews Neuroscience 19.9 (2018): 566–578.
  7. Sporns, Olaf. "Graph theory methods: applications in brain networks." Dialogues in clinical neuroscience 20.2 (2018): 111.
  8. Abedin, Babak, and Babak Sohrabi. "Graph theory application and web page ranking for website link structure improvement." Behaviour & Information Technology 28.1 (2009): 63–72.
  9. Meghabghab, George. "Google’s Web page ranking applied to different topological Web graph structures." Journal of the American Society for Information Science and Technology 52.9 (2001): 736–747.
  10. Scarselli, Franco, et al. "Graph neural networks for ranking web pages." The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI’05). IEEE, 2005.
  11. Goldberg, Andrew V., and Chris Harrelson. "Computing the shortest path: A search meets graph theory." SODA. Vol. 5. 2005.
  12. Gallo, Giorgio, and Stefano Pallottino. "Shortest path algorithms." Annals of operations research 13.1 (1988): 1–79.
  13. Borgwardt, Karsten M., and Hans-Peter Kriegel. "Shortest-path kernels on graphs." Fifth IEEE international conference on data mining (ICDM’05). IEEE, 2005.
  14. Hadlock, F. O. "A shortest path algorithm for grid graphs." Networks 7.4 (1977): 323–334.
  15. Balaban, Alexandru T. "Applications of graph theory in chemistry." Journal of chemical information and computer sciences 25.3 (1985): 334–343.
  16. Vergniory, M. G., et al. "Graph theory data for topological quantum chemistry." Physical Review E 96.2 (2017): 023310.
  17. Bradlyn, Barry, et al. "Band connectivity for topological quantum chemistry: Band structures as a graph theory problem." Physical Review B 97.3 (2018): 035138.
  18. Coley, Connor W., et al. "A graph-convolutional neural network model for the prediction of chemical reactivity." Chemical science 10.2 (2019): 370–377.
  19. Tang, Bowen, et al. "A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility." Journal of cheminformatics 12.1 (2020): 1–9.
  20. Scarselli, Franco, et al. "Graph neural networks for ranking web pages." The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI’05). IEEE, 2005.
  21. Scarselli, Franco, et al. "The graph neural network model." IEEE transactions on neural networks 20.1 (2008): 61–80.
  22. Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.
  23. Li, Yujia, et al. "Gated graph sequence neural networks." arXiv preprint arXiv:1511.05493 (2015).
  24. Gilmer, Justin, et al. "Neural message passing for quantum chemistry." International conference on Machine Learning. PMLR, 2017.
  25. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
  26. Wu, Felix, et al. "Simplifying graph convolutional networks." International conference on machine learning. PMLR, 2019.
  27. Hamilton, William L., Rex Ying, and Jure Leskovec. "Inductive representation learning on large graphs." Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017.
  28. Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
  29. Zachary, Wayne W. "An information flow model for conflict and fission in small groups." Journal of Anthropological Research 33.4 (1977): 452–473.
  30. Frasconi, Paolo, Marco Gori, and Alessandro Sperduti. "A general framework for adaptive processing of data structures." IEEE transactions on Neural Networks 9.5 (1998): 768–786.
  31. Sperduti, Alessandro, and Antonina Starita. "Supervised neural networks for the classification of structures." IEEE Transactions on Neural Networks 8.3 (1997): 714–735.
  32. Hagenbuchner, Markus, Alessandro Sperduti, and Ah Chung Tsoi. "A self-organizing map for adaptive processing of structured data." IEEE transactions on Neural Networks 14.3 (2003): 491–505.
  33. Brin, Sergey, and Lawrence Page. "The anatomy of a large-scale hypertextual web search engine." Computer networks and ISDN systems 30.1–7 (1998): 107–117.
  34. Kleinberg, Jon M. "Authoritative sources in a hyperlinked environment." SODA. Vol. 98. 1998.
  35. Tsoi, Ah Chung, et al. "Adaptive ranking of web pages." Proceedings of the 12th international conference on World Wide Web. 2003.
