Thoughts and Theory

Network Learning — from Network Propagation to Graph Convolution

Two different views of network propagation, a popular method in computational biology, and its connection with graph convolution

Remy Lau
Towards Data Science
8 min read · Jun 20, 2021


Illustration of network propagation, image from [2]

You’ve probably heard about graph convolution, as it is such a hot topic at the moment. Although less well known, network propagation is a dominant method in computational biology for learning on networks. In this post, we’ll dive deep into the theory and intuition behind network propagation, and we’ll also see that network propagation is a special case of graph convolution.

TL;DR

  • Network propagation is a popular method in computational biology based on the Guilt By Association principle.
  • Two different views of network propagation: random walk vs. diffusion, with HotNet2 as a specific example.
  • Network propagation is a special case of graph convolution.

Network propagation in computational biology

Networks arise naturally in many real-world settings: social networks, transportation networks, and biological networks, to name a few. The network structure encodes rich information about the role of each individual node in the network.

In computational biology, it has been shown that biological networks such as Protein-Protein Interaction (PPI) networks, where the nodes are proteins and the edges represent how likely two proteins are to interact with each other, are very useful in reconstructing biological processes, and even in unveiling disease genes [1,2]. The reconstruction can be done simply by checking whether the neighboring proteins of a target protein are part of the biological process or disease. This process of inferring the membership of a protein from its neighboring proteins is referred to as network propagation. We’ll take a closer look at the precise mathematical formulation in the next section, but first let’s think about why such a simple approach works.

Examples of protein complexes in the yeast PPI, image from [3]

It all boils down to the Guilt By Association (GBA) principle, which states that proteins closely related to each other, either by physical interaction or by other similarity measures like gene co-expression, are likely to participate in the same biological processes or pathways. The GBA principle arises from observing that many protein complexes (such as the SAGA/TFIID complex in yeast [3]) are localized in cohesive network modules. Similarly, in the human disease gene network [4], we can see that, for example, disease genes related to ear, nose, and throat disorders or hematological disorders are localized in network modules. As a side note, in this post, the words protein and gene will be used interchangeably.

The human disease gene network, image from [4]

The mathematical formulation of Network Propagation — two different views

Notations

Given an (undirected) graph G = (V, E, w), with vertex set V of n vertices, edge set E, and weight function w, let A be the corresponding n-by-n adjacency matrix, defined entrywise as A_ij = w(i, j) if (i, j) ∈ E and A_ij = 0 otherwise.

Using the diagonal degree matrix D, whose diagonal entries are the degrees of the corresponding nodes, we can normalize A either row-wise or column-wise, resulting in two new matrices: P = D^(-1)A and W = AD^(-1), respectively.

Finally, let p0 be the binary label vector, whose entries are one for nodes with positive labels and zero everywhere else.
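As a quick sanity check, the objects above can be built in a few lines of NumPy. The 4-node adjacency matrix below is a toy example of my own; any symmetric adjacency matrix works the same way:

```python
import numpy as np

# Toy undirected, unweighted graph on four nodes (hypothetical example).
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

deg = A.sum(axis=1)          # node degrees (diagonal of D)
D_inv = np.diag(1.0 / deg)   # D^(-1)

P = D_inv @ A                # row-normalized: each row sums to 1
W = A @ D_inv                # column-normalized: each column sums to 1

p0 = np.array([0, 1, 1, 0], dtype=float)  # binary label vector

print(P.sum(axis=1))  # [1. 1. 1. 1.]
print(W.sum(axis=0))  # [1. 1. 1. 1.]
```

Row-stochastic P will drive the random-walk view below, while column-stochastic W will drive the diffusion view.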

First view — random walk

We can perform network propagation as random walks on the network. In this case, the key question we are asking is the following.

What is the probability of starting from the target node and ending up at any one of the nodes with a positive label, via 1-hop propagation?

Mathematically, this operation corresponds to the matrix-vector multiplication between P and p0, yielding the prediction score vector y = P p0.

Let’s take a look at an example. Consider the following subnetwork of genes g1, g2, g3, and g4. Suppose that g2 and g3 are annotated to a disease, which means that these two genes are known to contribute to the disease of interest here. On the other hand, g1 and g4 are not annotated to this disease (caution: this does not mean that they have no effect on the disease, but rather, they are not known to be related to the disease at the moment).

Example of network propagation by random walk, image by author

To determine whether g1 is related to the disease, we can simply run a 1-hop random walk starting from g1 and see the probability that it lands on a disease gene (g2 or g3). After a simple calculation, we see that the prediction score is 2/3, which is quite high. This makes sense: two out of three neighboring genes of g1 are related to the disease, so according to the GBA principle, g1 is quite likely to be related to this disease as well.
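The 2/3 score can be reproduced numerically. Since the figure is not shown here, the code assumes one topology consistent with the scores in the text: g1 connected to g2, g3, and g4, plus edges g2–g3 and g3–g4:

```python
import numpy as np

# Assumed topology reproducing the stated scores; nodes are g1..g4 in order.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

P = np.diag(1.0 / A.sum(axis=1)) @ A       # row-normalized transition matrix
p0 = np.array([0, 1, 1, 0], dtype=float)   # g2 and g3 are the disease genes

y = P @ p0
print(y[0])  # 0.666... = 2/3, the prediction score for g1
```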

Second view — diffusion

Another view of network propagation is diffusion through the network. In this case, the key question we are asking is the following.

How much “heat” is being diffused to the target node? Or, put differently, what is the probability of starting from the nodes with positive labels and ending up at the target node, via 1-hop propagation?

Mathematically, this operation corresponds to the matrix-vector multiplication between W and p0 tilde (the normalized version of p0, whose entries sum to one), yielding the prediction score vector y tilde = W p0 tilde.

Note: the normalization of p0, together with the column-stochastic W, ensures a mapping from a probability distribution to a probability distribution, i.e., y tilde sums to 1.

Let’s return to the disease gene prediction example above, this time performing label propagation as diffusion. A large portion (5/12) of the total “heat” emitted by the two annotated disease genes is collected by g1, hence g1 is very likely to be related to this disease.

Example of network propagation as diffusion, image by author
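The 5/12 figure checks out numerically as well, using the same assumed topology as in the random-walk sketch above:

```python
import numpy as np

# Assumed topology reproducing the stated scores; nodes are g1..g4 in order.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

W = A @ np.diag(1.0 / A.sum(axis=0))       # column-normalized
p0 = np.array([0, 1, 1, 0], dtype=float)
p0_tilde = p0 / p0.sum()                   # normalize to a probability distribution

y_tilde = W @ p0_tilde
print(y_tilde[0])     # 0.41666... = 5/12, the heat collected by g1
print(y_tilde.sum())  # 1.0: diffusion conserves the total heat
```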

Going beyond 1-hop propagation

Illustration of multi-hop propagation, image from [5]

The 1-hop propagation approach is simple and effective. However, when labeled data is scarce, which is typically the case in computational biology, 1-hop propagation can only assign nontrivial prediction scores to genes that are direct neighbors of disease genes. Given that there are more than 20 thousand genes in the human genome, this clearly leads to sub-optimal predictions. So instead of restricting ourselves to the 1-hop neighborhood, we can expand to 2-hop, 3-hop, and beyond. The figure here illustrates k-hop propagation for k = 1 and k = 2.
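To see why multiple hops matter, here is a small sketch on a hypothetical 5-node path graph with only the first gene labeled: after one hop only the direct neighbor receives a nonzero score, and each extra application of P extends the reach by one hop:

```python
import numpy as np

# Hypothetical path graph g1 - g2 - g3 - g4 - g5, with only g1 labeled.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

P = np.diag(1.0 / A.sum(axis=1)) @ A
p0 = np.array([1, 0, 0, 0, 0], dtype=float)

y = p0
for k in range(1, 4):
    y = P @ y                    # after the loop body, y = P^k p0
    print(k, np.round(y, 3))     # nonzero entries spread one hop further each time
```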

HotNet2 diffusion

There are many variants of multi-hop diffusion and random walks. We’ll focus on HotNet2 as a specific example [2]. Similar to the diffusion introduced above, the HotNet2 algorithm iteratively updates the initial “heat” distribution p0 tilde as follows:

p^(t+1) = (1 - beta) W p^(t) + beta p0 tilde

where beta, valued between zero and one, is the “restart probability” that brings the “heat” back to its source. There are a few (somewhat related) reasons for including this restart probability. First, the diffusion operator defined earlier gives away all the “heat” the current node holds, so by step t all the information from previous diffusion steps is lost. Adding beta retains some amount of heat at each step, and hence at step t the distribution carries information from all previous steps. Second, a nonzero beta guarantees convergence of the heat distribution as t goes to infinity, which in turn gives a closed-form solution for the heat distribution at t = ∞: p^(∞) = beta (I - (1 - beta) W)^(-1) p0 tilde.
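The iteration and the closed form can be checked against each other on the toy 4-gene network from the earlier examples (the value beta = 0.4 is chosen arbitrarily for illustration):

```python
import numpy as np

# Same assumed 4-gene topology as in the earlier examples.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)
W = A @ np.diag(1.0 / A.sum(axis=0))    # column-normalized
p0_tilde = np.array([0, 0.5, 0.5, 0])   # normalized initial heat on g2, g3

beta = 0.4                              # restart probability (arbitrary choice)

# Iterative diffusion with restart: p <- (1 - beta) W p + beta p0_tilde
p = p0_tilde.copy()
for _ in range(200):
    p = (1 - beta) * (W @ p) + beta * p0_tilde

# Closed-form stationary solution: beta (I - (1 - beta) W)^(-1) p0_tilde
n = len(p0_tilde)
p_inf = beta * np.linalg.solve(np.eye(n) - (1 - beta) * W, p0_tilde)

print(np.allclose(p, p_inf))  # True: the iteration converges to the closed form
```

Convergence is guaranteed here because the spectral radius of (1 - beta) W is at most 1 - beta < 1, so the geometric series behind the closed form converges.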

Finally, it has been shown in [1] that this HotNet2 diffusion approach produces consistently better predictions than the 1-hop network propagation defined in the previous section, in terms of biological pathway reconstruction, disease gene prediction, and related tasks.

Relation with Graph Convolution

Recall that the graph convolutional network follows the layer-wise propagation rule below.

H^(l+1) = σ(D'^(-1/2) A' D'^(-1/2) H^(l) W^(l)), where A' = A + I and D' is the diagonal degree matrix of A'

GCN propagation, equation (2) in [6]

where H^(l) is the hidden feature matrix of the l-th layer, W^(l) holds the learnable parameters, and the leading part inside the nonlinearity σ is the symmetrically normalized adjacency matrix with self-connections. The self-connections serve a similar purpose as the restart probability: they retain some information from the current iteration.

With the following substitutions, we can fully recover label propagation as a special case of graph convolution.

  • Replace the symmetrically normalized self-connected adjacency matrix with either the row-normalized (P) or column-normalized (W) version
  • Replace H^(l) with p^(l)
  • Replace the nonlinearity and W^(l) with the identity (or simply drop these transformations)

Note that the first replacement does not change the graph spectrum: D^(-1)A and D^(-1/2)AD^(-1/2) are similar matrices (via conjugation by D^(1/2)) and thus share the same eigenvalues, so we are still performing essentially the same convolution operation.
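The spectrum claim is easy to verify numerically on the toy 4-node graph used throughout (the similarity argument holds for any connected graph):

```python
import numpy as np

# D^(-1) A and D^(-1/2) A D^(-1/2) are similar via D^(1/2),
# so they share the same eigenvalues.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float)
d = A.sum(axis=1)

P = np.diag(1.0 / d) @ A                         # row-normalized
S = np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)  # symmetrically normalized

eig_P = np.sort(np.linalg.eigvals(P).real)
eig_S = np.sort(np.linalg.eigvalsh(S))

print(np.allclose(eig_P, eig_S))  # True
```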

And there you have it, network propagation as a special case of graph convolution!

Conclusion

Thanks to the Guilt By Association principle and the modular organization of the cell, network propagation is widely used in computational biology for tasks such as disease gene prediction. We’ve taken a deep dive into the two views of network propagation and its connection to graph convolution.

Reference

[1] R. Liu, C. A. Mancuso, A. Yannakopoulos, K. A. Johnson, A. Krishnan, Supervised learning is an accurate method for network-based gene classification (2020), Bioinformatics

[2] L. Cowen, T. Ideker, B. J. Raphael, R. Sharan, Network propagation: a universal amplifier of genetic associations (2017), Nat Rev Genet

[3] V. Spirin and L. A. Mirny, Protein complexes and functional modules in molecular networks (2003), Proceedings of the National Academy of Sciences

[4] K. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, A. Barabasi, The human disease network (2007), Proceedings of the National Academy of Sciences

[5] W. L. Hamilton, R. Ying, J. Leskovec, Inductive Representation Learning on Large Graphs (2017), arXiv

[6] T. N. Kipf and M. Welling, Semi-Supervised Classification with Graph Convolutional Networks (2016), arXiv
