
Reverse Engineering Graph Convolutional Networks


Let us reverse engineer the GCNs. Photo by ThisisEngineering RAEng on Unsplash

This blog post will summarise the paper " Simplifying Graph Convolutional Networks[1] ", which tries to reverse engineer the Graph Convolutional Networks. So, let us evolve Graph Convolutional Networks backward.

Graphs are pervasive models of structure. They are everywhere, from social networks to chemical molecules, and many things can be represented as graphs. However, applying machine learning to these structures did not come to us directly. Most ideas in machine learning started from a small, simple model that was made more complex over time as the need arose: the Perceptron evolved into the Multi-Layer Perceptron, and hand-crafted image filters evolved into non-linear CNNs. Graph Convolutional Networks (GCNs), however, were derived directly from existing ideas and had a more complex start. To demystify GCNs, the paper reverse engineers them and proposes a simplified linear model called Simple Graph Convolution (SGC). SGC gives performance comparable to GCNs and is even faster than FastGCN.

Basic Notations

Inputs to a Graph Convolutional Network are:

  1. Node features (with labels for the training nodes)
  2. Adjacency matrix

Adjacency matrix: The adjacency matrix A is an n x n matrix, where n is the number of nodes, with a(i,j) = 1 if node i is connected to node j and a(i,j) = 0 otherwise. If the edge is weighted, then a(i,j) equals the edge weight.

Degree (diagonal) matrix: D is an n x n diagonal matrix with d(i,i) equal to the sum of the i-th row of the adjacency matrix, i.e. the degree of node i.

Input features: X is the input feature matrix of size n x d, where d is the number of features per node. (The label matrix Y is of size n x C, where C is the number of classes.)
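As a quick illustration, here is a minimal NumPy sketch (not from the paper's code) of these objects for a hypothetical toy graph given as an edge list; the graph, n and d are arbitrary choices.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # hypothetical undirected 4-node cycle
n, d = 4, 5                                # number of nodes, features per node

A = np.zeros((n, n))                       # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0                # symmetric: the graph is undirected

D = np.diag(A.sum(axis=1))                 # degree (diagonal) matrix
X = np.random.randn(n, d)                  # node feature matrix (random placeholder)
```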

Let us see how GCNs actually work before reverse engineering them.

Graph Convolutional Network in Action

When we pass an image through a CNN layer, what happens? It uses a pixel along with its neighboring pixels to produce the output at a particular position. The same behaviour is generalized in GCNs. At each layer, we propagate new features as:

h_i^(k) ← (1 / (d_i + 1)) h_i^(k-1) + Σ_j [ a(i,j) / √((d_i + 1)(d_j + 1)) ] h_j^(k-1)

Forward propagation step: each node averages its own features from the previous layer with those of its neighbors.

Further, this step can be represented in matrix form as:

H^(k) ← S H^(k-1)

i.e., the k-th output is obtained by multiplying S, the normalized adjacency matrix, with the previous output. Here S = D'^(-1/2) A' D'^(-1/2), where A' = A + I is the adjacency matrix with added self-loops and D' is its degree matrix.

Thus, each successive step can be expressed as a sparse matrix multiplication of S with H^(k-1).
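To make the hop intuition concrete, here is a small sketch (assuming the toy A from the snippet above) that builds a sparse S with self-loops and propagates a signal that initially lives only on node 0; each multiplication by S mixes in one more hop of neighbors.

```python
import numpy as np
import scipy.sparse as sp

A_sl = sp.csr_matrix(A) + sp.eye(A.shape[0])                 # A' = A + I (self-loops)
d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.asarray(A_sl.sum(axis=1)).ravel()))
S = d_inv_sqrt @ A_sl @ d_inv_sqrt                           # sparse normalized adjacency

h = np.zeros(A.shape[0])
h[0] = 1.0                                                   # feature living only on node 0
for k in range(1, 3):
    h = S @ h                                                # one propagation step
    print(f"after {k} hop(s), nonzero entries at nodes:", np.flatnonzero(h))
```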

This is what smoothens our hidden representations locally. How? Every step propagates features from the neighbors. So after the first step, node x has information about itself as well as its neighbors. After the second step, it takes information from its neighbors again, but they already hold information about their own neighbors (two hops away from x), so this gets added to node x. In the same way, information is gathered at each hop. The result is then multiplied by a layer-specific weight matrix Θ^(k), and a pointwise non-linear function (ReLU) is applied to get the next feature matrix.

H^(k) = ReLU( S H^(k-1) Θ^(k) )

Non-linear activation: the propagated features are multiplied by the layer's weight matrix Θ^(k) and passed through ReLU.

The results thus obtained can be fed into the softmax layer to get the classification probabilities.

Ŷ_GCN = softmax( S H^(K-1) Θ^(K) )

Classification using GCNs (Eq 1)
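Putting the pieces together, here is a minimal NumPy sketch of the K-layer GCN forward pass in Eq 1, assuming A and X from the earlier snippet; Theta_1 and Theta_2 are hypothetical weight matrices that a real model would learn by back-propagation.

```python
import numpy as np

def normalized_adjacency(A):
    """S = D'^(-1/2) A' D'^(-1/2), with self-loops added (A' = A + I)."""
    A_prime = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_prime.sum(axis=1))
    return A_prime * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def gcn_forward(A, X, weights):
    """Hidden layers: H^(k) = ReLU(S H^(k-1) Theta^(k)); output layer: softmax."""
    S, H = normalized_adjacency(A), X
    for Theta in weights[:-1]:
        H = np.maximum(S @ H @ Theta, 0.0)
    return softmax(S @ H @ weights[-1])          # Eq 1

Theta_1 = 0.1 * np.random.randn(X.shape[1], 16)  # d -> 16 hidden units (hypothetical)
Theta_2 = 0.1 * np.random.randn(16, 3)           # 16 -> 3 classes (hypothetical)
Y_hat = gcn_forward(A, X, [Theta_1, Theta_2])    # shape (n, 3); rows sum to 1
```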

The Graph Convolutional Network can be summarized using the following diagram:

GCN in action | Source[1]

Simple Graph Convolution: GCN Simplified

Let us recall MLPs and CNNs. In CNNs, every layer has a localized receptive field that forms our feature maps, and as we move deeper, each value covers a larger and larger receptive field. Similarly, GCNs gather information from the neighbors at each hop, so after every layer each node knows more about the rest of the network. This is the part that extracts features from the neighbors.

However, each layer also applies a non-linearity, and this non-linearity does not add anything to the receptive field of the network. Thus, on removing the non-linear part we get a simple model:

Ŷ = softmax( S … S S X Θ^(1) Θ^(2) … Θ^(K) )

Linear GCN, called SGC, after removing the non-linear function from the GCN

Ŷ_SGC = softmax( S^K X Θ )

Simplified SGC equation (equivalent to linearized Eq 1): the repeated multiplications by S collapse into S^K, and the weight matrices collapse into a single matrix Θ = Θ^(1) Θ^(2) … Θ^(K)

This is the equation for Simple Graph Convolution. We can clearly see that this model reduces to a simple pre-processing step followed by straightforward multi-class regression. How cool! One thing to observe: the above equation has a learnable parameter Θ, while S and X are both inputs. So S^K X can be pre-computed to give X':

X' = S^K X

X' is equivalent to a feature-engineered input | Image by the author

Ŷ_SGC = softmax( X' Θ )

The resulting classifier equation
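As a sketch of how little is left once the non-linearities are gone (reusing A, X, normalized_adjacency and softmax from the snippets above; K and Theta are placeholders), SGC reduces to a pre-processing loop plus one matrix multiplication.

```python
import numpy as np

def sgc_features(A, X, K=2):
    """Pre-compute X' = S^K X: pure feature engineering, no learning involved."""
    S, X_prime = normalized_adjacency(A), X
    for _ in range(K):
        X_prime = S @ X_prime
    return X_prime

def sgc_predict(X_prime, Theta):
    """The entire 'network' at prediction time: softmax(X' Theta)."""
    return softmax(X_prime @ Theta)

X_prime = sgc_features(A, X, K=2)             # done once, before training
Theta = 0.1 * np.random.randn(X.shape[1], 3)  # in practice learned by logistic regression
Y_hat = sgc_predict(X_prime, Theta)
```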

So, as the diagram below shows, reverse engineering the GCN and removing the non-linear layers reveals that the hops are nothing more than a feature-engineering step followed by ordinary multi-class regression. An implementation of this part can be found on GitHub[4]. Further, SGC can be summarized by the figure shown below.

Simple Graph Convolution | Source[1]

Mathematical Primer on Graph Convolutional Networks

This part explains the mathematical formulation of GCNs as given in "Semi-Supervised Classification with Graph Convolutional Networks"[3].

  • Graph Laplacian:

A matrix is a good way to represent a graph. We define the graph Laplacian and its normalized version as:

Δ = D - A

Graph Laplacian (D = degree matrix and A = adjacency matrix)

Δ_sym = D^(-1/2) Δ D^(-1/2) = I - D^(-1/2) A D^(-1/2)

Normalized form
  • Eigen Decomposition

The normalized Laplacian is a symmetric positive semidefinite matrix, so it can be factorized in the form:

Δ_sym = U Λ U^T

Eigendecomposition

where U comprises orthonormal eigenvectors and Λ is a diagonal matrix with the eigenvalues as its diagonal elements. Now, since the columns of U are orthonormal, can we define a graph Fourier transform with respect to them? Absolutely yes (compare this with the DFT)! We write the Fourier transform of a graph signal x as:

x̂ = U^T x

Graph Fourier transform

and the inverse transform as:

x = U x̂

Inverse transform
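
A short NumPy sketch of this spectral machinery, assuming the toy adjacency matrix A from earlier (and no isolated nodes), verifies that the transform is invertible because U is orthonormal.

```python
import numpy as np

D = np.diag(A.sum(axis=1))
L = D - A                                          # graph Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt                # normalized Laplacian

eigvals, U = np.linalg.eigh(L_sym)                 # L_sym = U diag(eigvals) U^T
x = np.random.randn(A.shape[0])                    # a graph signal: one value per node
x_hat = U.T @ x                                    # graph Fourier transform
x_rec = U @ x_hat                                  # inverse transform
assert np.allclose(x, x_rec)                       # perfect reconstruction
```
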
  • Graph convolution in terms of Fourier Transform

Now, from Signals and Systems we know that the convolution of two signals can be written as their product in the Fourier domain. Using this we get:

g * x = U ( (U^T g) ⊙ (U^T x) ) = U Ĝ U^T x

Convolution of the filter g with the input x

Here Ĝ is the diagonal matrix whose diagonal entries are the spectral filter coefficients (the parameters that are learned). Finally, we approximate the filter with a k-th order polynomial of the Laplacian to give:

U Ĝ U^T x ≈ ( Σ_{i=0}^{k} θ_i Δ_sym^i ) x

Ĝ is approximated as the value inside the brackets
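
As a quick numerical check of this approximation step (reusing L_sym, eigvals, U and x from the previous snippet; the theta values are arbitrary), filtering with a polynomial of the Laplacian in the node domain matches multiplying the spectral coefficients by the same polynomial of the eigenvalues.

```python
import numpy as np

theta = np.array([0.5, -0.3, 0.1])   # hypothetical filter parameters (k = 2)

# Node-domain form: (theta_0 I + theta_1 L_sym + theta_2 L_sym^2) x
node_domain = sum(t * np.linalg.matrix_power(L_sym, i) @ x for i, t in enumerate(theta))

# Spectral form: U diag(g_hat(lambda)) U^T x, with g_hat(lambda) = sum_i theta_i lambda^i
g_hat = sum(t * eigvals**i for i, t in enumerate(theta))
spectral = U @ (g_hat * (U.T @ x))

assert np.allclose(node_domain, spectral)
```
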
  • Final Expression

The above expression is further simplified (leading to a first-order Chebyshev filter) by setting k = 1, λ_max ≈ 2 and θ = θ_0 = -θ_1,

to give

g * x ≈ θ ( I + D^(-1/2) A D^(-1/2) ) x

Basic GCN convolution operation

Further, using the renormalization trick, the matrix I + D^(-1/2) A D^(-1/2) is replaced by:

S = D'^(-1/2) A' D'^(-1/2)

where A' = A + I and D' = D + I. This is exactly the normalized adjacency matrix S used earlier.

Simple Graph Convolution and Low-Pass Filtering

The first-order Chebyshev propagation matrix we derived above, S_1 = I + D^(-1/2) A D^(-1/2), can be written in terms of the normalized Laplacian as S_1 = 2I - Δ_sym.

Thus, after K propagation steps, the filter coefficient at eigenvalue λ_i of Δ_sym becomes ĝ(λ_i) = (2 - λ_i)^K.

Feature (red) and filter (blue) spectral coefficients for different propagation matrices on the Cora dataset | Results from the SGC paper[1]

As shown, for larger values of K the filter coefficients (2 - λ_i)^K explode and over-amplify the signal.

S = D'^(-1/2) A' D'^(-1/2) = I - Δ'_sym, so S^K has filter coefficients (1 - λ'_i)^K

The renormalized S corresponds to the normalized Laplacian of the graph with added self-loops, which in turn affects the eigenvalues.

This is where the renormalization trick used in the GCN helps. After renormalizing, the filter becomes a K-th power of the renormalized propagation matrix, whose eigenvalues lie in a smaller range.

Further, the largest eigenvalue of the normalized graph Laplacian becomes smaller after adding self-loops of weight γ > 0, as shown formally in the SGC paper[1].
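
A rough numerical illustration of these two claims (using the toy A from earlier; the graph and the values of K are arbitrary) compares the spectra with and without self-loops.

```python
import numpy as np

def sym_laplacian(adj):
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))
    return np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt

lam = np.linalg.eigvalsh(sym_laplacian(A))                          # without self-loops
lam_sl = np.linalg.eigvalsh(sym_laplacian(A + np.eye(A.shape[0])))  # with self-loops

print("largest eigenvalue:", lam.max(), "->", lam_sl.max())         # shrinks with self-loops
for K in (2, 4, 8):
    # (2 - lam)^K blows up with K, while the renormalized (1 - lam')^K stays bounded
    print(K, np.abs((2 - lam) ** K).max(), np.abs((1 - lam_sl) ** K).max())
```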

Conclusion

GCNs are a product of the neural-network renaissance that arrived directly as a complex model to study. Here, the GCN was simplified into a possible predecessor called SGC. First we looked into how GCNs work, then reverse engineered them into a simple regression model built on a feature-engineering step. Finally, using the mathematical formulation of GCNs, the network was expressed as a fixed Chebyshev-style low-pass filter. And thus the network is simplified!

What next?

For more details on the implementation and a comparison between SGC and GCNs, the authors' implementation can be found in reference [2]. My implementation of SGC on the Cora dataset can be found on GitHub[4].


References:

  1. Wu, Felix, et al. "Simplifying Graph Convolutional Networks." arXiv preprint arXiv:1902.07153 (2019).
  2. Authors' implementation of Simplifying Graph Convolutional Networks, available on GitHub.
  3. Kipf, Thomas N., and Max Welling. "Semi-Supervised Classification with Graph Convolutional Networks." arXiv preprint arXiv:1609.02907 (2016).
  4. Code for Simple Graph Convolution by the author, available on GitHub.
  5. Graph Convolutional Networks | Graph Theory 2020.
