This blog post summarises the paper "Simplifying Graph Convolutional Networks" [1], which reverse engineers the Graph Convolutional Network. So, let us evolve Graph Convolutional Networks backward.
Graphs are pervasive models of structure. They are everywhere, from social networks to molecules in chemistry, and a wide variety of things can be represented as graphs. However, applying machine learning to these structures did not come naturally. Most ideas in machine learning started from a small, simple model that grew more complex over time as the need arose: the Perceptron evolved into the Multi-Layer Perceptron, linear image filters evolved into non-linear CNNs, and so on. Graph Convolutional Networks, referred to as GCNs, are different: they were derived directly from existing ideas and had a more complex starting point. To demystify GCNs, the paper reverse engineers them and proposes a simplified linear model called Simple Graph Convolution (SGC). SGC gives performance comparable to GCNs and is even faster than FastGCN.
Basic Notations
Inputs to a Graph Convolutional Network are:
- Node features (along with labels for the training nodes)
- Adjacency matrix
Adjacency matrix: The adjacency matrix A is an n x n matrix, where n is the number of nodes, with a(i,j) = 1 if node i is connected to node j and a(i,j) = 0 otherwise. If the edge is weighted, then a(i,j) equals the edge weight.
Degree matrix: The degree matrix D is an n x n diagonal matrix with d(i,i) equal to the sum of the i-th row of the adjacency matrix.
Input features: X is an input feature matrix of size n x d, where d is the number of input features per node.
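As a quick illustration (not taken from the paper), here is a minimal NumPy sketch that builds these matrices for a small made-up toy graph:

```python
import numpy as np

# Toy undirected graph with n = 4 nodes, given as an edge list (made up for illustration).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix A: a(i, j) = 1 if node i is connected to node j, else 0.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = 1.0
    A[j, i] = 1.0  # undirected graph, so A is symmetric

# Degree matrix D: d(i, i) = sum of the i-th row of A.
D = np.diag(A.sum(axis=1))

# Input feature matrix X of size n x d (here d = 3, with random features for the sketch).
d = 3
X = np.random.randn(n, d)

print(np.diag(D))  # node degrees: [2. 2. 3. 1.]
print(X.shape)     # (4, 3)
```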
Let us see how GCNs actually work before reverse engineering them.
Graph Convolutional Network in Action
When we pass an image through a CNN layer, what happens? The layer uses a pixel along with its neighbouring pixels to produce the output at that position. GCNs generalize the same behaviour to graphs. At each layer, we first propagate new features for every node v as:

h̄_v(k) = h_v(k-1) / (d_v + 1) + Σ_{u ∈ N(v)} a(u,v) · h_u(k-1) / √((d_v + 1)(d_u + 1))

where N(v) is the set of neighbours of node v and d_v is its degree.
Further, this step can be represented in matrix form as:

H̄(k) = S H(k-1)

where S = D'^(-1/2) A' D'^(-1/2) is the normalized adjacency matrix with added self-loops, with A' = A + I and D' = D + I.
Thus, every step can be expressed as a sparse matrix multiplication of S with H(k-1).
This is what smoothens the hidden representations locally. How? Every step propagates features from the neighbours. After the first step, node x holds information about itself as well as its immediate neighbours. After the second step, it again gathers information from its neighbours, but by then they already hold information about their own neighbours, which are two hops away from x, so that information also reaches node x. In the same way, information is gathered at each hop. The result is then multiplied with a weight matrix W, and a pointwise non-linear function (ReLU) is applied to get the next feature matrix:

H(k) = ReLU(H̄(k) W(k)) = ReLU(S H(k-1) W(k))
The output of the final layer is fed into a softmax layer to get the classification probabilities:

Ŷ_GCN = softmax(S H(K-1) W(K))
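To make the layer-wise computation concrete, here is a minimal NumPy sketch of a two-layer GCN forward pass. It is only an illustration under the toy-graph assumptions of the earlier snippet (random weights, made-up sizes), not the reference implementation:

```python
import numpy as np

def normalized_adjacency(A):
    """S = D'^(-1/2) A' D'^(-1/2), where A' = A + I and D' = D + I add self-loops."""
    A_prime = A + np.eye(A.shape[0])
    d_prime = A_prime.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_prime))
    return D_inv_sqrt @ A_prime @ D_inv_sqrt

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A, X, weights):
    """Each layer: H <- ReLU(S H W); the last layer uses softmax instead of ReLU."""
    S = normalized_adjacency(A)
    H = X
    for W in weights[:-1]:
        H = relu(S @ H @ W)
    return softmax(S @ H @ weights[-1])

# Toy usage: 3 input features, 8 hidden units, 2 classes (all sizes made up).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 2))]
probs = gcn_forward(A, X, weights)  # A, X from the earlier snippet; shape (n, 2), rows sum to 1
```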
Graph Convolutional Network can be summarized using the following diagram:
![GCN in action | Source[1]](https://towardsdatascience.com/wp-content/uploads/2020/10/16pN7EFe4YPbAkbFHUC1g2w.png)
Simple Graph Convolution: GCN Simplified
Let us recall MLPs and CNNs. In CNNs, every layer has a localized receptive field that forms the feature maps, and as we move deeper, each value has a larger receptive field than before. Similarly, GCNs obtain information from the neighbours at each hop, so every node knows more about the rest of the network after every layer. This is the part that extracts features from the neighbours.
However, each layer also applies a non-linearity, which does not add anything to the receptive field of the network. The paper hypothesises that most of the benefit of a GCN comes from this local averaging rather than from the non-linearities. Thus, on removing the non-linear part, we get a simple model:

Ŷ = softmax(S ⋯ S X W(1) W(2) ⋯ W(K)) = softmax(Sᴷ X Θ)

where Θ = W(1) W(2) ⋯ W(K) collapses the per-layer weights into a single matrix.
This is the equation for Simple Graph Convolution. We can clearly see that the model reduces to a simple pre-processing step followed by a straightforward multi-class logistic regression. How cool! One thing to observe is that the above equation has a single learnable parameter Θ, while S and X are both inputs. So SᴷX can be pre-computed to give X':

X' = Sᴷ X

and the prediction becomes Ŷ_SGC = softmax(X' Θ).
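Here is a minimal NumPy sketch of this two-step recipe: pre-compute X' = SᴷX once, then fit a plain multi-class logistic regression on it. Again, this is an illustration under the same toy-graph assumptions, not the authors' code:

```python
import numpy as np

def sgc_features(A, X, K=2):
    """SGC pre-processing: X' = S^K X, with S the self-loop-normalized adjacency."""
    S = normalized_adjacency(A)          # defined in the GCN snippet above
    return np.linalg.matrix_power(S, K) @ X

def fit_softmax_regression(X_feat, y, n_classes, lr=0.1, steps=500):
    """Plain multi-class logistic regression on the pre-computed features."""
    n, d = X_feat.shape
    Theta = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]             # one-hot labels
    for _ in range(steps):
        probs = softmax(X_feat @ Theta)  # softmax defined in the GCN snippet above
        Theta -= lr * X_feat.T @ (probs - Y) / n
    return Theta

# Toy usage with made-up labels:
y = np.array([0, 0, 1, 1])
X_prime = sgc_features(A, X, K=2)                        # A, X from the first snippet
Theta = fit_softmax_regression(X_prime, y, n_classes=2)
preds = softmax(X_prime @ Theta).argmax(axis=1)
```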
So, by reverse engineering the GCN and removing the non-linear layers, we realize that the hops are nothing more than a feature engineering step in machine learning. An implementation of this part can be found on GitHub. Further, SGC can be summarized by the figure shown below.
![Simple Graph Convolution | Source[1]](https://towardsdatascience.com/wp-content/uploads/2020/10/18F6E-3nyZRaahLD_nuCghQ.png)
Mathematical Primer on Graph Convolutional Networks
This part explains the mathematical derivation of GCNs as given in Semi-Supervised Classification with Graph Convolutional Networks [3].
- Graph Laplacian:
A matrix is a good way to represent a graph. We define the graph Laplacian and its normalized version as:

Δ = D − A

Δ_sym = D^(-1/2) Δ D^(-1/2) = I − D^(-1/2) A D^(-1/2)
- Eigendecomposition
Every symmetric positive semidefinite matrix can be factorized as:

Δ_sym = U Λ Uᵀ
where U comprises the orthonormal eigenvectors and Λ is a diagonal matrix with the eigenvalues on its diagonal. Now, since the columns of U are orthonormal, can we define a graph Fourier transform with respect to them, just as the DFT is defined with respect to complex exponentials? Absolutely yes! We write the Fourier transform of a signal x as:

x̂ = Uᵀ x
and the inverse transform as:

x = U x̂
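As a small illustration (not from either paper), the snippet below computes the normalized Laplacian of the toy graph from the earlier snippets, eigendecomposes it, and checks that the forward and inverse graph Fourier transforms round-trip a signal:

```python
import numpy as np

def normalized_laplacian(A):
    """Delta_sym = I - D^(-1/2) A D^(-1/2) (no self-loops here)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

# Eigendecomposition Delta_sym = U Lambda U^T (eigh returns eigenvalues in ascending order).
L_sym = normalized_laplacian(A)  # A from the first snippet
eigvals, U = np.linalg.eigh(L_sym)

# Graph Fourier transform of a signal x and its inverse.
x = np.random.randn(A.shape[0])
x_hat = U.T @ x                  # forward transform
x_back = U @ x_hat               # inverse transform
print(np.allclose(x, x_back))    # True: U is orthogonal, so the transform round-trips
```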
- Graph convolution in terms of Fourier Transform
Now, from signals and systems we know that the convolution of two signals can be written as their pointwise product in the Fourier domain. Using this, we get:

g * x = U((Uᵀ g) ⊙ (Uᵀ x)) = U Ĝ Uᵀ x
Here Ĝ is the diagonal matrix whose diagonal entries are the filter coefficients (the parameters that are learned). Finally, the filter is approximated by a K-th order polynomial of the Laplacian, giving:

U Ĝ Uᵀ x ≈ θ₀ x + θ₁ Δ_sym x + … + θ_K (Δ_sym)^K x
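To see why a polynomial in the Laplacian is the same thing as a spectral filter, the small check below applies a filter with arbitrary, made-up coefficients both in the vertex domain and in the spectral domain and compares the results (continuing the toy example above):

```python
import numpy as np

# Arbitrary polynomial filter coefficients theta_0 .. theta_K (made up for the check).
theta = np.array([0.5, -0.3, 0.1])
K = len(theta) - 1

# Vertex domain: apply sum_i theta_i * Delta_sym^i directly to x.
out_vertex = sum(theta[i] * np.linalg.matrix_power(L_sym, i) @ x for i in range(K + 1))

# Spectral domain: scale each eigen-component by the polynomial evaluated at its eigenvalue.
g_hat = sum(theta[i] * eigvals ** i for i in range(K + 1))  # g(lambda) on the diagonal of G_hat
out_spectral = U @ (g_hat * (U.T @ x))

print(np.allclose(out_vertex, out_spectral))  # True: the two views coincide
```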
- Final Expression
The above approximation is simplified further (leading to the 1st-order Chebyshev filter) by setting:

K = 1, λ_max ≈ 2, and θ = θ₀ = −θ₁

to give

g * x ≈ θ (I + D^(-1/2) A D^(-1/2)) x
Further, using the renormalization trick, the matrix I + D^(-1/2)AD^(-1/2) is replaced by:

S = D'^(-1/2) A' D'^(-1/2)
where A’ = A + I and D’ = D + I.
Simple Graph Convolution and Low-Pass Filtering
The first-order propagation matrix derived above, before the renormalization trick, is a 1st-order Chebyshev filter. It can be written in terms of the normalized Laplacian as:

S_1-ord = I + D^(-1/2) A D^(-1/2) = 2I − Δ_sym
Thus, after K propagation steps the spectral filter coefficients become:

g(λᵢ) = (2 − λᵢ)^K
![Feature (red) and filters (blue) spectral coefficients for different propagation matrices on Cora dataset | Results from SGC Paper[1]](https://towardsdatascience.com/wp-content/uploads/2020/10/1RMas3PCONPftmOW4hXTf2g.png)
As shown, the larger the value of K, the more the filter coefficients explode and over-amplify the signal.

This is where the renormalization trick used in the GCN helps. After renormalizing, the filter coefficients become (1 − λ̃ᵢ)^K, a polynomial in the eigenvalues of the renormalized Laplacian. Further, adding self-loops with weight γ > 0 shrinks the largest eigenvalue of the normalized graph Laplacian, so the coefficients stay bounded and the propagation acts as a low-pass filter.
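As a small numerical illustration of this effect (continuing the toy example, not the paper's Cora experiment), the snippet below compares the largest eigenvalue of the normalized Laplacian with and without self-loops, and the corresponding filter coefficients (2 − λ)^K versus (1 − λ̃)^K:

```python
import numpy as np

K = 10  # number of propagation steps (arbitrary, for illustration)

# Normalized Laplacian without and with self-loops (A from the first snippet,
# normalized_laplacian from the Fourier-transform snippet).
L_sym = normalized_laplacian(A)
L_tilde = normalized_laplacian(A + np.eye(A.shape[0]))

lam = np.linalg.eigvalsh(L_sym)
lam_tilde = np.linalg.eigvalsh(L_tilde)
print(lam.max(), lam_tilde.max())  # the largest eigenvalue shrinks after adding self-loops

# Filter coefficients after K steps: the plain first-order filter blows up (as large as 2^K),
# while the renormalized filter stays bounded in [-1, 1].
print((2 - lam) ** K)
print((1 - lam_tilde) ** K)
```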
Conclusion
GCNs are a product of the neural network renaissance and arrived directly as a complex model to study. Here, the GCN is simplified into a possible predecessor called SGC. First, we looked into the working of GCNs, then reverse engineered them into a simple regression model built on a feature engineering step. Further, using the mathematical formulation of GCNs, the network was expressed as a first-order Chebyshev filter. And thus the network is simplified!
What next?
For more details on the implementation and a comparison between SGC and GCNs, the authors' implementation can be found in references [2]. My implementation of SGC on the Cora dataset can be found on GitHub.
References:
- [1] Wu, Felix, et al. "Simplifying Graph Convolutional Networks." arXiv preprint arXiv:1902.07153 (2019).
- [2] Code for Simple Graph Convolution by the authors is available on GitHub.
- [3] Kipf, Thomas N., and Max Welling. "Semi-Supervised Classification with Graph Convolutional Networks." arXiv preprint arXiv:1609.02907 (2016).
- [4] Graph Convolutional Networks | Graph Theory 2020.
- [5] My implementation of SGC can be found on GitHub.