I have reviewed some of the existing libraries for Graph Convolutional Neural Networks (GCNNs) and, although in general they are very good, I always return to DGL because, among other things, it has excellent documentation and many examples [1]. Here, I want to share my review of a classic example in the study of GCNNs, the CORA dataset, using of course DGL. The CORA dataset is a citation network where nodes are articles and edges are citations between them. The gif below helps to give an intuition of the connections at a glance.

There are 2708 nodes with 7 classes, and each node has an associated feature vector with 1433 features [2]. Here we are going to use this dataset for a semi-supervised classification task: predicting the class of a node (one of seven) while knowing the labels of only a small number of nodes. In this case the number of labelled nodes is 140, as implemented in DGL, but a different number could be used since the full information is available. Before starting we must have the DGL library installed, which is currently at v0.7. Then we proceed to import some modules in the usual way,
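A minimal sketch of the typical imports, assuming DGL 0.7 with the PyTorch backend:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

import dgl
from dgl.nn import GraphConv
from dgl.data import CoraGraphDataset
```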
Next, we must load the CORA dataset in the following form,
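A sketch of how the dataset can be loaded; the variable names (features, labels, train_mask, test_mask) follow the description below:

```python
# Load the CORA citation dataset (downloaded on first use)
dataset = CoraGraphDataset()
g = dataset[0]  # the dataset contains a single graph

# Node features, labels and train/test masks live in g.ndata
features = g.ndata['feat']          # shape: (2708, 1433)
labels = g.ndata['label']           # shape: (2708,), values 0..6
train_mask = g.ndata['train_mask']  # True for the 140 training nodes
test_mask = g.ndata['test_mask']    # True for the test nodes
```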
After setting g as the graph object, we retrieve some tensors. The features tensor contains the 1433 features for each of the 2708 nodes, and the labels tensor assigns each node a number from 0 to 6 as its label. The other two tensors, train_mask and test_mask, simply hold True or False depending on whether the node belongs to the training or the test set. In the following summary, we can see the values for this graph as DGL reports them.
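This is roughly what DGL prints when loading the dataset (values for the standard Cora split):

```
NumNodes: 2708
NumEdges: 10556
NumFeats: 1433
NumClasses: 7
NumTrainingSamples: 140
NumValidationSamples: 500
NumTestSamples: 1000
```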
Now we must define the graph convolutions, but before that, it is important to briefly review the formulas. Recall that, in principle, we need the adjacency matrix A of the graph, but it is transformed according to the following equations, where I is the identity matrix [3].
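These are the renormalization and layer-propagation rules from [3], restated here for reference:

$$\tilde{A} = A + I, \qquad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, \qquad \hat{A} = \tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}$$

$$H^{(l+1)} = \sigma\!\left(\hat{A}\, H^{(l)}\, W^{(l)}\right)$$

In the code that follows, the normalized matrix Â plays the role of A in the expression AXW + b, and σ is the ReLU applied after the first convolution.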

We will define the graph convolutions in a Python class according to these equations.
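A minimal sketch of such a class; the class name and the hidden size of 8*16 = 128 are taken from the description below:

```python
class GCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super().__init__()
        # GraphConv computes (A X W + b); no activation by default
        self.conv1 = GraphConv(in_feats, hidden_size)
        self.conv2 = GraphConv(hidden_size, num_classes)

    def forward(self, g, inputs):
        # First convolution followed by a ReLU non-linearity
        x1 = F.relu(self.conv1(g, inputs))
        # Second convolution maps to the 7 output classes
        x2 = self.conv2(g, x1)
        return x2

# 1433 input features, 8*16 = 128 hidden units, 7 classes
net = GCN(1433, 8 * 16, 7)
```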

Here x1 and x2 are the first and second convolution respectively. In DGL this can be done easily by calling the GraphConv module, which in this case performs the operation in parentheses (AXW + b), since by default the activation function is set to None.
Notice that in the forward method we define x1 and x2 following the equations above. The drawing below shows the sizes of the matrices involved. First, in conv1, AX is the matrix multiplication of the adjacency matrix (A) with the features matrix (X), giving a matrix of 2708×1433. The weights matrix W⁰ thus has 1433 rows and 8*16 = 128 columns (this number is arbitrary, but works well), so we end up with a matrix x1 of size 2708×128. Next, we feed the second convolution with x1 following the same process, but this time we end with just 7 features (the same number as the total classes), giving a matrix x2 of size 2708×7.

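As a quick sanity check of these dimensions, a small sketch reusing g, features and net from above:

```python
print(features.shape)          # torch.Size([2708, 1433])
print(net.conv1.weight.shape)  # torch.Size([1433, 128])

x2 = net(g, features)          # forward pass through both convolutions
print(x2.shape)                # torch.Size([2708, 7])
```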
Finally we have to write the training part. Here we train for up to 200 epochs using the Adam optimizer and log_softmax, which works well for classification tasks. In order to keep the values for the loss, the accuracy and the feature predictions, we introduce the lists loss_list, acc_list and all_logits.
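A sketch of a training loop along these lines; the learning rate is an assumption, and the loss is computed only on the 140 labelled training nodes:

```python
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

loss_list, acc_list, all_logits = [], [], []

for epoch in range(200):
    # Forward pass: logits for every node
    logits = net(g, features)
    all_logits.append(logits.detach())
    logp = F.log_softmax(logits, 1)

    # Negative log-likelihood loss on the training nodes only
    loss = F.nll_loss(logp[train_mask], labels[train_mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Accuracy on the test nodes
    pred = logits.argmax(1)
    acc = (pred[test_mask] == labels[test_mask]).float().mean()

    loss_list.append(loss.item())
    acc_list.append(acc.item())

    if epoch % 10 == 0:
        print(f'Epoch {epoch} | loss {loss.item():.4f} | test acc {acc.item():.4f}')
```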
Running this code, after 200 epochs we get an accuracy of around 0.78, as can be seen in the plot below:

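A plot like this can be reproduced with a short matplotlib sketch using the two lists collected during training:

```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax1.plot(loss_list, color='tab:red')
ax1.set_xlabel('epoch')
ax1.set_ylabel('loss')

# Second y-axis for the test accuracy
ax2 = ax1.twinx()
ax2.plot(acc_list, color='tab:blue')
ax2.set_ylabel('test accuracy')

plt.show()
```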
Finally, if we want to inspect the predicted features and labels, we can use the following code, where the result for the last epoch (199 here) is stored in a dataframe; applying an argmax operation over the features, we get the index of the highest value (a number between 0 and 6), which gives the class.
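A sketch of that step; the column names are my own choice:

```python
import pandas as pd

# Logits from the last epoch (epoch 199)
last_logits = all_logits[199].numpy()

df = pd.DataFrame(last_logits, columns=[f'feat_{i}' for i in range(7)])
# The predicted class is the index of the largest value in each row
df['class'] = last_logits.argmax(axis=1)

df.head()
```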
Below is the output for the first 5 rows of the dataframe, where the seven columns are the learned feature values and the last column is the resulting class:
And that’s all. Now you are ready to try some other examples, or other GCN variants like SAGEConv, Gated Graph Convolution layers or Graph Attention layers, which are also included in the great DGL library.
Try the code in this COLAB notebook:
I hope you have enjoyed the read; if so, please clap 50 times!!
Follow me on twitter → @napoles3d
References:
[2] https://ojs.aaai.org//index.php/aimagazine/article/view/2157