What Can You Do With GNNs

Manipulation, Utility and Advantages of Graph Neural Networks

Anuradha Wickramarachchi
Towards Data Science


Graph neural networks (GNNs) are gaining popularity due to the ubiquitous nature of the graph data structure. Graphs enable us to model many different problems in fields such as (but not limited to) biology, sociology, ecology, vision, education and economics. Moreover, graph representations enable us to handle unstructured data at massive scales.

In this article, I will show how one might use a simple GNN for tasks such as classification, clustering and visualization. I will be using a GCN (Graph Convolutional Network) as the running example. This should provide enough intuition to extend the idea to your own domain.

Photo by Alina Grubnyak on Unsplash

Formal Representation of a GNN

Any GNN layer can be described by two mathematical operators: an aggregation function and a combination function. This is best understood using the MPNN (message passing neural network) framework.

Figure: Graph by Author

Aggregation

Considering the example graph above, the aggregation function combines the neighbourhood information of a node. More formally, aggregation can be represented as:

a_v^(k) = AGGREGATE^(k)({ h_u^(k-1) : u ∈ N(v) })

Equation by Author, reference: https://arxiv.org/pdf/1810.00826.pdf

In simple terms, the neighbourhood aggregation for node v in the k-th GNN layer is expressed using the activations hᵤ of the neighbouring nodes u from layer k-1. The neighbours of v are expressed as N(v). In the first layer, k-1 = 0, which falls back to the node features: in the first layer we simply aggregate the neighbours' initial features. In the case of GCN, the aggregator is simply a degree-normalized mean (each message is normalized by the square root of the product of the degrees of v and u). One can think of various aggregators such as max, mean, min, etc., as long as the operation is order invariant (the result is not altered by shuffling).
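To make this concrete, here is a tiny sketch of GCN-style degree-normalized aggregation on a made-up toy graph (the graph, features and function name are hypothetical, purely for illustration):

import numpy as np

# toy graph as an adjacency list: adj[v] lists the neighbours of node v
adj = {0: [1, 2], 1: [0], 2: [0]}
# toy node features: X[v] is the feature vector of node v
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
deg = {v: len(nbrs) for v, nbrs in adj.items()}

def aggregate(v):
    # GCN-style aggregation: each neighbour's message is scaled by
    # 1 / sqrt(deg(v) * deg(u)), then the scaled messages are summed
    return sum(X[u] / np.sqrt(deg[v] * deg[u]) for u in adj[v])

print(aggregate(0))  # aggregated neighbourhood message for node 0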

Combination

The combination of the neighbour information with the node's own representation is formally represented in the equation below.

h_v^(k) = COMBINE^(k)( h_v^(k-1), a_v^(k) )

Equation by Author, reference: https://arxiv.org/pdf/1810.00826.pdf

Here, different operations such as concatenation, summation or element-wise pooling can be used. Different GNN architectures rely on different combination functions. GCN uses the (degree-normalized) average, which we will discuss next.

In the graph figure above, we can aggregate the message from node 1 into node 6 as X1/sqrt(7×2), where X1 is the feature vector of node 1, and 7 and 2 are the degrees of nodes 6 and 1 respectively. We can do this for each node. Intuitively, we can think of this as each node passing its message to others, normalized by its own degree, and then receiving the others' messages, normalized by their degrees. Hence the name MPNN.

For a graph G(V, E) with adjacency matrix A, degree matrix D and feature matrix X, this can be achieved with a few matrix operations: D^(-1/2) A D^(-1/2) X. Usually, the identity matrix I is added to the adjacency matrix to incorporate each node's own features. In that case A is denoted Â (A-hat) and D is replaced with D̂ (D-hat), the degree matrix of Â. At this point, we have performed both aggregation and combination in a few matrix operations. The resulting matrix is fed to a trainable differentiable function ɸ, which is usually an MLP (multi-layer perceptron), i.e. a neural network.
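To make the matrix form concrete, here is a small numpy sketch on a made-up toy graph (not the Cora data; the weight matrix W stands in for the trainable function ɸ):

import numpy as np

# toy adjacency matrix A and node feature matrix X (3 nodes, 4 features)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = np.random.rand(3, 4)

A_hat = A + np.eye(3)                                        # add self-loops: A-hat = A + I
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))   # D-hat^(-1/2)

# aggregation + combination in a few matrix operations: D-hat^(-1/2) A-hat D-hat^(-1/2) X
H = D_hat_inv_sqrt @ A_hat @ D_hat_inv_sqrt @ X

# in a GCN layer, H is then passed to a trainable function, e.g. H @ W followed by a non-linearity
W = np.random.rand(4, 2)
out = np.maximum(H @ W, 0)  # ReLU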

Stacking up layers

We discussed what happens in a single GNN layer; now imagine we stack up a few such layers. This means we multiply by the (normalized) adjacency matrix more times. If you are familiar with random walks, D^(-1)A is called the transition matrix, which is used in power iteration until convergence to find the random-walk probabilities from a given node to the others. Intuitively, the more GNN layers we add, the more hops the aggregation expands over. In other words, after one layer we have a node's and its neighbours' information. When we do it once more, the neighbours (which already carry their own neighbours' information) are aggregated again. Hence 2 hops, and so on.
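A quick numerical check of this intuition on a small path graph 0–1–2–3 (made up for illustration):

import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
P = np.diag(1.0 / A_hat.sum(axis=1)) @ A_hat  # row-normalized, transition-style matrix

# one application mixes in 1-hop neighbours, two applications reach 2-hop neighbours
print(np.linalg.matrix_power(P, 1)[0])  # node 0 sees itself and node 1
print(np.linalg.matrix_power(P, 2)[0])  # node 0 now also sees node 2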

Example Time!

PyTorch Geometric Framework

GNNs can be easily implemented using the PyTorch Geometric library. There you can find many implementations of GNNs and a message-passing class to build your own custom implementations. Check it out at the following link.

Cora Dataset

We will use the popular Cora dataset, which consists of scientific publications grouped into 7 classes. The nodes are research papers and the edges are the citations connecting them.

Image by Author

Visualizing the graph using networkx yields the image above. We can see that a few colours have flocked together, but we are far from anything satisfactory. So let's reduce the dimensionality of the features and explore a bit more.
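The plotting code does not appear in the snippets below, so here is a rough sketch of how a picture like the one above could be produced with networkx (the layout and styling choices are my own):

import matplotlib.pyplot as plt
import networkx as nx
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import to_networkx

data = Planetoid("./", "Cora")[0]
G = to_networkx(data, to_undirected=True)  # convert the PyG graph to a networkx graph

plt.figure(figsize=(10, 10))
nx.draw_spring(G, node_size=10, node_color=data.y.numpy(), cmap=plt.cm.tab10, with_labels=False)
plt.show()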

UMAP on Features

One easy way to interpret data is to see what is there and how it is placed. UMAP is a very helpful manifold-learning tool that enables us to do this. Let's visualize.

Image by Author

We can see some localization of classes, but it is not perfect. Simplified code for the above operation is as follows (complete code at the end of the article):

# essential imports that will be needed throughout the blog
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
import matplotlib.pyplot as plt
import seaborn as sns
import umap
import networkx as nx
import numpy as np

dataset = 'Cora'
path = "./"
dataset = Planetoid(path, dataset, transform=T.NormalizeFeatures())
data = dataset[0]

# colour palette for the 7 classes
palette = sns.color_palette("bright", dataset.num_classes)

embd = umap.UMAP().fit_transform(data.x.numpy())

plt.figure(figsize=(10, 10))
sns.scatterplot(x=embd.T[0], y=embd.T[1], hue=data.y.numpy(), palette=palette)
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')

We are definitely not happy with what we see, so let's try a GCN and see the visualization. My network is as follows (modified from the PyTorch Geometric GitHub examples):

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_features, 16, cached=True)
        self.conv2 = GCNConv(16, 16, cached=True)
        self.fc1 = torch.nn.Linear(16, dataset.num_classes)

    def forward(self):
        x, edge_index, edge_weight = data.x, data.edge_index, data.edge_attr
        x = self.conv1(x, edge_index, edge_weight)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.fc1(x)
        return F.log_softmax(x, dim=1)

We can train this using the following code:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model, data = Net().to(device), data.to(device)

optimizer = torch.optim.Adam([
    dict(params=model.conv1.parameters(), weight_decay=5e-4),
    dict(params=model.fc1.parameters(), weight_decay=5e-4),
    dict(params=model.conv2.parameters(), weight_decay=0)
], lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    F.nll_loss(model()[data.train_mask], data.y[data.train_mask]).backward()
    optimizer.step()

Note that the L2 regularizer (weight decay) is omitted for the second conv layer; this is something the authors of GCN decided empirically (https://github.com/tkipf/gcn/issues/108).

When visualized, the output looks like the following:

Image by Author

We can see that there is a very clear separation of different classes. Here, the training finished with 0.7800 test accuracy. Can we manipulate this a bit more? Let’s see.
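For reference, here is a minimal sketch of how training and evaluation could be run; the test() helper and the epoch count are my own additions, assuming the model, data and train() defined above:

@torch.no_grad()
def test():
    # accuracy on the Planetoid test mask
    model.eval()
    pred = model().argmax(dim=1)
    correct = (pred[data.test_mask] == data.y[data.test_mask]).sum().item()
    return correct / int(data.test_mask.sum())

for epoch in range(200):
    train()
print(f'Test accuracy: {test():.4f}')

# the scatter plots above are UMAP projections of the final-layer output
embd = umap.UMAP().fit_transform(model().detach().cpu().numpy())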

Embedding losses

Neural networks can be thought of as continuous, differentiable functions. Classification essentially learns decision boundaries for predictions. Read more about decision boundaries here:

In summary, if we force the network to have better boundaries, we can get better visualizations. This means we should be able to see the classes separately, which is particularly useful if we visualize or cluster the data. One simple thing we can do is:

  1. Ask GNN to embed similar classes closer
  2. Ask GNN to embed different classes further

Note that the embeddings are the final-layer output, i.e. the classification output of the network. We can use the dot product as a measure of similarity in this case. We prepare our data-point pairs for this loss as follows:

y_neg_pairs = []
y_pos_pairs = []
data_idx = np.arange(len(data.x))

for idx1, y1 in enumerate(data.y[data.train_mask].cpu().numpy()):
    for idx2, y2 in enumerate(data.y[data.train_mask].cpu().numpy()):
        if idx1 > idx2 and y1 != y2:
            y_neg_pairs.append([idx1, idx2])
        if idx1 > idx2 and y1 == y2:
            y_pos_pairs.append([idx1, idx2])

y_neg_pairs = np.array(y_neg_pairs)
y_pos_pairs = np.array(y_pos_pairs)

Our modified loss function is as follows:

model_out = model()[data.train_mask]
y_true = data.y[data.train_mask]
nllloss = F.nll_loss(model_out, y_true)
#Negative loss
disloss_neg = F.logsigmoid(-1 * (model_out[y_neg_pairs.T[0]]*model_out[y_neg_pairs.T[1]])).sum(-1).mean()

#Positive loss
disloss_pos = F.logsigmoid((model_out[y_pos_pairs.T[0]]*model_out[y_pos_pairs.T[1]])).sum(-1).mean()

loss = 10 * nllloss - disloss_neg - disloss_pos

Note that we flip the polarity of the dot product for negative pairs and pass it through logsigmoid to obtain the dot-product-based loss. If you are interested, this idea is studied in the GraphSAGE paper (https://arxiv.org/abs/1706.02216).
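For completeness, here is a rough sketch of how this modified loss plugs into the training step, assembled from the snippets above (the function name is mine):

def train_with_pairs():
    model.train()
    optimizer.zero_grad()
    model_out = model()[data.train_mask]
    y_true = data.y[data.train_mask]
    nllloss = F.nll_loss(model_out, y_true)
    # push embeddings of different-class pairs apart ...
    disloss_neg = F.logsigmoid(-1 * (model_out[y_neg_pairs.T[0]] * model_out[y_neg_pairs.T[1]])).sum(-1).mean()
    # ... and pull same-class pairs together
    disloss_pos = F.logsigmoid((model_out[y_pos_pairs.T[0]] * model_out[y_pos_pairs.T[1]])).sum(-1).mean()
    loss = 10 * nllloss - disloss_neg - disloss_pos
    loss.backward()
    optimizer.step()
    return loss.item()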

Now our training finishes with a test accuracy of 0.7720, which is slightly worse than before. Let's visualize the output of the GNN with UMAP and see.

Image by Author

We can see the clusters are now better defined and there is slightly less noise. Despite the lower accuracy, we have a better separation of clusters. The lower test accuracy is actually due to the indefinite nature of the clusters: we can see some points confidently located in wrongly coloured clusters. This is essentially due to the nature of the data.

Extending the Idea to Unsupervised Clustering

How do we extend this idea when we do not have labels, but only features and the graph? This has been discussed in GraphSAGE. The simple idea is to embed nearby nodes closer together and distant nodes further apart, using only the graph topology. In place of our positive and negative pairs, we can use directly connected pairs as positive pairs and random pairs as negative pairs. This has shown good results in various domains, which is a topic for another day! 😊
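As a rough sketch of that idea (the sampling strategy and variable names here are my own), positive pairs can be taken directly from the edge list and negative pairs sampled at random:

# directly connected nodes act as positive pairs
edge_index = data.edge_index.cpu().numpy()
pos_pairs = edge_index.T
# random node pairs act as (approximate) negative pairs
neg_pairs = np.random.randint(0, data.num_nodes, size=pos_pairs.shape)

# these arrays can replace y_pos_pairs / y_neg_pairs in the loss above, applied to the
# full model output instead of the train-masked one, so no labels are needed at all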

I hope you enjoyed this article and I believe this will be useful for your research too!

As with every other article, this one comes with a notebook!
