
Graph Neural Networks: a learning journey since 2008 – Python & Deep Walk

The fourth part of this series. Today, the practical implementation of DeepWalk 🐍 and a look at the Facebook Large Page-Page Network dataset 👍


Image by Francesca Hotchin on Unsplash


Welcome back to the fourth part of our journey in the Graph Machine Learning world. In the previous post, we saw how the DeepWalk algorithm works and its mathematical background. Today we'll dive into the Python implementation of the algorithm, starting from a simple example (the Karate club) with a plain NumPy version of DeepWalk, and then we'll jump to a more interesting scenario: the Facebook Large Page-Page Network dataset.

Here is the list of the previous episodes:

Concatenating all DeepWalk mathematical steps

Generate a corpus of "sequences"

Now that we know how DeepWalk works mathematically, let's go step by step through what happens for the Karate club, to fix our ideas before jumping to bigger applications. The current example is based on a simple NumPy implementation of an embedding neural network. Code and input files can be found here:

learn_graph_ml/Perozzi_DeepWalk at master · Steboss/learn_graph_ml

The input dataset is data/karate.adjlist, an adjacency list for the Karate club; graph.py is a utility to read the input file and karate.py is our main DeepWalk algorithm file. The results are far from perfect, but they are a great way to concretely understand what is going on.

Fig.1 Karate Club graph. The club is composed of 34 members and we can count 78 interactions between them.

In this example we have 34 nodes and 78 edges. The number of random walks per node is number_walks=5, for a total of 170 random walks, each exploring walk_length=5 vertices from a starting node. On line 378 we create the initial corpus:
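As a rough sketch of what this corpus-building step looks like (the repo reads data/karate.adjlist through graph.py; here nx.karate_club_graph() is used as a stand-in, so this is an illustration rather than the repo's exact code):

```python
import random
import networkx as nx

def build_corpus(G, number_walks=5, walk_length=5, seed=42):
    """Generate truncated random walks starting from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(number_walks):
        nodes = list(G.nodes())
        rng.shuffle(nodes)                  # visit starting nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = list(G.neighbors(walk[-1]))
                if not neighbours:          # dead end (never happens for the Karate club)
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

G = nx.karate_club_graph()       # 34 nodes, 78 edges
random_walks = build_corpus(G)
print(len(random_walks))         # 34 nodes * 5 walks each = 170 walks
```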

From here we’ll have a list of random walks:

The SkipGram Model

Once the corpus has been generated we can move on to the SkipGram model. In this example I have not implemented the hierarchical softmax; I am just using a simple softmax layer, as the problem is quite small. These are the steps we need:

  • Generation of training data: for each input sequence we need to generate the pairs (X, y) for the training stage. X is the current vertex, while y is a vertex within window_size of X
  • Parameter initialisation: randomly initialise a matrix of word embeddings and implement the forward step for the two-hidden-layer neural network
  • Neural network: a function to update the weights and return the training cost, to converge towards the right embeddings.

Generation of Training Data

The training data reflect the SkipGram model's goal: X is an input word and the model predicts the likelihood that a word at most window_size away is a neighbour of X. The final training dataset will be a set of pairs (X, y). For example, given the input sequence 22, 1, 13, 4, 3 and window_size=5, the pairs (X, y) will be (for example): (22, 1), (22, 13), (22, 4), (22, 3), (1, 22), (1, 13), (1, 4), (1, 3), (13, 22), (13, 1), (13, 4) etc. The function generate_training_data accomplishes this goal, looping through all the input random_walks and their elements, selecting the left and right indexes idxs_left and idxs_right:
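A minimal sketch of what generate_training_data might look like (the function and index names come from the article; the exact implementation in the repo may differ):

```python
import numpy as np

def generate_training_data(random_walks, window_size=5):
    """Build the (X, y) pairs: centre vertex X and every vertex y within window_size."""
    X, y = [], []
    for walk in random_walks:
        for i, centre in enumerate(walk):
            idxs_left = list(range(max(0, i - window_size), i))
            idxs_right = list(range(i + 1, min(len(walk), i + window_size + 1)))
            for j in idxs_left + idxs_right:
                X.append(centre)
                y.append(walk[j])
    # shape (1, n_pairs), matching the sizes discussed below
    return np.array(X).reshape(1, -1), np.array(y).reshape(1, -1)

X, y = generate_training_data(random_walks, window_size=5)
print(X.shape, y.shape)   # (1, 3400) (1, 3400) for 170 walks of length 5
```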

The final X and y vectors have size (1, 3400), where 3400 comes from the total number of random walks (170) times the number of pairs for each random walk, which is 20.

Parameters Initialisation

The second step is to initialise the following parameters:

  • representation_size: the latent feature dimension, namely how many latent "coordinates" SkipGram should return. In this example it is fixed to 2
  • vocab_size: the vocabulary size, which is the total number of vertices in the graph (vocab_size=34+1=35)
  • One-hot encoding matrix for the target y. The one-hot encoded matrix has size (35, 3400) and it contains a 1 every time the target word appears:

  • epochs for the SkipGram model. Remember that SkipGram is a neural network, so we have to specify how many times we want to cycle through the entire dataset (in this case epochs=2)
  • batch_size, how many samples should be used at each training iteration (here batch_size=4096, as I want to process the full dataset at once)
  • learning_rate, set to 0.001
  • Embedding and weight matrices for the neural network, initialised through the function param_init (a minimal sketch follows this list). In this function we create a random matrix for the word embeddings, whose size is vocab_size by representation_size, namely for each node we'll have representation_size coordinates. Secondly, the neural network weights are randomly initialised as np.random.randn(out_size, inp_size), where inp_size is the representation_size and out_size is the number of graph vertices, so that the dense layer maps each embedding to one score per vertex.
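Here is a minimal sketch of what this initialisation could look like, assuming the shape convention above (one embedding row per vertex, and dense weights that map a representation_size vector to vocab_size scores); the helper one_hot is a hypothetical name for the target encoding:

```python
import numpy as np

def one_hot(y, vocab_size):
    """One-hot encode the targets: shape (vocab_size, n_pairs)."""
    Y = np.zeros((vocab_size, y.shape[1]))
    Y[y.flatten(), np.arange(y.shape[1])] = 1.0
    return Y

def param_init(vocab_size, representation_size):
    """Randomly initialise the embedding matrix and the dense-layer weights."""
    np.random.seed(42)
    return {
        # one representation_size-dimensional embedding per vertex
        "embeddings": np.random.randn(vocab_size, representation_size) * 0.01,
        # dense layer weights: (out_size, inp_size) = (vocab_size, representation_size)
        "weights": np.random.randn(vocab_size, representation_size) * 0.01,
    }

Y = one_hot(y, vocab_size=35)                                    # (35, 3400)
parameters = param_init(vocab_size=35, representation_size=2)
```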

Neural Networks

At this point we can spin up the embedding neural network with the following steps (a minimal training-loop skeleton follows the list):

  • Select a chunk of batch_size samples from the input data
  • Run the forward propagation part of the neural network ( forward_propagation )
  • Compute the gradients through back propagation ( backward_propagation )
  • Update parameters, embedding matrix and weight matrix ( update_parameters )
  • Compute the cost through the cross entropy ( cross_entropy )
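As a sketch under the assumptions above, the loop tying these steps together could look like this (the helper functions are sketched in the next paragraphs; this is an illustration of the flow, not the repo's exact code):

```python
def train(X, Y, vocab_size=35, representation_size=2,
          epochs=2, batch_size=4096, learning_rate=0.001):
    """Minimal SkipGram training loop: forward pass, backward pass, update, cost."""
    parameters = param_init(vocab_size, representation_size)
    for epoch in range(epochs):
        for start in range(0, X.shape[1], batch_size):
            X_batch = X[:, start:start + batch_size].flatten()
            Y_batch = Y[:, start:start + batch_size]
            cache = forward_propagation(X_batch, parameters)
            gradients = backward_propagation(X_batch, Y_batch, parameters, cache)
            update_parameters(parameters, gradients, X_batch, learning_rate)
            cost = cross_entropy(cache["probs"], Y_batch)
        print(f"epoch {epoch}: cost {cost:.4f}")
    return parameters
```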

This NumPy implementation of SkipGram can help us understand what is hidden inside the main DeepWalk.

First, forward_propagation retrieves the embeddings for each word: node_to_embedding returns the (initially random) embedding representation for all the 3400 X inputs of the training dataset. Then, the linear_dense function is called, where the neural network weights are applied:
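A sketch of this forward pass, consistent with the shapes used so far (an assumption on my side, not a verbatim copy of the repo):

```python
import numpy as np

def node_to_embedding(X_batch, parameters):
    """Look up the embedding of each centre node: shape (representation_size, m)."""
    return parameters["embeddings"][X_batch].T

def linear_dense(node_vector, parameters):
    """Dense layer: one score per vertex, shape (vocab_size, m)."""
    return np.dot(parameters["weights"], node_vector)

def softmax(scores):
    """Column-wise softmax over the vocabulary."""
    exp = np.exp(scores - scores.max(axis=0, keepdims=True))
    return exp / exp.sum(axis=0, keepdims=True)

def forward_propagation(X_batch, parameters):
    node_vector = node_to_embedding(X_batch, parameters)
    scores = linear_dense(node_vector, parameters)
    probs = softmax(scores)
    return {"node_vector": node_vector, "scores": scores, "probs": probs}
```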

The core neural network simply takes the word embeddings as input and multiplies them by the network weights: np.dot(weights, node_vector). The final step is to compute the softmax layer on this product through the softmax function. Fig.7 shows how the input word embeddings change through the network computations.

Fig.7: Forward propagation. Let's take a second to see the evolution of our input data: randomly initialised word embeddings pass through a neural network, the dot product is computed between input embeddings and network weights, and the final layer is a softmax on the network's output.

The computed embeddings and weights need to be updated through backward propagation. The function backward_propagation computes the softmax layer gradients, the neural network weight gradients and the embedding gradients, since we want to train both the neural network and the word embedding representation.
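A sketch of these gradients, assuming the cross-entropy loss and the shapes above (softmax output of shape (vocab_size, m), one-hot targets of the same shape):

```python
import numpy as np

def backward_propagation(X_batch, Y_batch, parameters, cache):
    """Gradients of the cross entropy w.r.t. the dense weights and the embeddings."""
    m = X_batch.shape[0]
    # softmax + cross-entropy gradient w.r.t. the scores
    d_scores = (cache["probs"] - Y_batch) / m
    # gradient w.r.t. the dense-layer weights: (vocab_size, representation_size)
    d_weights = np.dot(d_scores, cache["node_vector"].T)
    # gradient w.r.t. the input embeddings: (representation_size, m)
    d_embeddings = np.dot(parameters["weights"].T, d_scores)
    return {"d_weights": d_weights, "d_embeddings": d_embeddings}
```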

Finally, the backward propagation output is used to update the input parameters, namely the word embeddings and the network weights:
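A sketch of this last step, again under the same shape assumptions (np.add.at accumulates the gradient on nodes that appear more than once in a batch):

```python
import numpy as np

def update_parameters(parameters, gradients, X_batch, learning_rate):
    """Gradient-descent step on the dense weights and on the embedding rows of the batch."""
    parameters["weights"] -= learning_rate * gradients["d_weights"]
    np.add.at(parameters["embeddings"], X_batch,
              -learning_rate * gradients["d_embeddings"].T)

def cross_entropy(probs, Y_batch):
    """Average cross entropy between the softmax output and the one-hot targets."""
    m = Y_batch.shape[1]
    return -np.sum(Y_batch * np.log(probs + 1e-9)) / m
```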

Well done! These are all the steps enclosed in the DeepWalk SkipGram part.

A large graph dataset: the Facebook Large Page-Page Network

To prove the power and ability of DeepWalk to capture latent social representations, in this example we are going to work with a massive dataset: the Facebook Large Page-Page Network, which can be downloaded freely here [1]. This dataset contains 22,470 Facebook pages, labelled based on their content as tvshow, government, company or politician, and 171,002 edges. The edges depict social interactions, for example a page liking the content of another one. The code for dealing with this graph can be found here: https://github.com/Steboss/learn_graph_ml/blob/master/Perozzi_DeepWalk/facebook.py

The first part of the code opens the input files provided in [1] and converts the edge information into a networkx graph:
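A sketch of this step, assuming the downloaded archive has been unpacked and the edge list lives in musae_facebook_edges.csv with columns id_1 and id_2 (file and column names refer to the dataset archive and are an assumption here, not taken from the repo):

```python
import networkx as nx
import pandas as pd

# one row per edge, i.e. per mutual "like" between two Facebook pages
edges = pd.read_csv("facebook_large/musae_facebook_edges.csv")
G = nx.from_pandas_edgelist(edges, source="id_1", target="id_2")

print(G.number_of_nodes(), G.number_of_edges())   # expected: 22470 171002
```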

Secondly, we are going to create 10 random walks of length 80 for each node of the input graph. The output is 224,700 random walks (22,470 nodes × 10 walks per node), which will be the input sequences for SkipGram:

Once the corpus has been created we are ready to spin up gensim's SkipGram model – how nice to have a ready-made model? – as fig.12 shows:
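A minimal sketch of that call with the parameters listed below, assuming walks holds the random-walk corpus built in the previous step (the keyword names follow gensim ≥ 4.0, where size and iter were renamed to vector_size and epochs; the walks are converted to strings, as gensim expects string tokens):

```python
from gensim.models import Word2Vec

# gensim expects sentences as lists of string tokens
walks_str = [[str(node) for node in walk] for walk in walks]

model = Word2Vec(
    sentences=walks_str,
    vector_size=100,   # representation_size ("size" in gensim < 4.0)
    window=5,          # window_size
    min_count=0,       # keep every vertex
    sg=1,              # SkipGram
    hs=1,              # hierarchical softmax
    epochs=1,          # "iter" in gensim < 4.0
    workers=4,         # number of CPUs
    seed=42,
)
model.wv.save_word2vec_format("facebook_embeddings.txt")
```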

Word2Vec receives as input:

  • the corpus of sentences, walks
  • the size of the latent dimension, representation_size=100
  • the window size to look for neighbours of a given word, window_size=5
  • the minimum count for a word to be considered during training; here 0 means Word2Vec takes all the words as input
  • the option to trigger the SkipGram model, sg=1
  • the option to trigger the hierarchical softmax, hs=1
  • the number of iterations we want to run (optional; in this example it is 1)
  • the number of CPUs (workers)
  • a random seed for reproducibility, seed=42

This will take a wee while to run, but eventually we'll have our final embeddings for the given input graph, which can easily be saved to a file and will look like this:

From here we can directly interrogate the model to look for pages similar to a given one. For example, let's take index 14, which is the page NASA's Marshall Space Flight Center, and query the model with model.wv.most_similar(). model.wv.most_similar(14) returns the indexes of the pages most similar to page 14, along with their similarity scores.
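Based on the sketch above (where node ids were stored as strings), the query could look like this:

```python
# ten pages most similar to page 14 (NASA's Marshall Space Flight Center)
for page_id, score in model.wv.most_similar("14", topn=10):
    print(page_id, round(score, 3))
```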

The query output will be:

Remarkably, the model returns pages that are highly similar to the given one, without any information about what the pages deal with. It's impressive how DeepWalk can exploit a simple word-model similarity to interrogate big input graphs!

That’s all for today! Stay tuned for a new post on Graphs and ML very soon!!!!


Please feel free to send me an email with questions or comments at [email protected], or reach out directly here on Medium.

BIBLIOGRAPHY

  1. Rozemberczki, Benedek, Carl Allen, and Rik Sarkar. "Multi-scale attributed node embedding." Journal of Complex Networks 9.2 (2021): cnab014.
