Hands-on Tutorials

Welcome back to the fourth part of our journey into the Graph Machine Learning world. In the previous post we saw how the DeepWalk algorithm works and its mathematical background. Today we’ll dive into the Python implementation of the algorithm, starting from a simple example (the Karate club) with a bare-bones DeepWalk version in numpy, and then we’ll jump to a more interesting scenario with the Facebook Large Page-Page Network dataset.
Here is the list of the previous episodes:
- Scarselli GNN theoretical background
- Scarselli GNN practical implementation
- DeepWalk theoretical background
Concatenating all DeepWalk mathematical steps
Generate a corpus of "sequences"
Now that we know how DeepWalk works mathematically, let’s walk step by step through what happens for the Karate club, to fix our ideas before jumping to bigger applications. The current application is based on a simple numpy implementation of an embedding neural network. Code and input files can be found here:
learn_graph_ml/Perozzi_DeepWalk at master · Steboss/learn_graph_ml
The input dataset is data/karate.adjlist, an adjacency list for the Karate club graph; graph.py is a utility to read the input file and karate.py is our main DeepWalk algorithm file. The calculations are far from optimal, but they are a great way to concretely understand what is going on.

In this example we have 34 nodes and 78 edges. The number of random walks is number_walks=5 for each node, thus a total of 170 random walks, each exploring walk_length=5 vertices from a starting one. On line 378 we create the initial corpus:
From here we’ll have a list of random walks:
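As a reference, here is a minimal sketch of how such a corpus can be generated. It is not the exact code from graph.py, and it assumes the graph is stored as a plain dictionary mapping each node to its list of neighbours:

```python
import random

def build_corpus(graph, number_walks=5, walk_length=5, seed=42):
    """Generate truncated random walks starting from every node of the graph.
    `graph` is assumed to be a dict mapping node -> list of neighbours."""
    rng = random.Random(seed)
    nodes = list(graph.keys())
    walks = []
    for _ in range(number_walks):
        rng.shuffle(nodes)  # start the walks from the nodes in a random order
        for node in nodes:
            walk = [node]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

# e.g. one walk could look like [22, 1, 13, 4, 3]
```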
The SkipGram Model
Once the corpus has been generated we can go on with the SkipGram model. In this example I have not implemented the Hierarchical Softmax, but just a plain softmax layer, as the problem is quite small. These are the steps that we need:
- Generation of training data: for each input sequence we need to generate the pairs (X, y) for the training stage. X is the current vertex, while y is a word within a window_size from X
- Parameters initialisation: randomly initialise a matrix of word embeddings and implement the forward step for the 2-hidden-layer neural network
- Neural network: a function to update the weights and return the training cost, to converge towards the right embeddings.
Generation of Training Data
The training data underline the SkipGram model goal: X is an input word and the model predicts the likelihood that a word at most window_size positions away is a neighbour of X. The final training dataset is a set of pairs (X, y). For example, given the input sequence 22, 1, 13, 4, 3 and window_size=5, the pairs (X, y) will be (for example): (22, 1), (22, 13), (22, 4), (22, 3), (1, 22), (1, 13), (1, 4), (1, 3), (13, 22), (13, 1), (13, 4) and so on. The function generate_training_data accomplishes this goal, looping through all the input random_walks and their elements, selecting left and right indexes idxs_left and idxs_right. The final vectors X and y each have size (1, 3400), where 3400 comes from the total number of random walks (170) times the number of pairs per random walk, which is 20.
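A hedged sketch of what generate_training_data does is shown below; the repo's implementation may differ in its details, but the window logic is the same idea:

```python
import numpy as np

def generate_training_data(random_walks, window_size=5):
    """Build the (X, y) pairs: for every vertex of every walk, the targets are
    the vertices that fall within window_size positions on its left and right."""
    X, y = [], []
    for walk in random_walks:
        for i, vertex in enumerate(walk):
            idxs_left = list(range(max(0, i - window_size), i))
            idxs_right = list(range(i + 1, min(len(walk), i + window_size + 1)))
            for j in idxs_left + idxs_right:
                X.append(vertex)
                y.append(walk[j])
    return np.array(X).reshape(1, -1), np.array(y).reshape(1, -1)

# with 170 walks of length 5 and window_size=5 this yields 170 * 20 = 3400 pairs
```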
Parameters Initialisation
The second step is to initialise the following parameters (a sketch of the initialisation is given after this list):
- representation_size: the latent feature dimension, namely how many latent "coordinates" SkipGram should return. In this example this is fixed to 2
- vocab_size: the vocabulary size, which is the total number of vertices in the graph (vocab_size=34+1=35)
- a one-hot encoding matrix for the target y. The hot-encoded matrix size is (35, 3400) and it has a 1 every time the target word appears
- epochs for the SkipGram model. Remember that SkipGram is a neural network, so we have to specify how many times we want to cycle through the entire dataset (in this case epochs=2)
- batch_size: how many chunks of data should be used at each training iteration of the neural network (here batch_size=4096, as I want to process the full dataset immediately)
- learning_rate, set to 0.001
- the embedding and weight matrices for the neural network, initialised through the function param_init. In this function we create a random matrix for the word embeddings, whose size is vocab_size by representation_size, namely for each node we’ll have representation_size coordinates. Secondly, the neural network weights are randomly initialised as np.random.randn(out_size, inp_size), where inp_size is the representation_size and out_size is the number of graph vertices.
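Below is a minimal, self-consistent sketch of this initialisation; the exact shapes and helper names in karate.py may differ. The embedding matrix has one row of representation_size coordinates per vertex, and the dense weights map an embedding back onto the vocabulary:

```python
import numpy as np

def param_init(vocab_size=35, representation_size=2, seed=42):
    """Randomly initialise the word-embedding matrix and the dense-layer weights."""
    rng = np.random.default_rng(seed)
    word_embeddings = rng.standard_normal((vocab_size, representation_size)) * 0.01
    # dense layer: out_size = vocab_size, inp_size = representation_size
    weights = rng.standard_normal((vocab_size, representation_size)) * 0.01
    return {"emb": word_embeddings, "w": weights}

def one_hot(y, vocab_size=35):
    """One-hot encode the (1, n_samples) target vector into a (vocab_size, n_samples) matrix."""
    encoded = np.zeros((vocab_size, y.shape[1]))
    encoded[y.flatten(), np.arange(y.shape[1])] = 1.0
    return encoded
```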
Neural Networks
At this point we can spin up the embedding neural network with the following steps (a skeleton of the training loop is sketched after this list):
- define a chunk of batch_size elements from the input data
- run the forward propagation part of the neural network (forward_propagation)
- compute the gradients through back propagation (backward_propagation)
- update the parameters, embedding matrix and weight matrix (update_parameters)
- compute the cost through the cross entropy (cross_entropy)
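A possible skeleton tying these steps together is shown below; the helper names come from the post, while their exact signatures here are assumptions, consistent with the sketches in the following paragraphs:

```python
import numpy as np

def train(X, y_one_hot, params, epochs=2, batch_size=4096, learning_rate=0.001):
    """Mini-batch training loop over the (X, y) pairs built earlier."""
    costs = []
    n_samples = X.shape[1]
    for epoch in range(epochs):
        for start in range(0, n_samples, batch_size):
            X_batch = X[:, start:start + batch_size]
            y_batch = y_one_hot[:, start:start + batch_size]
            cache = forward_propagation(X_batch, params)           # embeddings -> dense -> softmax
            grads = backward_propagation(y_batch, cache, params)   # gradients for weights and embeddings
            params = update_parameters(params, cache, grads, learning_rate)
            costs.append(cross_entropy(cache["softmax"], y_batch))
    return params, costs
```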
This numpy implementation of SkipGram can help us understand what’s hidden in the main DeepWalk. At first, forward_propagation retrieves the embeddings for each word: node_to_embedding returns the initial random embedding representation for all the 3400 X inputs of the training dataset. Then the linear_dense function is called, where the neural network weights are applied: the core neural network simply takes the word embeddings as input and multiplies them by the neural network weights via np.dot(weights, node_vector). The final step is to compute the softmax layer of the previous product through the softmax function; a sketch of this forward pass is given below. Fig.7 shows how the input word embeddings change through the network computations.

The computed embeddings and weights then need to be updated through backward propagation. The function backward_propagation computes the softmax layer gradients, the neural network weight gradients and the embedding gradients, since we want to train both the neural network and the word embedding representation.
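A sketch of the backward pass and parameter update under the same assumptions (a plain softmax instead of the hierarchical one):

```python
import numpy as np

def backward_propagation(y_one_hot, cache, params):
    """Gradients of the cross entropy w.r.t. the dense weights and the input embeddings."""
    m = y_one_hot.shape[1]
    dz = cache["softmax"] - y_one_hot                 # combined softmax / cross-entropy gradient
    dw = np.dot(dz, cache["emb_vecs"].T) / m          # dense-weight gradient
    d_emb = np.dot(params["w"].T, dz)                 # gradient w.r.t. the looked-up embeddings
    return {"dw": dw, "d_emb": d_emb}

def update_parameters(params, cache, grads, learning_rate):
    params["w"] -= learning_rate * grads["dw"]
    # scatter the embedding gradients back onto the rows used in this batch
    np.add.at(params["emb"], cache["X"].flatten(), -learning_rate * grads["d_emb"].T)
    return params

def cross_entropy(softmax_out, y_one_hot):
    m = y_one_hot.shape[1]
    return -np.sum(y_one_hot * np.log(softmax_out + 1e-9)) / m
```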
Well done! These are all the steps enclosed in the DeepWalk SkipGram part.
A large graph dataset classification: Facebook large page-page network
To prove the power of DeepWalk in extracting latent social representations, in this example we are going to work with a massive dataset, the Facebook Large Page-Page Network, which can be downloaded freely here [1]. This dataset contains 22470 Facebook pages, labelled based on their content as tvshow, government, company or politician, and 171002 edges. The edges depict social interactions, for example a page liking the content of another one. The code for dealing with this graph can be found here: https://github.com/Steboss/learn_graph_ml/blob/master/Perozzi_DeepWalk/facebook.py
The first part of the code opens the input files provided in [1] and converts the edge info to a networkx graph:
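A minimal sketch of this step; the file name musae_facebook_edges.csv and its id_1/id_2 columns come from the dataset release in [1], so adjust the path to your local copy:

```python
import pandas as pd
import networkx as nx

# edge list from the Facebook Large Page-Page Network release [1]
edges = pd.read_csv("musae_facebook_edges.csv")  # columns: id_1, id_2

graph = nx.from_pandas_edgelist(edges, source="id_1", target="id_2")
print(graph.number_of_nodes(), graph.number_of_edges())  # 22470 nodes, 171002 edges
```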
Secondly, we are going to create 80 random walks of length 10 from the input graph. The output is 224,700 (nodes * walk length) random walks, which will be the input sequences for SkipGram:
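A possible sketch of the walk generation over the networkx graph, with node ids cast to strings because gensim expects string tokens:

```python
import random

rng = random.Random(42)

def random_walk(graph, start, walk_length=10):
    """One uniform random walk over a networkx graph, returned as string tokens."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = list(graph.neighbors(walk[-1]))
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return [str(node) for node in walk]

number_walks, walk_length = 80, 10
walks = [random_walk(graph, node, walk_length)
         for _ in range(number_walks)
         for node in graph.nodes()]
```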
Once the corpus has been created we are ready to spin up the gensim SkipGram model (how nice to have a model already implemented?) as fig.12 shows; a sketch of the call is also given after the parameter list below. Word2Vec receives as input:
- the corpus of sentences, walks
- the size of the latent dimension, representation_size=100
- the window size to look for neighbours of a given word, window_size=5
- the min count to consider a word during training or not; in this case 0 means Word2Vec will take all the words as input
- the option to trigger the SkipGram model, sg=1
- the option to trigger the hierarchical softmax, hs=1
- the number of iterations we want to run (optional); in this example it is 1
- the number of CPUs
- a random seed for reproducibility, seed=42
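Putting it all together, a sketch of the gensim call might look like this (gensim 4.x argument names; older releases used size= and iter= instead of vector_size= and epochs=):

```python
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=walks,   # the corpus of random walks built above
    vector_size=100,   # representation_size, the latent dimension
    window=5,          # window_size to look for neighbours
    min_count=0,       # keep every node, even the rarely visited ones
    sg=1,              # use SkipGram rather than CBOW
    hs=1,              # use the hierarchical softmax
    epochs=1,          # number of iterations over the corpus
    workers=4,         # number of CPUs
    seed=42,           # for reproducibility
)

# save the node embeddings to disk in the standard word2vec text format
model.wv.save_word2vec_format("facebook_embeddings.txt")
```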
This will take a wee while to run, but eventually we’ll have our final embeddings for the given input graph that can be easily saved in a file and will look like this:
From here we can directly interrogate the model to look for pages similar to a given one. For example, let’s take index 14, which is the page Nasa's Marshall Space Flight Center, and query the model with model.wv.most_similar(): passing the page index returns the indexes of the pages most similar to page 14, along with their similarity scores. The query output is a list of (page id, similarity score) tuples:
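A hedged sketch of the query, assuming node ids were stored as strings when the walks were built (the actual neighbours and scores depend on the trained model):

```python
# pages most similar to page 14 (Nasa's Marshall Space Flight Center)
similar_pages = model.wv.most_similar("14", topn=10)
for page_id, score in similar_pages:
    print(page_id, score)
```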
Remarkably, the model returns pages that are highly similar to a given one, without any information about what the pages deal with. It’s impressive how DeepWalk can exploit a simple analogy with word models to interrogate big input graphs!
That’s all for today! Stay tuned for a new post on Graphs and ML very soon!!!!
Please, feel free to send me an email with questions or comments at: [email protected], or reach out directly here on Medium.
BIBLIOGRAPHY
- Rozemberczki, Benedek, Carl Allen, and Rik Sarkar. "Multi-scale attributed node embedding." Journal of Complex Networks 9.2 (2021): cnab014.