Implementing the Mixed Membership Stochastic Blockmodel

Saumya Shah
Towards Data Science
Jun 16, 2019 · 6 min read


MMSB: A probabilistic model useful for learning the structure of a graph. Code written in Julia can be found here (Jupyter notebook with the plots and outputs) or here (.jl file).

From social networks to protein interactions, graphs have become ubiquitous.

Hello!

My name is Saumya Shah. This is an article describing the implementation of the Mixed Membership Stochastic Blockmodel¹, a Bayesian model for learning the structure of a graph. For this purpose, I will use Turing.jl, a Probabilistic Programming Language (PPL) written in Julia. Defining models in Turing is a breeze. You write the code just as you would write the model on a piece of paper. So, without further ado, let’s begin!

Introduction

The MMSB models data consisting of pairwise measurements, represented as a graph. Consider a graph G with N nodes and binary edge weights (i.e. either 0 or 1): a 1 represents a connection between two nodes and a 0 represents no connection. An adjacency matrix encodes these connections between the nodes. The MMSB learns the underlying structure of this graph, which enables reconstructing the graph, identifying patterns and relations between nodes, and predicting links for unseen data. In the following example, we assume the graph is undirected (i.e. the adjacency matrix is symmetric).

Sure, it has a fancy name, but where do I use it?

  • Social Networks: From the relational data between users and the communities they belong to, we can cluster users into groups with common interests and make recommendations.
  • Protein Interactions: Different proteins interact with each other in varying ways. Using the MMSB, we can predict the interaction between two particular proteins and infer their functions.
  • Academic Citations: The citation network can help us identify the importance of a research paper based on a scientist’s field and their publications.

How does it work?

Let’s delve into some math :). First, we assume that our graph has N nodes. There are a total of K clusters. Each node has a membership probability vector denoted by π_n that specifies the probability of the node n belonging to each of the K clusters. For each pair of nodes n and m, we draw a cluster from each of the two probability vectors π_n and π_m. This means that the node n can belong to a particular cluster when considering the link between nodes n and m, while it can belong to another cluster when considering the relationship between nodes n and l. Thus, each node’s cluster membership is context dependent. Here is where the mixed membership part of the model comes in. In a nutshell, each node n can potentially have multiple cluster memberships, depending on which node m it interacts with.

So we have the clusters for each pair of nodes. What do we do with them? Here is where the stochastic blockmodel part comes in. The link between a pair of nodes depends on the clusters they belong to. For this purpose, we maintain a K×K matrix η that specifies the probability of a connection between any two clusters. Given the pair of clusters drawn for two nodes, we draw from a Bernoulli distribution with the corresponding entry of η to determine whether there is a link between them.

To sum it up, we consider all pairs of nodes to determine whether they are linked. For a given pair, we draw a cluster assignment for each node from its respective probability vector. Based on these cluster assignments, we find the probability of a link between this pair of clusters by looking up its value in matrix η. Then, we pass this probability to a Bernoulli distribution to get a 1 (connected) or 0 (not connected) value.

Mathematically, the generative story is specified as (here, α and η are hyperparameters and π_1, …, π_N, z_11, …, z_NN are the parameters that we have to infer):
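  • For each node n = 1, …, N: draw a membership probability vector π_n ~ Dirichlet(α).
  • For each pair of nodes (n, m): draw cluster assignments z_nm ~ Categorical(π_n) and z_mn ~ Categorical(π_m), then draw the edge A(n, m) ~ Bernoulli(η[z_nm, z_mn]).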

Where’s the code?

“Talk is cheap. Show me the code.” - Linus Torvalds

All right, since we have laid the groundwork for what MMSB is and how it works, let us look at the code. We start by importing the libraries:
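(A minimal sketch: Plots.jl with the PyPlot backend and the fixed seed are my assumptions, and the original notebook’s exact imports may differ.)

    using Turing            # probabilistic model definition and inference
    using Distributions     # Dirichlet and Bernoulli distributions
    using Plots
    pyplot()                # use matplotlib as the plotting backend
    using Random
    Random.seed!(24)        # fix the seed so repeated runs give the same graph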

Moving on, we choose the number of clusters K = 2 and set the hyperparameters:
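(N and the values of α and η below are illustrative choices, not necessarily the original notebook’s settings.)

    N = 20            # number of nodes in the graph
    K = 2             # number of clusters
    α = ones(K)       # symmetric Dirichlet prior on the membership vectors
    η = [0.9 0.1;     # η[k, l] = probability of a link between a node in
         0.1 0.9]     # cluster k and a node in cluster l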

Now, we generate the data that we use for our model. To do this, we take random cluster assignments for each of the nodes. We also visualise this graph in the form of a heatmap, using the matplotlib backend.
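Here is a sketch of this step under the hyperparameters above:

    # Assign every node to a random cluster, then sample each undirected
    # edge from the η entry for that pair of clusters.
    z_true = rand(1:K, N)
    A = zeros(Int, N, N)
    for n in 1:N, m in (n + 1):N
        A[n, m] = rand(Bernoulli(η[z_true[n], z_true[m]]))
        A[m, n] = A[n, m]        # undirected graph: keep A symmetric
    end
    heatmap(A, yflip = true, title = "Adjacency matrix of the generated graph")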

The following visualisations are obtained:

Note that the visualizations may be different for you since the clusters are randomly generated.

Let’s define the model. I have slightly tweaked the model to have “soft” cluster assignments. So, instead of drawing from the categorical distributions, we use the entire probability vectors to determine the probability of having a link between each pair of nodes. Thus, the parameters z_11, …, z_NN have been eliminated and we only need to infer π_1, …, π_N. This tweak reduces the number of parameters to be estimated and leads to faster convergence.

Remember the ease of defining models in Turing that I was talking about? Here is a live example! Notice how the model is defined similarly to how it is written.
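Here is a sketch of the tweaked model; the @model syntax below is the current Turing form (older versions differ slightly), and with soft assignments the link probability for a pair (n, m) is π_n' η π_m:

    @model function MMSB(A, η, α, ::Type{T} = Float64) where {T <: Real}
        N = size(A, 1)
        π = Vector{Vector{T}}(undef, N)
        # One membership probability vector per node.
        for n in 1:N
            π[n] ~ Dirichlet(α)
        end
        # "Soft" assignments: marginalise over the cluster indicators,
        # so the link probability for pair (n, m) is π[n]' * η * π[m].
        for n in 1:N, m in (n + 1):N
            A[n, m] ~ Bernoulli(π[n]' * η * π[m])
        end
    end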

Now that we have defined the model, we can draw samples from the posterior. The following block sketches this (the HMC step size and number of leapfrog steps are illustrative and may need tuning):
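    model = MMSB(A, η, α)
    chain = sample(model, HMC(0.01, 5), 30_000)   # 30,000 HMC iterations
    display(chain)        # chain description and parameter summary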

You should see an output describing the chain and a summary of the sampled parameters. To view the output that I obtained, have a look at the code here (in the Jupyter notebook). We can also plot these samples to visualise their distribution and check whether they have converged.
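Plotting the chain can be as simple as this (StatsPlots supplies the plotting recipes for chains):

    using StatsPlots      # plotting recipes for MCMC chains
    plot(chain)           # trace and density plots for each π component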

As can be seen from the plots above, all the parameters have converged after approximately 30,000 iterations. So, we can expect our model to give good predictions. Let us reconstruct the graph using the sampled parameters to see how well they have been estimated.
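Here is a sketch of the reconstruction; the parameter names inside the chain ("π[n][k]" below) are an assumption and vary across Turing versions:

    # Posterior mean of each membership vector (names are an assumption).
    π_est = [mean(chain[Symbol("π[$n][$k]")]) for n in 1:N, k in 1:K]

    # Threshold the soft link probabilities at 0.5 to rebuild the graph.
    A_pred = zeros(Int, N, N)
    for n in 1:N, m in (n + 1):N
        p = π_est[n, :]' * η * π_est[m, :]
        A_pred[n, m] = p > 0.5 ? 1 : 0
        A_pred[m, n] = A_pred[n, m]
    end
    heatmap(A_pred, yflip = true, title = "Reconstructed adjacency matrix")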

The heat map for the predicted graph is:

This is exactly the same as the data that we generated! The model has successfully reconstructed the original graph.

We can also numerically count the number of mismatches between the reconstructed and the original graph. A minimal version, reusing A and A_pred from the sketches above, looks like this:
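    # Count the entries where the reconstruction disagrees with the data.
    mismatches = sum(A_pred .!= A)
    println("The number of mismatches is $mismatches")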

As expected, this will print:

The number of mismatches is 0

This concludes my implementation of the Mixed Membership Stochastic Blockmodel in Julia. If you have any questions or doubts regarding this, feel free to contact me at sshah@iitk.ac.in or tag me on the Julia Slack with @Saumya Shah. I hope you enjoyed reading this post as much as I enjoyed writing it :).

The above-explained code can be found here (Jupyter notebook with the plots and outputs) or here (.jl file).

What’s Next?

I will now move on to implementing some very interesting topics: Time Series Models. I will write up a post on that once it’s ready. Meanwhile, some of the students working on Turing are developing a Variational Inference interface, which should be up and ready in a month or two. Once that is done, I will integrate it with the above MMSB implementation to allow inference for larger graphs.

Reference:

[1] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, Mixed Membership Stochastic Blockmodels, Journal of Machine Learning Research 9 (Sep 2008): 1981–2014.
