
Gibbs Sampling Explained


Building Intuition through Visualization

Introduction

From political science to cancer genomics, Markov Chain Monte Carlo (MCMC) has proved to be a valuable tool for statistical analysis in a variety of fields. At a high level, MCMC describes a collection of iterative algorithms that obtain samples from distributions that are difficult to sample directly. These Markov chain-based algorithms, combined with a stunning increase in computing power over the past 30 years, have allowed researchers to sample from extremely complicated theoretical distributions. Implemented in software like BUGS (Bayesian inference Using Gibbs Sampling) and JAGS (Just Another Gibbs Sampler), Gibbs sampling is one of the most popular MCMC algorithms, with applications in Bayesian statistics, computational linguistics, and more.

In this article, we will unpack how Gibbs sampling works through a series of visualizations, tracing through a simple example with a bivariate normal target distribution.

Setting up the Problem

Say that there is an m-component joint distribution of interest that is difficult to sample from. Even though I do not know how to sample from the joint distribution, assume that I do know how to sample from the full conditional distributions. That is, I can easily sample from each of the m components conditional on the other m-1. Under these conditions, Gibbs sampling iteratively updates each of the components based on the full conditionals to obtain samples from the joint distribution.

To see exactly how this works, let’s consider a simple example. Assume that I only have access to base R, which can easily sample univariate normals but cannot sample from multivariate normals. Say that the target distribution is bivariate normal, where the marginals are standard normal and the correlation is ρ.

Using properties of the multivariate normal, we have the following full conditional distributions.

Full Conditional Distributions:

X | Y = y ~ Normal(mean = ρy, variance = 1 − ρ²)
Y | X = x ~ Normal(mean = ρx, variance = 1 − ρ²)

Since these conditionals are just univariate normals, I can sample from them with ease. Then, a Gibbs sampling algorithm for the joint distribution would be as follows:

Gibbs Sampling Algorithm:

1. Initialize (x₀, y₀) and set the iteration counter t to 0.
2. Draw xₜ₊₁ from the conditional distribution of X given Y = yₜ.
3. Draw yₜ₊₁ from the conditional distribution of Y given X = xₜ₊₁.
4. Increment the iteration counter: t ← t + 1.
5. Return to step 2.
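To make the five steps concrete, here is a minimal sketch of this sampler in base R (the function name gibbs_bvn, its arguments, and its defaults are my own choices, not from the source):

```r
# Gibbs sampler for a bivariate normal with standard normal
# marginals and correlation rho, using only univariate draws.
gibbs_bvn <- function(n_iter, rho, x0 = 0, y0 = 0) {
  x <- numeric(n_iter + 1)
  y <- numeric(n_iter + 1)
  x[1] <- x0                  # step 1: initialization
  y[1] <- y0
  sd_cond <- sqrt(1 - rho^2)  # conditional standard deviation

  for (t in 1:n_iter) {
    # step 2: draw X given the current value of Y
    x[t + 1] <- rnorm(1, mean = rho * y[t], sd = sd_cond)
    # step 3: draw Y given the X value just drawn in step 2
    y[t + 1] <- rnorm(1, mean = rho * x[t + 1], sd = sd_cond)
    # steps 4 and 5: the for loop increments t and returns to step 2
  }
  cbind(x = x, y = y)
}

samples <- gibbs_bvn(n_iter = 10000, rho = 0.9)  # rows are Gibbs samples
```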

This algorithm looks a little bit intimidating at first, so let’s break this down with some visualizations.

Walking Through One Iteration of the Algorithm

Let’s go step by step through the first iteration of our Gibbs sampler with ρ equal to 0.9.

Step 1: Initialization

Not much to do here: initialize (x₀, y₀) to (0, 0) and set the iteration counter t to 0. Here’s a very uninteresting plot of our initial point.

Step 1: Initialization
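In R, initialization amounts to a couple of assignments; a minimal sketch (the axis limits in the plot are my own choice):

```r
rho <- 0.9
x <- 0; y <- 0   # initialize (x0, y0) to the origin; t starts at 0
plot(x, y, xlim = c(-3, 3), ylim = c(-3, 3), pch = 19,
     xlab = "X", ylab = "Y")
```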

Step 2: Conditional Update of X given Y

Now, we draw from the conditional distribution of X given Y equal to 0.

X | Y = 0 ~ Normal(mean = 0.9 × 0, variance = 1 − 0.9²) = Normal(0, 0.19)
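In base R, this update is a single rnorm call; a quick sketch (note that rnorm expects a standard deviation, so we pass sqrt(1 − ρ²)):

```r
rho <- 0.9; y <- 0
# step 2: one draw from X | Y = 0, i.e. Normal(0, 0.19)
x_new <- rnorm(1, mean = rho * y, sd = sqrt(1 - rho^2))
```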

In my simulation, the result of this draw was -0.4. Here’s a plot with our first conditional update. Notice that the Y coordinate of our new point hasn’t changed.

Step 2: Conditional Update for X given Y

Step 3: Conditional Update of Y given X

Now, we draw from the conditional distribution of Y given X equal to -0.4. Note that we use the X value that we just drew in step 2 when conducting the draw for step 3.

Y | X = −0.4 ~ Normal(mean = 0.9 × (−0.4), variance = 1 − 0.9²) = Normal(−0.36, 0.19)
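And the matching sketch for this step, plugging in the X value from step 2:

```r
rho <- 0.9; x_new <- -0.4   # the X value drawn in step 2
# step 3: one draw from Y | X = -0.4, i.e. Normal(-0.36, 0.19)
y_new <- rnorm(1, mean = rho * x_new, sd = sqrt(1 - rho^2))
```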

In our case, the result of this draw was -0.32. This time, the X coordinate of our new point is the same as the X coordinate of the point from step 2.

Step 3: Conditional Update for Y given X

Steps 4 and 5

Increment the iteration counter t and return to step 2. The final point for this iteration of the Gibbs sampler is (-0.4, -0.32). We now have two Gibbs samples: one from our initialization and one from our first iteration.

Steps 4 and 5: Increment and Return

More Iterations

If we keep running our algorithm (i.e., repeating steps 2 through 5), we’ll keep generating samples. Let’s run iterations 2 and 3 and plot the results to make sure that we’ve got the pattern down.

Iterations 2 and 3

Dropping the intermediate plots of the conditional updates, we can plot the first 100 and 10,000 iterations of our Gibbs sampler.

Gibbs Sampling Illustration, First 100 iterations
Gibbs Sampling Illustration, First 10,000 iterations

The blue lines on the above plots are the contour lines of the target bivariate normal distribution. Our samples seem to be drawn from the target distribution as desired! Our Gibbs sampler really does produce draws from the joint distribution.
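Here is a sketch of how such a plot can be reproduced with the gibbs_bvn function from earlier; the grid resolution and plotting choices are my own:

```r
set.seed(1)  # any seed; your draws will differ from the article's
samples <- gibbs_bvn(n_iter = 10000, rho = 0.9)
plot(samples, pch = 16, cex = 0.3, col = "grey50", xlab = "X", ylab = "Y")

# contour lines of the theoretical density: standard normal
# marginals with correlation rho = 0.9
rho <- 0.9
grid <- seq(-4, 4, length.out = 200)
dens <- outer(grid, grid, function(x, y)
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2)))
contour(grid, grid, dens, col = "blue", add = TRUE)
```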

Note: Instead of using the true theoretical density in the animations, I used the empirical density of 100,000 MVN draws from the MASS package for convenience.
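For completeness, that empirical-density shortcut might look something like this sketch, using MASS::mvrnorm for the draws and MASS::kde2d for a two-dimensional kernel density estimate:

```r
library(MASS)
rho <- 0.9
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)        # target covariance matrix
draws <- mvrnorm(n = 100000, mu = c(0, 0), Sigma = Sigma)
dens_hat <- kde2d(draws[, 1], draws[, 2], n = 100)  # empirical density surface
contour(dens_hat, col = "blue")                     # contour lines as in the plots
```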

Wrap Up

This article illustrates how Gibbs sampling can be used to obtain draws from complicated joint distributions when we have access to the full conditionals, a scenario that comes up all the time in statistical analysis, from hierarchical Bayesian modeling to computational integration. By tracing through an example with a bivariate normal target distribution, we have built intuition for how Gibbs sampling really works.


Further Reading:

[1] Maklin, Cory. "Gibbs Sampling." Medium, 2 Oct. 2020, https://towardsdatascience.com/gibbs-sampling-8e4844560ae5.

[2] Introduction to Markov Chain Monte Carlo (MCMC) Sampling, Part 2: Gibbs Sampling. https://www.tweag.io/blog/2020-01-09-mcmc-intro2/. Accessed 23 May 2021.


Sources:

[1]: Dobrow, Robert P. (2016). Introduction to Stochastic Processes with R. Hoboken: John Wiley & Sons, Incorporated.


Acknowledgments: The idea for this article came when I was tutoring for the Harvard Statistics course "Stat 171: Introduction to Stochastic Processes." My tutee asked me to help them understand Example 5.6 in Dobrow’s book, so I created the R script that turned into this article. The algorithm and sampler that I created for this article are from Dobrow’s example.

