Building Intuition through Visualization
Introduction
From political science to cancer genomics, Markov Chain Monte Carlo (MCMC) has proved to be a valuable tool for statistical analysis in a variety of fields. At a high level, MCMC describes a collection of iterative algorithms that obtain samples from distributions that are difficult to sample directly. These Markov chain-based algorithms, combined with a stunning increase in computing power over the past 30 years, have allowed researchers to sample from extremely complicated theoretical distributions. Implemented in software like BUGS (Bayesian inference Using Gibbs Sampling) and JAGS (Just Another Gibbs Sampler), Gibbs sampling is one of the most popular MCMC algorithms, with applications in Bayesian statistics, computational linguistics, and more.
In this article, we will unpack how Gibbs sampling works through a series of visualizations, tracing through a simple example with a bivariate normal target distribution.
Setting up the Problem
Say that there is an m-component joint distribution of interest that is difficult to sample from. Even though I do not know how to sample from the joint distribution, assume that I do know how to sample from the full conditional distributions. That is, I can easily sample from each of the m components conditional on the other m-1. Under these conditions, Gibbs sampling iteratively updates each of the components based on the full conditionals to obtain samples from the joint distribution.
To see exactly how this works, let’s consider a simple example. Assume that I only have access to base R, which can easily sample from univariate normals but cannot sample from multivariate normals. Say that the target distribution is bivariate normal, where both marginals are standard normal and the correlation is ρ.
Using properties of the multivariate normal, we have the following full conditional distributions.
X | Y = y ~ N(ρy, 1 − ρ²)

Y | X = x ~ N(ρx, 1 − ρ²)
Since these conditionals are just univariate normals, I can sample from them with ease. Then, a Gibbs sampling algorithm for the joint distribution would be as follows:
1. Initialize (x₀, y₀) to some starting point and set the iteration counter t to 0.
2. Draw xₜ₊₁ from the conditional distribution of X given Y = yₜ, i.e., xₜ₊₁ ~ N(ρyₜ, 1 − ρ²).
3. Draw yₜ₊₁ from the conditional distribution of Y given X = xₜ₊₁, i.e., yₜ₊₁ ~ N(ρxₜ₊₁, 1 − ρ²).
4. Increment the iteration counter t.
5. Return to step 2.
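A minimal base R sketch of this algorithm might look like the following. Only `rnorm` is needed; the function name `gibbs_bvn` and its default arguments are my own choices for illustration, not the author's original script.

```r
# Gibbs sampler for a bivariate normal with standard normal marginals
# and correlation rho, using only base R's rnorm().
gibbs_bvn <- function(n_iter, rho = 0.9, x0 = 0, y0 = 0) {
  samples <- matrix(NA_real_, nrow = n_iter + 1, ncol = 2,
                    dimnames = list(NULL, c("x", "y")))
  samples[1, ] <- c(x0, y0)    # Step 1: initialize (x0, y0)
  sd_cond <- sqrt(1 - rho^2)   # conditional standard deviation

  for (t in 1:n_iter) {
    x <- rnorm(1, mean = rho * samples[t, "y"], sd = sd_cond)  # Step 2
    y <- rnorm(1, mean = rho * x, sd = sd_cond)                # Step 3
    samples[t + 1, ] <- c(x, y)   # Steps 4 and 5: store and repeat
  }
  samples
}
```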
This algorithm looks a little bit intimidating at first, so let’s break this down with some visualizations.
Walking Through One Iteration of the Algorithm
Let’s go step by step through the first iteration of our Gibbs sampler with ρ equal to 0.9.
Step 1: Initialization
Not much to do here: initialize (x₀, y₀) to (0, 0) and set the iteration counter t to 0. Here’s a very uninteresting plot of our initial point.

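In R, this step amounts to a few assignments (the variable names here are illustrative):

```r
rho <- 0.9  # correlation of the target bivariate normal
x <- 0      # initial x-coordinate
y <- 0      # initial y-coordinate
t <- 0      # iteration counter
```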
Step 2: Conditional Update of X given Y
Now, we draw from the conditional distribution of X given Y equal to 0.
X | Y = 0 ~ N(0.9 × 0, 1 − 0.9²) = N(0, 0.19)
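In base R, this is a single `rnorm` call; note that `rnorm` takes the standard deviation, not the variance:

```r
x <- rnorm(1, mean = rho * y, sd = sqrt(1 - rho^2))  # y is currently 0
```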
In my simulation, the result of this draw was -0.4. Here’s a plot with our first conditional update. Notice that the Y coordinate of our new point hasn’t changed.

Step 3: Conditional Update of Y given X
Now, we draw from the conditional distribution of Y given X equal to -0.4. Note that we use the X value that we just drew in step 2 when conducting the draw for step 3.
Y | X = −0.4 ~ N(0.9 × (−0.4), 1 − 0.9²) = N(−0.36, 0.19)
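The corresponding draw in R, conditioning on the x we just sampled:

```r
y <- rnorm(1, mean = rho * x, sd = sqrt(1 - rho^2))  # x is currently -0.4
```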
In our case, the result of this draw was -0.32. This time, the X coordinate of our new point is the same as the X coordinate of the point from step 2.

Step 4 and Step 5
Increment the iteration counter t and return to step 2. The final point for this iteration of the Gibbs sampler is (-0.4, -0.32). We now have two Gibbs samples: one from our initialization, and one from our first iteration.
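In code, this step is pure bookkeeping:

```r
t <- t + 1  # Step 4: increment the iteration counter
# Step 5: return to Step 2, now conditioning on the updated (x, y)
```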

More Iterations
If we keep running our algorithm (i.e., repeating steps 2 through 5), we’ll keep generating samples. Let’s run iterations 2 and 3 and plot the results to make sure that we’ve got the pattern down.
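Continuing the snippets from the walkthrough, iterations 2 and 3 are just the same two conditional draws inside a loop:

```r
for (i in 1:2) {
  x <- rnorm(1, mean = rho * y, sd = sqrt(1 - rho^2))  # Step 2
  y <- rnorm(1, mean = rho * x, sd = sqrt(1 - rho^2))  # Step 3
  t <- t + 1                                           # Steps 4 and 5
}
```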

Dropping the intermediate plots of the conditional updates, we can plot the first 100 and 10,000 iterations of our Gibbs sampler.
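With the `gibbs_bvn` sketch from earlier, generating and plotting these samples might look like the following (the plotting choices are mine, not necessarily the ones behind the figures):

```r
draws <- gibbs_bvn(n_iter = 10000, rho = 0.9)
plot(draws, pch = 16, cex = 0.4, col = "grey40",
     xlab = "x", ylab = "y",
     main = "10,000 Gibbs samples, rho = 0.9")
```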

The blue lines on the above plots are the contour lines of the target bivariate normal distribution. Our samples seem to be drawn from the target distribution as desired! Our Gibbs sampler really does produce draws from the joint distribution.
Note: Instead of using the true theoretical density in the animations, I used the empirical density of 100,000 MVN draws generated with the MASS package for convenience.
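For reference, draws like these can be generated with `MASS::mvrnorm`; the exact call below is an assumption based on the setup described above:

```r
library(MASS)
Sigma <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)  # unit variances, correlation 0.9
mvn_draws <- mvrnorm(n = 100000, mu = c(0, 0), Sigma = Sigma)
```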
Wrap Up
This article illustrates how Gibbs sampling can be used to obtain draws from complicated joint distributions when we have access to the full conditionals, a scenario that comes up all the time in statistical analysis, from hierarchical Bayesian modeling to computational integration. By tracing through an example with a bivariate normal target distribution, we built intuition for how Gibbs sampling really works.
Further Reading:
[1] Maklin, Cory. "Gibbs Sampling." Medium, 2 Oct. 2020, https://towardsdatascience.com/gibbs-sampling-8e4844560ae5.
[2] Introduction to Markov Chain Monte Carlo (MCMC) Sampling, Part 2: Gibbs Sampling. https://www.tweag.io/blog/2020-01-09-mcmc-intro2/. Accessed 23 May 2021.
Sources:
[1]: Dobrow, Robert P. (2016). Introduction to Stochastic Processes with R. Hoboken: John Wiley & Sons, Incorporated.
Acknowledgments: The idea for this article came when I was tutoring for the Harvard Statistics course "Stat 171: Introduction to Stochastic Processes." My tutee asked me to help them understand Example 5.6 in Dobrow’s book, so I created the R script that turned into this article. The algorithm and sampler that I created for this article are from Dobrow’s example.