Simple example of 2D density plots in python

How to visualize joint distributions

Madalina Ciortan
Towards Data Science

--

This post will show you how to:

  • Use a Gaussian Kernel to estimate the PDF of 2 distributions
  • Use Matplotlib to represent the PDF with labelled contour lines around density plots
  • How to extract the contour lines
  • How to plot in 3D the above Gaussian kernel
  • How to use 2D histograms to plot the same PDF

Let’s start by generating an input dataset consisting of 3 blobs:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from sklearn.datasets.samples_generator import make_blobs
n_components = 3
X, truth = make_blobs(n_samples=300, centers=n_components,
cluster_std = [2, 1.5, 1],
random_state=42)
plt.scatter(X[:, 0], X[:, 1], s=50, c = truth)
plt.title(f"Example of a mixture of {n_components} distributions")
plt.xlabel("x")
plt.ylabel("y");

For fitting the gaussian kernel, we specify a meshgrid which will use 100 points interpolation on each axis (e.g. mgrid(xmin:xmax:100j)):

# Extract x and y
x = X[:, 0]
y = X[:, 1]
# Define the borders
deltaX = (max(x) - min(x))/10
deltaY = (max(y) - min(y))/10
xmin = min(x) - deltaX
xmax = max(x) + deltaX
ymin = min(y) - deltaY
ymax = max(y) + deltaY
print(xmin, xmax, ymin, ymax)# Create meshgrid
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]

We will fit a gaussian kernel using the scipy’s gaussian_kde method:

positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
f = np.reshape(kernel(positions).T, xx.shape)

Plotting the kernel with annotated contours

fig = plt.figure(figsize=(8,8))
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
cfset = ax.contourf(xx, yy, f, cmap='coolwarm')
ax.imshow(np.rot90(f), cmap='coolwarm', extent=[xmin, xmax, ymin, ymax])
cset = ax.contour(xx, yy, f, colors='k')
ax.clabel(cset, inline=1, fontsize=10)
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.title('2D Gaussian Kernel density estimation')

The matplotlib object doing the entire magic is called QuadContour set (cset in the code). We can programatically access the contour lines by iterating through allsegs object. The calculated labels are accessible from labelTexts.

plt.figure(figsize=(8,8))for j in range(len(cset.allsegs)):
for ii, seg in enumerate(cset.allsegs[j]):
plt.plot(seg[:,0], seg[:,1], '.-', label=f'Cluster{j}, level{ii}')
plt.legend()

3D KDE plots

We will use matplotlib’s axes3d from mplot3d. We can plot the density as a surface:

fig = plt.figure(figsize=(13, 7))
ax = plt.axes(projection='3d')
surf = ax.plot_surface(xx, yy, f, rstride=1, cstride=1, cmap='coolwarm', edgecolor='none')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('PDF')
ax.set_title('Surface plot of Gaussian 2D KDE')
fig.colorbar(surf, shrink=0.5, aspect=5) # add color bar indicating the PDF
ax.view_init(60, 35)

Or as a wireframe:

fig = plt.figure(figsize=(13, 7))
ax = plt.axes(projection='3d')
w = ax.plot_wireframe(xx, yy, f)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('PDF')
ax.set_title('Wireframe plot of Gaussian 2D KDE');

Representation using 2D histograms

Another way to present the same information is by using 2D histograms. Setting the parameter normed to False returns actual frequencies while a True returns the PDF.

h =plt.hist2d(x, y)
plt.colorbar(h[3])

The entire code is available on Github.

--

--