
Plotting Venn Diagrams in Python

Learn how to use Venn diagrams to show the relationships between two or more sets of data

Photo by Dustin Humes on Unsplash

In data visualization, most of the charts that we generate belong to one or more of the following types:

  • Bar chart
  • Pie chart
  • Line chart
  • Histogram
  • Time Series

However, one type of chart that is not commonly used is the Venn diagram. It is an underrated but very useful form of visualization that lets you examine the relationships between two or more sets of data. For example, the following Venn diagram shows the relationship between two sets of creatures: Set A (left circle; creatures with two legs) and Set B (right circle; creatures that can fly). The overlapping region contains the creatures that are both two-legged and able to fly:

Source: https://en.wikipedia.org/wiki/Venn_diagram#/media/File:Venn_diagram_example.png

In this article, I will show you how to plot a Venn diagram from a sample dataset. I will also show you how to customize the Venn diagram to modify its look-and-feel.

So let’s get started!

Installing the matplotlib-venn package

Use the pip command to install the matplotlib-venn package:

!pip install matplotlib-venn

The Dataset

For the dataset, I have created a fictitious dataset file named purchases.csv with the following content:

custid,product
1,Mac mini
17,Mac mini
1,Mac Studio
2,MacBook Pro 13
3,Mac Studio
18,Mac mini
2,MacBook Pro 13
5,Mac Studio
7,Mac Studio
6,MacBook Pro 13
4,MacBook Pro 13
8,Mac mini
9,Mac mini
5,Mac mini
6,Mac mini
19,Mac mini
8,Mac Studio
2,Mac mini
2,Mac Studio
20,MacBook Pro 13

This file contains the purchases of three Mac products (Mac mini, Mac Studio, and MacBook Pro 13) by the various customers identified by their custid.

The next step is to load the file into a pandas DataFrame object:

import pandas as pd

df = pd.read_csv('purchases.csv')
df
All images by author

Plotting Venn Diagrams

There are a couple of ways to plot a 2-circle Venn diagram. The easiest is to pass two sets of values to the venn2() function (in the matplotlib_venn package), which works out the size of each region and plots the Venn diagram for you.

Let’s use this approach to plot a 2-circle Venn diagram showing how many people bought a Mac mini, a Mac Studio, or both.

First, I will find all the customers who bought a Mac mini and extract the custid as a set:

mac_mini = set(df.query('product == "Mac mini"')['custid'])
mac_mini

The following set contains the custid values of the customers who purchased a Mac mini:

{1, 2, 5, 6, 8, 9, 17, 18, 19}

Likewise, I will extract the custid values of all the customers who bought a Mac Studio:

mac_studio = set(df.query('product == "Mac Studio"')['custid'])
mac_studio

These are the customers who bought a Mac Studio:

{1, 2, 3, 5, 7, 8}
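As an alternative to running one query per product, the sets can also be built in a single pass over the file. The following is a minimal sketch using only the standard library; the helper name customers_by_product() is hypothetical (it is not part of pandas or matplotlib-venn), and the CSV content is inlined so that the snippet runs on its own:

```python
import csv
import io
from collections import defaultdict

# Inline copy of purchases.csv so that this sketch is self-contained.
PURCHASES_CSV = """custid,product
1,Mac mini
17,Mac mini
1,Mac Studio
2,MacBook Pro 13
3,Mac Studio
18,Mac mini
2,MacBook Pro 13
5,Mac Studio
7,Mac Studio
6,MacBook Pro 13
4,MacBook Pro 13
8,Mac mini
9,Mac mini
5,Mac mini
6,Mac mini
19,Mac mini
8,Mac Studio
2,Mac mini
2,Mac Studio
20,MacBook Pro 13
"""

def customers_by_product(csv_text):
    """Hypothetical helper: map each product to the set of custid values."""
    buyers = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Using a set means repeat purchases by the same customer count once.
        buyers[row["product"]].add(int(row["custid"]))
    return dict(buyers)

sets_by_product = customers_by_product(PURCHASES_CSV)
print(sorted(sets_by_product["Mac mini"]))    # [1, 2, 5, 6, 8, 9, 17, 18, 19]
print(sorted(sets_by_product["Mac Studio"]))  # [1, 2, 3, 5, 7, 8]
```

Because the values are stored in sets, a customer who bought the same product twice (such as customer 2 with the MacBook Pro 13) is only counted once, which matches the behavior of wrapping the query result in set().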

We are now ready to plot the 2-circle Venn diagram:

from matplotlib_venn import venn2

venn2([mac_mini, mac_studio],
      ('Mac mini','Mac Studio'))

Note that you need to supply the labels to be displayed on the Venn diagram. If you don’t, the default labels are A and B, which can be confusing.

If you want a 3-circle Venn diagram, simply call the venn3() function:

from matplotlib_venn import venn3

macbookpro_13 = set(df.query('product == "MacBook Pro 13"')['custid'])
macbookpro_13 # {2, 4, 6, 20}

venn3([mac_mini, mac_studio, macbookpro_13],
      ('Mac mini','Mac Studio','MacBook Pro 13'))

I really like this approach as I don’t have to manually calculate how many people bought only a Mac mini, how many bought only a Mac Studio, how many bought both a Mac mini and a Mac Studio, and so on.

Alternative Approach 1

The second approach is to manually pass in the numeric values to the venn2() or venn3() functions. For the venn2() function, the format is: venn2(subsets = (Ab,aB,AB)), where:

  • Ab means the count of items contained within set A, but not in set B
  • aB means the count of items contained within set B, but not in set A
  • AB means the count of items contained both in set A and B

Let’s calculate the values for Ab, aB, and AB using our dataframe. First, find the people who bought a Mac mini but did not buy the Mac Studio:

# calculate Ab
mac_mini_exclude_mac_studio = mac_mini - mac_studio
display(mac_mini_exclude_mac_studio)
# {6, 9, 17, 18, 19}

Then, find the people who bought a Mac Studio but did not buy the Mac mini:

# calculate aB
mac_studio_exclude_mac_mini = mac_studio - mac_mini
display(mac_studio_exclude_mac_mini) 
# {3, 7}

Finally, find the people who bought both the Mac mini and the Mac Studio:

# calculate AB
mac_mini_and_mac_studio = mac_studio.intersection(mac_mini) 
display(mac_mini_and_mac_studio)
# {1, 2, 5, 8}

With the values for Ab, aB, and AB calculated, you now just need to count the items in each set and pass them to the venn2() function:

venn2(subsets = (
        len(mac_mini_exclude_mac_studio),   # Ab
        len(mac_studio_exclude_mac_mini),   # aB
        len(mac_mini_and_mac_studio)        # AB
      ),
      set_labels = ('Mac mini','Mac Studio')
     )

Not surprisingly, the result is the same as the diagram we had earlier.
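When you calculate the counts by hand like this, it is easy to mix up a region. One way to guard against that is a quick sanity check: every customer falls into exactly one of the three regions, so Ab + aB + AB must equal the size of the union of the two sets. Here is a small sketch of that check; two_set_regions() is a hypothetical helper name, and the two sets are copied from the outputs shown earlier:

```python
def two_set_regions(a, b):
    """Hypothetical helper: return the counts (Ab, aB, AB) for sets a and b."""
    only_a = a - b   # Ab: in A but not in B
    only_b = b - a   # aB: in B but not in A
    both = a & b     # AB: in both A and B
    return len(only_a), len(only_b), len(both)

# Customer sets copied from the outputs shown earlier in the article.
mac_mini = {1, 2, 5, 6, 8, 9, 17, 18, 19}
mac_studio = {1, 2, 3, 5, 7, 8}

regions = two_set_regions(mac_mini, mac_studio)
print(regions)  # (5, 2, 4)

# Sanity check: the three regions together cover the union exactly once.
assert sum(regions) == len(mac_mini | mac_studio)
```

The tuple is in the (Ab, aB, AB) order described above, so it has the same shape as the subsets argument used in the venn2() call.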

Alternative Approach 2

The next approach is the binary approach. Instead of passing in the values as a tuple, you pass in a dictionary. For a 2-circle Venn diagram, you pass in binary values in the following format:

  • Ab – "10"
  • aB – "01"
  • AB – "11"

And for a 3-circle Venn diagram, the binary values are as follows:

  • Abc – "100"
  • ABc – "110"
  • ABC – "111"
  • aBC – "011"
  • abC – "001"
  • AbC – "101"
  • aBc – "010"

The following code snippet plots the same 2-circle Venn diagram that you did previously:

venn2(subsets = {
        '10': len(mac_mini_exclude_mac_studio),  # Ab
        '01': len(mac_studio_exclude_mac_mini),  # aB
        '11': len(mac_mini_and_mac_studio)       # AB
        },
      set_labels = ('Mac mini','Mac Studio'),
     )
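The binary keys for three sets can be derived the same way. The sketch below is one possible way to count the seven regions: for each customer it builds a three-character key, one digit per set, and tallies the keys. The function name three_set_regions() is hypothetical, and the sets are copied from the outputs shown earlier:

```python
from collections import Counter

def three_set_regions(a, b, c):
    """Hypothetical helper: count items per region, keyed by binary string."""
    keys = ("100", "010", "110", "001", "101", "011", "111")
    # Each item gets a key such as '110' (in A and B, but not in C).
    counts = Counter(
        f"{int(x in a)}{int(x in b)}{int(x in c)}" for x in a | b | c
    )
    # A Counter returns 0 for keys it has not seen, so empty regions are kept.
    return {key: counts[key] for key in keys}

# Customer sets copied from the outputs shown earlier in the article.
mac_mini = {1, 2, 5, 6, 8, 9, 17, 18, 19}
mac_studio = {1, 2, 3, 5, 7, 8}
macbookpro_13 = {2, 4, 6, 20}

regions = three_set_regions(mac_mini, mac_studio, macbookpro_13)
print(regions)
# {'100': 4, '010': 2, '110': 3, '001': 2, '101': 1, '011': 0, '111': 1}
```

Assuming venn3() accepts a dictionary in the same way that venn2() does above, the result could then be passed in as venn3(subsets = regions, set_labels = ('Mac mini','Mac Studio','MacBook Pro 13')).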

Customizing the Venn diagram

Since the resultant Venn diagram is created using matplotlib, it is customizable just like any other chart created using matplotlib.

Setting the alpha

You can set the alpha (transparency) setting on the circles using the alpha parameter:

v2 = venn2(subsets = {
        '10': len(mac_mini_exclude_mac_studio), 
        '01': len(mac_studio_exclude_mac_mini),
        '11': len(mac_mini_and_mac_studio)
        },
      set_labels = ('Mac mini','Mac Studio'),
      alpha = 0.8,
     )

Here’s what the chart looks like with the alpha parameter set to 0.8. If you want a lighter shade, set it to a lower value like 0.1 or 0.2:

Setting the colors

You can specify the individual colors of the circles using the set_colors parameter:

v2 = venn2(subsets = {
        '10': len(mac_mini_exclude_mac_studio), 
        '01': len(mac_studio_exclude_mac_mini),
        '11': len(mac_mini_and_mac_studio)
        },
      set_labels = ('Mac mini','Mac Studio'),
      alpha = 0.8,
      set_colors=('lightblue', 'yellow')
     )

Setting the line styles

To draw outlines for the circles, use the venn2_circles() function (for a 2-circle Venn diagram) together with the venn2() function. The following code snippet shows how you can draw dashed (--) outlines with a line width of 5 each:

from matplotlib_venn import venn2_circles

c = venn2_circles(subsets = {
        '10': len(mac_mini_exclude_mac_studio), 
        '01': len(mac_studio_exclude_mac_mini),
        '11': len(mac_mini_and_mac_studio)
        },
      linestyle='--',   
      linewidth=5,
     )

You can refer to https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/linestyles.html for the list of line styles supported.

Here is the updated Venn diagram:

Setting the font size

There are two types of labels displayed on the Venn diagram:

  • Labels – the text outside the circle
  • Subset Labels – the text inside the circle

The following code snippet sets the font sizes for both types of labels:

for text in v2.set_labels:     # the text outside the circle
    text.set_fontsize(20)

for text in v2.subset_labels:  # the text inside the circle
    text.set_fontsize(15)

Customizing the line style

You can also set the style and line width of each outline programmatically after calling the venn2_circles() function, using the objects that it returns:

c[0].set_lw(3.0) # customize left outline
c[0].set_ls('-.')   

c[1].set_lw(2.0) # customize right circle 
c[1].set_ls('--')

Setting Plot Title

And since this is matplotlib, you can obviously set the title for the figure:

import matplotlib.pyplot as plt

plt.title('Customer distribution for Mac mini and Mac Studio')

Setting the subset labels

If you want to customize the appearance of individual labels, you can use the get_label_by_id() function and pass in the binary value of a region to reference its label. You can then set the label’s display text and color:

for text in v2.set_labels:     # the text outside the circle
    text.set_fontsize(20)

for text in v2.subset_labels:  # the text inside the circle
    text.set_fontsize(12)

# list each custid on its own line under the region's title
text = 'Mac mini\n'
for i in mac_mini_exclude_mac_studio:
    text += f'{i}\n'
v2.get_label_by_id('10').set_text(text)  # Mac mini

text = 'Mac Studio\n'
for i in mac_studio_exclude_mac_mini:
    text += f'{i}\n'
v2.get_label_by_id('01').set_text(text)  # Mac Studio

text = 'Mac mini &\n Mac Studio\n'
for i in mac_mini_and_mac_studio:        # Mac mini and Mac Studio
    text += f'{i}\n'
v2.get_label_by_id('11').set_text(text)
v2.get_label_by_id('11').set_color('red')
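Since a Python set has no guaranteed iteration order, the custid values in these labels may not appear in numeric order. If you prefer sorted labels, one option is to build the text with a small helper. This is just a sketch, and region_label() is a hypothetical name rather than something provided by matplotlib-venn:

```python
def region_label(title, custids):
    """Hypothetical helper: the title, then one custid per line, sorted."""
    lines = [title] + [str(i) for i in sorted(custids)]
    return "\n".join(lines)

# The 'Mac mini only' customers calculated earlier in the article.
mac_mini_only = {6, 9, 17, 18, 19}

label = region_label("Mac mini", mac_mini_only)
print(label)
# Mac mini
# 6
# 9
# 17
# 18
# 19
```

The returned string could then be passed to set_text(), for example v2.get_label_by_id('10').set_text(label). Unlike the loop above, this version does not add a trailing newline after the last custid.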


Summary

And there you have it! You learned how to plot simple 2-circle and 3-circle Venn diagrams using a sample dataframe, along with a host of customizations you can make to your diagrams. Plotting the Venn diagrams is easy; the more challenging part is wrangling your data so that you can pass it to the API for plotting. In any case, I hope you have fun with Venn diagrams!

