Sometimes, simple is better.

The primary goal of a data visualization is to make a point in a clear, compelling way. As data scientists, we’re always aiming to create the coolest, most innovative visualizations, sometimes to the point where the actual message is lost. Why overcomplicate things? Sometimes a simple message only requires a simple visualization to convey it. In this blog, I’ll share a brief tutorial on how you can easily create a Venn diagram with Matplotlib and any kind of data.
Project Context
In my recent project, Twitter Hate Speech Detection, I faced a major challenge with class imbalance in the data. The full dataset contained 24,802 tweets, of which only 6% were labeled as hate speech. In the end, this class imbalance had an impact on my final model’s results. You can check out the final project’s repository here for more details.
When I was creating the presentation for this project, I needed a way to visualize this problem. Additionally, I wanted to find the number of words that were exclusive to the Hate Speech label and didn’t overlap with plain offensive language. Then it hit me: a Venn diagram could display this concept perfectly. Here is the final product.

Before I dive into the tutorial, let’s go back to basics for a second. We all learned about this simple visualization in elementary school. A Venn diagram is made of two or more overlapping circles that show the logical relationships between sets. As we know, a set contains only the unique values from a dataset. Therefore, this Venn diagram shows the unique words from each label, and those that overlap.
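To make the set idea concrete, here is a minimal sketch using Python’s built-in set type. The word lists are invented for illustration and are not taken from the project’s data.

```python
# Toy word lists, invented for illustration only.
words_a = ["apple", "banana", "apple", "cherry"]
words_b = ["banana", "cherry", "date", "date"]

# Converting a list to a set drops the duplicates.
set_a = set(words_a)
set_b = set(words_b)

only_a = set_a - set_b   # words that appear only in the first list
only_b = set_b - set_a   # words that appear only in the second list
both = set_a & set_b     # words shared by both lists

print(sorted(only_a), sorted(only_b), sorted(both))
```

These three groups correspond to the left region, the right region, and the overlap of a two-circle Venn diagram.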
Tutorial
Creating Venn diagrams in Python is extremely simple. Not many people know this, but the popular data visualization package matplotlib has an extension that can create customizable Venn diagrams. You can check out the documentation here. The first step is to install the package on your local machine.
pip install matplotlib-venn
Since it’s built on matplotlib, you’ll need matplotlib installed, along with supporting packages such as numpy and scipy.
Once the package is installed, you will need to feed in your data as sets. I’m not going to dive into the specifics of my code, but you can check it all out in this notebook. First, I separated the tweets by label, and then used a map function to turn the tokenized words for each label into two separate lists. From there, I used list comprehensions to turn those into nested lists to query through.
Once your data is in a suitable format, you can import the package into the notebook itself.
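As a rough illustration of this preparation step, the sketch below flattens tokenized tweets into one set of unique words per label. It is not the notebook’s actual code: the sample tweets and the helper `unique_words` are hypothetical, and only the names `label_1` and `label_2` are borrowed from the plotting code in this post.

```python
from itertools import chain

# Hypothetical tokenized tweets grouped by label (not the project's real data).
hate_speech_tweets = [["some", "hateful", "words"], ["more", "hateful", "text"]]
offensive_tweets = [["some", "rude", "words"], ["more", "rude", "text"]]


def unique_words(tokenized_tweets):
    """Flatten a list of token lists and keep each word only once."""
    return set(chain.from_iterable(tokenized_tweets))


label_1 = unique_words(hate_speech_tweets)
label_2 = unique_words(offensive_tweets)

# Words that appear under the first label but never under the second.
print(sorted(label_1 - label_2))
```

Because the result is already a pair of sets, the `set(...)` calls in the plotting code become harmless no-ops.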
import matplotlib_venn as venn
from matplotlib_venn import venn2, venn2_circles, venn3, venn3_circles
import matplotlib.pyplot as plt
%matplotlib inline
From there, creating a Venn diagram takes a single line of code.
venn2([set(label_1), set(label_2)])
However, the beauty of this package is that it’s very customizable. You can add labels and a title by extending that original line like this.
venn2([set(label_1), set(label_2)], set_labels = ('Hate Speech', 'Not Hate Speech'))
plt.title('Comparison of Unique Words in Each Corpus Label')
The final step is to save the figure and use it in a presentation!
plt.savefig('venn_diagram.png', bbox_inches = "tight", pad_inches=.5)
Aside from the title and labels, you can also add more circles, change the color of the circles, add outlines, change the sizes and much more. There are other blogs out there that detail how to do that, and I’ll link those below.
- How to Create and Customize Venn Diagrams in Python
- How to Design Professional Venn Diagrams in Python
During the exploratory data analysis stage, we are constantly asking questions about the data and drawing out hidden insights. The next time you find yourself creating a complex visualization and struggling to communicate those insights, the solution could be to simplify it. We shouldn’t overlook basic visualizations such as Venn diagrams!