Which Python Library is the Best One for Data Visualization?

Matplotlib, Seaborn, Plotly Express, and Altair. Which of these is the best library for data visualization?

Image by Jason Coudriet. Source: Unsplash

Data visualization is a crucial step in any exploratory data analysis or report. A good chart is usually easy to read, and it can give us insight into the dataset at a glance. There are many great business intelligence tools, such as Tableau, Google Data Studio, and Power BI, that allow us to create graphs easily. A data analyst or data scientist, however, will often create visualizations in a Jupyter Notebook using Python. Luckily, there are dozens of Python libraries that create great graphs. The million-dollar question is: which one is the best?

Whether you are a student or a professional, you should know a few of the options out there. There are no perfect libraries, so you should know the pros and cons of each data visualization library. I will go over four of the most popular Python libraries for data visualization: Matplotlib, Seaborn, Plotly Express, and Altair. To do so, I will create a simple bar plot and analyze how easy it is to use each library. For this blog, I will use a city dataset. You can find the notebook here.

Categories

This blog will evaluate each library on how easy it is to set up a bar plot, how easy it is to customize the chart to make it minimally presentable, which additional features it offers, and the quality of its documentation.

Setting up the dataset

First, let’s import all the important libraries. It’s very likely that you already have Matplotlib and Seaborn installed on your computer. However, you might not have Plotly Express and Altair. You can easily install them using pip install plotly==4.14.3 and pip install altair vega_datasets.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
import plotly.express as px

# Jupyter magic: render Matplotlib figures inline in the notebook
%matplotlib inline

Now we will import the dataset. For demonstration purposes, let’s create a data frame with only the 15 most populated cities in the US. I will also fix the capitalization of the city names, which will make editing easier when we create the visualizations.

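df = pd.read_csv('worldcitiespop.csv')
# .copy() avoids pandas' SettingWithCopyWarning when editing the subset
us = df[df['Country'] == 'us'].copy()
us['City'] = us['City'].str.title()
# Keep the 15 most populated cities
cities = us[['City', 'Population']].nlargest(15, 'Population', keep='first')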

Now we should be ready to analyze each library. Are you ready?

Category: Difficulty to Set Up and Initial Result

Winner: Plotly Express. Losers: Matplotlib, Altair, and Seaborn.

In this category, all the libraries performed well. They are all easy to set up, and the results with basic editing are good enough for most of the analysis, but we need to have winners and losers, right?

Matplotlib is very easy to set up, and its code is easy to remember. However, the default chart doesn’t look good. It will probably do the job for data analysis, but the result is not great for a business meeting.
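
As a rough illustration of how little code a default Matplotlib bar plot needs, here is a minimal sketch. It assumes a small stand-in for the `cities` data frame; the city names and population values are placeholders, not taken from the dataset, and the helper name `matplotlib_bar` is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder stand-in for the `cities` frame; the values are illustrative
# and do not come from worldcitiespop.csv.
cities = pd.DataFrame({
    "City": ["City C", "City A", "City B"],
    "Population": [3_000_000, 2_000_000, 1_000_000],
})


def matplotlib_bar(data):
    """Draw a bar plot with Matplotlib's default styling."""
    fig, ax = plt.subplots()
    ax.bar(data["City"], data["Population"])
    return fig, ax


fig, ax = matplotlib_bar(cities)
```

With the defaults, the axes have no labels or title, which is part of why the result looks bare.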

Image by the author

Seaborn created a better chart. It automatically adds x and y-axis labels. The x ticks could look better, but for a basic chart, that’s way better than Matplotlib.
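
A comparable Seaborn call might look like the sketch below. It again assumes a placeholder `cities` frame with made-up values; the point is only that `sns.barplot` picks up the axis labels from the column names.

```python
import pandas as pd
import seaborn as sns

# Placeholder stand-in for the `cities` frame; the values are illustrative
# and do not come from worldcitiespop.csv.
cities = pd.DataFrame({
    "City": ["City C", "City A", "City B"],
    "Population": [3_000_000, 2_000_000, 1_000_000],
})

# Seaborn labels the x and y axes with the column names automatically
ax = sns.barplot(data=cities, x="City", y="Population")
```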

Image by the author

Plotly Express performed fantastically. It was possible to create a nice-looking and professional bar plot with very little code. It was not necessary to set up the figure or font size. It even rotates the x-axis labels. All of this with one line of code. Very impressive!

Image by the author

The Altair chart performed well. It delivered a nice-looking graph, but it requires more code and, by default, sorts the bars in alphabetical order. That’s not terrible, and it can be helpful in many cases, but I feel that this is something the user should decide.

Image by the author

Category: Editing and Customization

Winners: Plotly Express, Seaborn, and Matplotlib. Loser: Altair.

I believe that all four libraries have the potential to be winners. Customizing charts works differently in each of them, but if you learn enough, you can create beautiful visualizations with any of them. Here, however, I’m considering how easy it is to edit a chart and to find information about editing it on the internet. I do have experience with these libraries, but I imagined myself as a new user.

Matplotlib and Seaborn are very easy to customize, and their documentation is amazing. Even if you don’t find the information you are looking for in their documentation, you will easily find it on Stack Overflow. They also have the advantage of working together. Seaborn is based on Matplotlib, so if you know how to edit one, you will know how to edit the other, which can be very handy. If you set the Seaborn theme using sns.set_style('darkgrid'), it will affect Matplotlib too. That’s probably why Matplotlib and Seaborn are the two most popular libraries for data visualization.
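
The interplay between the two can be sketched as follows. The data frame holds placeholder values, and the title, figure size, and rotation are arbitrary choices for illustration rather than the settings used for the charts in this post.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder stand-in for the `cities` frame; the values are illustrative
# and do not come from worldcitiespop.csv.
cities = pd.DataFrame({
    "City": ["City C", "City A", "City B"],
    "Population": [3_000_000, 2_000_000, 1_000_000],
})

# The Seaborn theme updates Matplotlib's global rcParams, so a plain
# Matplotlib chart drawn afterwards picks up the dark grid as well.
sns.set_style("darkgrid")

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(cities["City"], cities["Population"])
ax.set_title("Most populated cities (placeholder data)")
ax.set_xlabel("City")
ax.set_ylabel("Population")
ax.tick_params(axis="x", labelrotation=45)
```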

Image by the author
Image by the author

Plotly Express delivers beautiful charts from the start, requiring fewer edits than Matplotlib to produce a minimally acceptable visualization for a meeting, for example. Its documentation is straightforward to understand and full of examples, and it is also available in Jupyter through Shift + Tab, which is very handy. Out of all the libraries I tried, it also provides the most customization options. You can edit anything, including the font, label color, etc. And the best part is that it’s effortless.

Image by the author

I found Altair’s documentation very confusing. Unlike with the other libraries, I couldn’t get help for Altair through the Shift + Tab shortcut, which is problematic and confusing for beginners. I was able to make some edits, but finding information about them was stressful. Compared to the time I spent editing the Matplotlib and Plotly Express charts, I would say that Altair is not a great option for beginners.

Image by the author

Category: Additional Features

Winners: Plotly Express and Altair. Losers: Matplotlib and Seaborn.

For this category, I will consider additional features besides those that we can achieve through code. Matplotlib and Seaborn are very basic in this category. They don’t offer any extra editing or interaction option besides what you get with code. However, Plotly Express shines in this category. First, the charts are interactive. You just need to hover over the graph and you will see information about it.

Image by the author

Altair offers a few options to save the file or open a JSON file through Vega Editor. Meh!

Image by the author

Category: Documentation and Website

Winners: Plotly Express, Altair, Seaborn, Matplotlib

The documentation for all of these libraries is good. Plotly Express has a beautiful website with demonstrations that pair code with visualizations. It is very easy to read and to find information on. I love how sophisticated and well-designed the website looks. You can even interact with the charts.

Image by the author

Altair also did a good job with their website. Their documentation for customization is not the best, but the website looks good and it’s easy to find examples with the code. I wouldn’t say it’s phenomenal, but it does the job.

Image by the author

Seaborn’s website is OK. Some people say that they have the best documentation. I think it’s OK. It does the job. It contains examples with code. It can get tricky if you are looking for customization options, but other than that, it’s a clean website, and its documentation is quite complete.

Image by the author

Matplotlib has a complete website. In my opinion, it has way too much text, and finding some information can be a little tricky. However, the information is in there somewhere. They also offer their documentation as a PDF, which I plan to read at some point in the future.

Image by the author

Final Verdict

All four libraries I analyzed in this blog are great, with endless possibilities. I checked only a few factors, so don’t take this post as the final word. All the libraries have pros and cons, and I wrote this post from the perspective of a beginner. My favorite is Plotly Express because it did well in all the categories. However, Matplotlib and Seaborn are more popular, and most people will already have them installed on their computers. Altair is my least favorite of these libraries, but it deserves more of my attention in the future. Let me know what your favorite data visualization library is.

