Altair vs. Matplotlib

A comparison between Python visualisation libraries

Eugene Teoh
Towards Data Science


Photo by Clay Banks on Unsplash

If you have just started in Python, you might have heard of Matplotlib. Matplotlib has been the go-to visualisation library for anyone starting in Python, but is it the best?

In this article, I will introduce Altair and Matplotlib and discuss the differences between them, what each is good at, and who should use them.

First, let’s analyse what Matplotlib is good at.

I will be using Datapane to embed visualisations from each library so that the plots retain their characteristics.

Matplotlib

Matplotlib is an exhaustive visualisation library with a wide range of functionality. Its pyplot interface is modelled on MATLAB’s plotting API, so those who have used MATLAB will feel at home. It is most probably the first Python visualisation library Data Scientists will learn.

Matplotlib makes easy things easy and hard things possible.

Matplotlib Documentation

The image above describes the general concepts of a Matplotlib figure.

Image from Matplotlib Documentation.
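To make those concepts concrete, here is a minimal sketch that builds one Figure with one Axes and sets a few of the parts by hand. The data and the label text are my own placeholders, not taken from the Matplotlib documentation.

```python
import numpy as np
import matplotlib.pyplot as plt

# One Figure (the whole canvas) holding one Axes (the plotting area).
fig, ax = plt.subplots(figsize=(6, 4))

x = np.linspace(0, 2 * np.pi, 200)
ax.plot(x, np.sin(x), label="sin(x)")  # a line drawn on the Axes

ax.set_title("Parts of a figure")  # title of the Axes
ax.set_xlabel("x")                 # x-axis label
ax.set_ylabel("y")                 # y-axis label
ax.legend()                        # legend built from the label above
ax.grid(True)                      # grid lines
```

Call plt.show() afterwards if you are running this as a script rather than in a notebook.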

Pros

  • Customizable

Because of its low-level interface, Matplotlib can plot anything. If you just want a quick experiment, a few lines of code will plot any mathematical function you want. If you want to plot complicated visualisations, with a little tinkering, you will be able to do it! There is even support for 3D visualisation.

Let’s start with a simple plot.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1, 100)
y = 3 * x ** 2

fig = plt.figure()
plt.plot(x, y)  # pass x explicitly so the axis shows 1 to 100, not the sample index
plt.title(r"$y = 3x^2$")
plt.show()

It can even plot the text below:

# https://gist.github.com/khuyentran1401/d0e7397ecefcb8412a161817d1e23685#file-text-py
import matplotlib.pyplot as plt

fig = plt.figure()

# Rounded box, rotated anticlockwise.
plt.text(0.6, 0.7, "learning", size=40, rotation=20.,
         ha="center", va="center",
         bbox=dict(boxstyle="round",
                   ec=(1., 0.5, 0.5),
                   fc=(1., 0.8, 0.8),
                   )
         )

# Square box, rotated clockwise.
plt.text(0.55, 0.6, "machine", size=40, rotation=-25.,
         ha="right", va="top",
         bbox=dict(boxstyle="square",
                   ec=(1., 0.5, 0.5),
                   fc=(1., 0.8, 0.8),
                   )
         )
plt.show()

  • Animation

Matplotlib also offers a package for live animations. It allows you to plot live data such as a sinusoidal wave, or even the NASDAQ stock market index!

"""
==================
Animated line plot
==================
"""
# https://matplotlib.org/3.1.1/gallery/animation/simple_anim.html
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()

x = np.arange(0, 2*np.pi, 0.01)
line, = ax.plot(x, np.sin(x))


def animate(i):
    line.set_ydata(np.sin(x + i / 50))  # update the data.
    return line,


ani = animation.FuncAnimation(
    fig, animate, interval=20, blit=True, save_count=50)

# To save the animation, use e.g.
#
# ani.save("movie.mp4")
#
# or
#
# writer = animation.FFMpegWriter(
#     fps=15, metadata=dict(artist='Me'), bitrate=1800)
# ani.save("movie.mp4", writer=writer)

plt.show()

Cons

  • Not flexible

Because of its low-level interface, plotting simple data is easy. However, when the data gets very complex, more and more lines of code are needed for trivial issues such as formatting.
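As a rough illustration of that extra code, the sketch below plots one line per category from a small, made-up DataFrame (the column names month, product and sales are hypothetical). Grouping, labelling and tick formatting all have to be spelled out by hand.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: one numeric series per category.
df = pd.DataFrame({
    "month": [1, 2, 3] * 3,
    "product": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "sales": [10, 14, 21, 8, 9, 11, 2, 5, 9],
})

fig, ax = plt.subplots()

# ax.plot does not split the data by a column: loop, plot and label each group.
for product, group in df.groupby("product"):
    ax.plot(group["month"], group["sales"], marker="o", label=product)

# Formatting is also written out explicitly.
ax.set_xticks(sorted(df["month"].unique()))
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Sales per month by product")
ax.legend(title="Product")
```

With three categories this is manageable, but every extra dimension of the data tends to add another loop or another block of formatting calls.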

I will be using the GitHub dataset from this article to demonstrate:

The plot below shows how disorganised it looks when the data gets large and complex.

Next, let’s learn about Altair.

Altair

Altair takes a completely different approach from Matplotlib. It is a declarative statistical visualisation library, initially released in 2016, and is built on Vega and Vega-Lite. It takes its data as a pandas DataFrame. Its developers had three design approaches in mind:

  • Be constrained, simple and declarative, to allow focus on the data rather than trivial issues such as formatting.
  • Emit JSON output that follows the Vega and Vega-Lite specifications.
  • Render the specifications using existing visualisation libraries.

Pros

  • Intuitive and structured

Altair provides a very intuitive and structured approach to plotting. I will use the simple example from the Matplotlib section:

import numpy as np
import pandas as pd
import altair as alt

x = np.linspace(1, 100)
y = 3 * x ** 2
df_alt = pd.DataFrame({'x': x, 'y': y})

alt.Chart(df_alt).mark_line().encode(
    x='x',
    y='y'
)

You can see how we can do the same thing as with Matplotlib but with less code!

Basic characteristics of Altair:

  • Marks

Marks specify how the data is represented in the plot. For example, mark_line() expresses the data as a line plot, mark_point() makes it into a scatter plot, mark_circle() creates a scatter plot with filled circles.

  • Encodings

Encodings are specified with encode(), which maps data to visual channels such as x, y, color and shape. For example, if my DataFrame has multiple columns, I can map the x and y axes to different columns. If I would like to colour the marks by another column, I only need to add a color encoding.

  • Interactive

One of Altair’s most distinctive features is its interactive plots. Calling interactive() on a chart lets you pan and zoom, while selections let you highlight certain regions of the plot and much more. This functionality is particularly useful when you have large and complex data.

  • Flexible

Thanks to its declarative nature, Altair can plot complex datasets with only a few lines of code! This gives Data Scientists a better data visualisation experience, without worrying much about trivial plotting issues.

The example below shows both the interactivity and flexibility of Altair. The bar chart counts only the points inside the area that is being highlighted. Want to stack multiple plots vertically? Just use the & operator! You can already see how much less code is needed to build something complicated like this.

# https://altair-viz.github.io/gallery/selection_histogram.html
import altair as alt
from vega_datasets import data

source = data.cars()

# Interval selection ("brush") dragged over the scatter plot.
# Newer Altair releases (5+) spell this alt.selection_interval() and .add_params(brush).
brush = alt.selection(type='interval')

points = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).add_selection(
    brush
)

# Bar chart that only counts the cars inside the brushed region.
bars = alt.Chart(source).mark_bar().encode(
    y='Origin:N',
    color='Origin:N',
    x='count(Origin):Q'
).transform_filter(
    brush
)

plot = points & bars
plot

Cons

  • Not as customizable

Altair’s declarative, high-level approach makes heavily customised plots, such as visualisations of complex machine learning models, more difficult to build. The Altair documentation also advises against plotting datasets with more than 5,000 rows; by default, exceeding that limit raises a MaxRowsError.

  • No 3D Visualisation

Data Scientists often need to visualise data in three dimensions to interpret it better. Examples include the output of dimensionality reduction techniques such as Principal Component Analysis (PCA), or embeddings such as word2vec. In this case, I would default to Matplotlib or other visualisation libraries with better 3D visualisation support.
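For reference, a 3D scatter plot in Matplotlib takes only a few lines. This sketch uses random points as a stand-in for a real three-component embedding, so the data and the axis labels are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for a 3-component embedding (for example, the output of PCA).
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # create a 3D Axes

# Colour each point by its third coordinate to help read depth.
ax.scatter(points[:, 0], points[:, 1], points[:, 2],
           c=points[:, 2], cmap="viridis")

ax.set_xlabel("component 1")
ax.set_ylabel("component 2")
ax.set_zlabel("component 3")
```

Call plt.show() to open the figure. In an interactive window you can rotate the view with the mouse.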

Conclusion

That’s it! I hope you learned something new about both Matplotlib and Altair. Now, you should practice what you have learned with your projects. If you are keen on learning more about data visualisation, you should explore other libraries such as Seaborn, Plotly, Bokeh and Folium.

The article above written by Khuyen includes a great summary of 6 different visualisation libraries.

Finally, if you would like to have a chat with me, connect with me on LinkedIn!
