
Demystifying Matplotlib

There's a reason you're confused

Quick Success Data Science

Image by Cederic Vandenberghe on Unsplash

Do you struggle with Matplotlib? If you’re a beginner, it may be because you haven’t taken the time to learn a few of its idiosyncrasies. If you suspect that’s the case, then do yourself a favor and read on! This won’t hurt or take too much time.

Matplotlib

The open-source Matplotlib library dominates plotting in Python. It lets you generate quick and simple plots as well as elaborate, complex charts where you control every aspect of the display. Its popularity and maturity mean that you can always find helpful advice and useful code examples.

Like any powerful piece of software, Matplotlib can be, as one author put it, "syntactically tedious." The simplest plots are easy, but the difficulty ramps up quickly. And even though resources like the Matplotlib gallery provide helpful code samples, if you want something slightly different from what's provided, you might find yourself scratching your head.

In fact, many people use Matplotlib by copying and pasting other people’s code and then hacking at the edges until they get something they like. As a user once told me, "No matter how many times I use Matplotlib, it always feels like the first time!"

Fortunately, you can greatly alleviate this pain by taking the time to learn some key aspects of the package. So, in this article, we’ll focus on the nomenclature and plotting interfaces that can cause confusion. Armed with this knowledge, you may find Matplotlib a tool to embrace instead of one to avoid or use reluctantly.

What’s the Problem?

Based on my experience learning Matplotlib, here are three issues that cause confusion:

  1. The somewhat awkward nomenclature used for plots.
  2. The co-existence of two plotting interfaces, which I'll call the pyplot approach and the object-oriented style.
  3. Plot manipulation methods in the two interfaces that have similar but different names.

Let’s take a look at these in turn.

The Anatomy of a Plot

The first step in understanding Matplotlib is mastering the plot nomenclature. To that end, let’s dissect a plot and its components.

Plots in Matplotlib are held within a Figure object. This is a blank canvas that represents the top-level container for all plot elements. Besides providing the canvas on which the plot is drawn, the Figure object also controls things like the size of the plot, its aspect ratio, the spacing between multiple plots drawn on the same canvas, and the ability to output the plot as an image. The left-most square in the following figure represents a Figure object.

Anatomy of a plot (Source: Python Tools for Scientists [1])

The plots themselves – that is, the things that you and I think of as figures – are represented by the Axes class, shown in the center of the previous diagram. This class includes most of the figure elements, such as lines, polygons, markers (points), text, titles, and so on, as well as the methods that act on them. It also sets the coordinate system. A Figure object can contain multiple Axes objects, but each Axes object can belong to only one Figure.

The Axes object should not be confused with the Axis element that represents the numerical values on, say, the x- or y-axis of a chart (right-most display in the previous diagram). This includes the tick marks, labels, and limits. All these elements are contained within the Axes class.
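To see this containment in code, here is a minimal sketch. It assumes the non-interactive Agg backend so it runs without a display, and the plotted values are arbitrary. Each artist can report the container it lives in, and an Axes exposes its two Axis objects as ax.xaxis and ax.yaxis.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.axis import XAxis, YAxis
from matplotlib.lines import Line2D

fig, ax = plt.subplots()        # one Figure holding one Axes
(line,) = ax.plot([1, 4, 9])    # plot() returns a list of Line2D objects

# Top down: the Figure owns its Axes, and the Axes owns its Axis objects.
print(fig.axes)                 # list of the Axes in this Figure
print(type(ax.xaxis).__name__)  # XAxis: ticks, tick labels, and limits for x
print(list(ax.get_lines()))     # the curves drawn in this Axes

# Bottom up: each artist knows which container it belongs to.
print(line.axes is ax, ax.figure is fig)
```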

Each of the components in the previous diagram exists within the hierarchical structure shown below. The lowest layer includes elements such as each axis, the axis tick marks and labels, and the curve (Line2D). The highest level is the Figure object, which serves as a container for everything below it.

The hierarchy of plot components in the previous figure (Source: Python Tools for Scientists [1])

Because a Figure object can hold multiple Axes objects, you could have more than one Axes object point to the Figure object in the previous diagram. A common example of this is subplots, in which one Figure canvas holds two or more different plots:

Example of two subplots in one Figure object (designated by the red box) (by the author)

The pyplot and Object-oriented Approaches

There are two primary interfaces for plotting with Matplotlib. Using the first, referred to as the pyplot approach, you rely on Matplotlib's pyplot module to automatically create and manage Figure and Axes objects, which you then manipulate with pyplot methods. Designed mainly for dealing with single plots, the pyplot approach reduces the amount of code that you need to know and write. It's a MATLAB-like API that can be very convenient for quick, interactive work.

Here’s an example:

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(5, 10)
plt.plot(data)
Output of the pyplot approach (by author)

This whole plot required one line of code. The pyplot module made every decision for you, including the use of a line, the color and weight of the line, the range of values on each axis, and the text font and color. It also provided corresponding x values for each y value, with the assumption that the count starts at 0 and increases with a step of 1 unit.
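You can confirm the x values pyplot supplied by asking the returned line object for its data. This is a small sketch that assumes the headless Agg backend; get_xdata() and get_ydata() are standard Line2D methods.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")               # start from a clean pyplot state
data = np.arange(5, 10)        # y values: 5, 6, 7, 8, 9
(line,) = plt.plot(data)       # no x values supplied

x_used = line.get_xdata()      # x values pyplot filled in: one per y, counting from 0
y_used = line.get_ydata()      # the y values we passed
print(x_used, y_used)
```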

Using the second approach, called the object-oriented style, you explicitly create Figure and Axes objects and then call methods on the resulting objects. This gives you the most control over customizing your plots and keeping track of multiple plots in a large program. It’s also easier to understand interactions with other libraries if you first create an Axes object.

import matplotlib.pyplot as plt
import numpy as np

data = np.arange(5, 10)
fig, ax = plt.subplots()
ax.plot(data)
Output of the object-oriented style (by author)

The results are identical to those obtained with the pyplot approach.

As soon as you see the following line, you know you’re dealing with the object-oriented style:

fig, ax = plt.subplots()

The plt.subplots() method creates a Figure instance and a set of subplots (a NumPy array of Axes objects). If the number of subplots is not specified, it creates a single subplot and returns it as a plain Axes object rather than as an array.
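The sketch below compares what plt.subplots() hands back for different grid sizes. It assumes the headless Agg backend. The squeeze parameter, True by default, is what collapses a single subplot to a bare Axes; passing squeeze=False always gives you a 2-D array.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.axes import Axes

fig1, ax = plt.subplots()                   # default: a single Axes, not an array
fig2, axs = plt.subplots(1, 3)              # one row of three: a 1-D array of Axes
fig3, grid = plt.subplots(squeeze=False)    # opt out of squeezing: a 1x1 array

print(type(ax).__name__)    # Axes in recent versions (AxesSubplot in older ones)
print(type(axs).__name__, axs.shape)
print(grid.shape)
```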

Because two objects are returned, you need to unpack the results to two variables, called fig and ax by convention. Remember that, with the pyplot approach, these two entities are created behind the scenes.
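If you want to see the objects pyplot created behind the scenes, plt.gcf() ("get current figure") and plt.gca() ("get current Axes") return them. A minimal sketch, assuming the headless Agg backend; it is meant for inspecting what pyplot did, since the documentation advises against mixing the two styles in real code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

plt.close("all")             # start from a clean pyplot state
data = np.arange(5, 10)

plt.plot(data)               # pyplot creates a Figure and an Axes implicitly
fig = plt.gcf()              # the Figure pyplot made
ax = plt.gca()               # the Axes pyplot drew on

# These are ordinary objects, the same kinds that plt.subplots() returns.
ax.set_title("Same plot, reached through plt.gca()")
```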

In the sections that follow, we'll look at both approaches. However, according to the Matplotlib documentation, to maintain consistency you should choose one approach and stick to it. The documentation recommends the object-oriented style, particularly for complicated plots and for functions and scripts that are intended to be reused as part of a larger project.

It can certainly be argued that one of the reasons beginners find Matplotlib intimidating is that they see a mixture of these approaches in existing code, such as on question-and-answer sites like Stack Overflow. Because this is unavoidable, I suggest that you read over the descriptions for both approaches so that you can make an informed decision on which one to choose for yourself. This way, you’ll have an awareness of the alternate approach when you encounter it in legacy code or in tutorials.

Using the pyplot Approach

In the previous section, we made a plot with pyplot using one line of code:

plt.plot(data)

Two things are worth noting here. First, we didn't explicitly refer to Figure or Axes objects in the code, as pyplot took care of these behind the scenes. Second, we didn't specify what elements to show in the plot, including the ticks and values displayed along the x- and y-axes. Instead, Matplotlib looked at the data and made sensible choices about how to scale and annotate the plot.

Along these lines, the plot() method makes line charts, scatter() makes scatterplots, bar() makes bar charts, hist() makes histograms, pie() makes pie charts, and so on. You can find examples of all these in the Matplotlib plot types index.
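Each of these plotting methods returns the Artists it created, which is handy if you want to tweak them later. Here is a short sketch with made-up data, assuming the headless Agg backend; each chart goes on its own figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

values = [1, 1, 2, 3, 3, 3]   # arbitrary example data

plt.figure()
lines = plt.plot(values)                         # list of Line2D objects
plt.figure()
points = plt.scatter([0, 1, 2], [2, 4, 1])       # a collection of markers
plt.figure()
bars = plt.bar(["a", "b", "c"], [3, 5, 2])       # a container of Rectangle bars
plt.figure()
counts, edges, patches = plt.hist(values, bins=3)  # bin counts, bin edges, bar patches

print(counts)   # how many values fell into each of the three bins
print(edges)    # four edges bound three bins
```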

The automatic nature of pyplot's plot creation methods is useful when you want to quickly explore a dataset, but the resulting plots are generally too plain for presentations or reports. One issue is that, by default, methods like plt.plot() set the range of each axis to match the range of the input data, plus a small margin (for example, a y-axis running from about 5 to 8 rather than from 0 to 10 when the data values lie between 5 and 8).

It also assumes that you don’t want a legend, title, or axis label and that you want lines and markers drawn in blue. This isn’t always the case, so pyplot provides many methods to embellish charts with titles, axis labels, background grids, and so on. We’ll look at these next.
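Here is a minimal sketch of both sides of this: reading the default limits pyplot picked, then overriding them and adding the embellishments it leaves out. It assumes the headless Agg backend and Matplotlib's default style, where rcParams["axes.ymargin"] is 0.05; the data and labels are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")                        # start from a clean pyplot state
plt.plot([5, 6, 7, 8], label="data")    # y values confined to 5..8

# Default: the y-axis hugs the data, padded by a small margin
# (rcParams["axes.ymargin"], a fraction of the data range).
default_ylim = plt.ylim()               # with no arguments, ylim() only reports

# Override the defaults and add the missing embellishments.
plt.ylim(0, 10)
plt.title("Overriding pyplot defaults")
plt.xlabel("Index")
plt.ylabel("Value")
plt.grid()
plt.legend()

ax = plt.gca()
print(default_ylim, ax.get_ylim())
```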

Creating and Manipulating Plots with pyplot Methods

Despite being considered a simpler approach than the object-oriented style, pyplot can still produce some very elaborate plots. To demonstrate, let’s use some pyplot methods to create a more sophisticated plot than before.

A catenary is the shape that a chain assumes when it’s hung from both of its ends. It’s a common shape in nature and architecture, examples being a square sail under wind pressure and the famous Gateway Arch in St. Louis, Missouri. You can generate a catenary with the following code, where cosh(x) represents the hyperbolic cosine of the x values.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.1)
y = np.cosh(x)

plt.title('A Catenary')
plt.xlabel('Horizontal Distance')
plt.ylabel('Height')
plt.xlim(-8, 8)
plt.ylim(0, 60)
plt.grid()
plt.plot(x, y, lw=3, color='firebrick')
Output of the pyplot catenary program (by the author)

Despite being somewhat verbose, the code is quite logical and readable. All the plotting steps call methods on plt.
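For reference, the catenary is usually written in the general form below, where a is a scale parameter equal to the height of the curve's lowest point when that point sits at x = 0. The code above uses a = 1, so the curve reduces to y = cosh(x).

```latex
y = a\cosh\left(\frac{x}{a}\right) = \frac{a}{2}\left(e^{x/a} + e^{-x/a}\right)
```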

In Matplotlib, the elements rendered on a Figure canvas, such as a title, legend, or line, are called Artist objects. Standard graphical objects, like rectangles, circles, and text, are referred to as primitive Artists. The objects that hold the primitives, like the Figure, Axes, and Axis objects, are called container Artists.
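A quick way to see the primitive/container split is to check the objects pyplot hands back against the Artist base class. This is a sketch assuming the headless Agg backend, with arbitrary data; get_children() is the standard Artist method for listing what a container holds.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D
from matplotlib.text import Text

plt.close("all")                            # start from a clean pyplot state
(line,) = plt.plot([0, 1, 2], [0, 1, 4])    # primitive Artist: Line2D
title = plt.title("Artists everywhere")     # primitive Artist: Text
ax = plt.gca()                              # container Artist: Axes
fig = plt.gcf()                             # container Artist: Figure

for obj in (line, title, ax, fig, ax.xaxis):
    print(type(obj).__name__, isinstance(obj, Artist))

# Containers report the Artists they hold.
print(line in ax.get_children(), ax in fig.get_children())
```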

Some of the more common pyplot methods for making plots and working with Artists are listed in the tables that follow. To see a full list, visit the Matplotlib pyplot summary page. Clicking the method names in this online list will take you to detailed information on the method parameters, along with example applications. To read more about Artists in general, visit the Matplotlib artist’s page.

Some useful pyplot methods for creating plots (Source: Python Tools for Scientists [1])
Some useful pyplot methods for manipulating plots (Source: Python Tools for Scientists [1])

Note that the code examples in the tables represent simple cases. Most methods take many arguments, letting you fine-tune your plots with respect to properties like font style and size, line widths and colors, rotation angles, exploded views, and much more.

Working with Subplots

So far, we’ve been working with single figures, but there’ll be times when you’ll want to compare two plots side by side or bundle several charts into a summary display. For these occasions, Matplotlib provides the subplot() method. To see how this works, let’s begin by generating data for two different sine waves:

import numpy as np
import matplotlib.pyplot as plt

time = np.arange(-12.5, 12.5, 0.1)
amplitude = np.sin(time)
amplitude_halved = np.sin(time) / 2

One way to compare these waveforms is to plot them in the same Axes object, like so:

plt.plot(time, amplitude, c='k', label='sine1')
plt.plot(time, amplitude_halved, c='firebrick', ls='--', label='sine2')
plt.legend()
Output of the pyplot sine program (by the author)

By default, the two curves would be plotted with different colors (blue and orange). We overrode this with black (using the shorthand 'k') and 'firebrick' red. We also forced a different line style using the ls parameter; otherwise, both lines would have been solid. (For a list of characters available for marker and line styles, see the [plt.plot() documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html).)
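To check what the overrides actually did, you can query the default color cycle and the line objects themselves. A sketch assuming the headless Agg backend and Matplotlib's default style, whose first two cycle colors are the blue and orange mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

time = np.arange(-12.5, 12.5, 0.1)
amplitude = np.sin(time)
amplitude_halved = np.sin(time) / 2

# The colors unstyled lines would receive, in order.
default_colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
print(default_colors[:2])

plt.figure()
(line1,) = plt.plot(time, amplitude, c="k", label="sine1")
(line2,) = plt.plot(time, amplitude_halved, c="firebrick", ls="--", label="sine2")
plt.legend()

print(line1.get_color(), line2.get_color(), line2.get_linestyle())
```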

If you’re comparing more than a few curves, a single plot can become cluttered and difficult to read. In those cases, you’ll want to use separate stacked plots created by the subplot() method. The following diagram describes the syntax for this method, in which four subplots (Axes) are placed in a single Figure container.

Understanding the subplot() method (Source: Python Tools for Scientists [1])

The subplots will be arranged in a grid, and the first two arguments passed to the subplot() method specify the dimensions of this grid. The first argument represents the number of rows in the grid, the second is the number of columns, and the third argument is the index of the active subplot (highlighted in gray in the diagram).

The active subplot is the one you are currently plotting in when you call a method like plot() or scatter(). Unlike most things in Python, the first index is 1, not 0, and the indices run left to right across each row before moving down to the next row.

Matplotlib's pyplot module uses the concepts of a "current figure" and a "current Axes" to keep track of what is being worked on. For example, when you call plt.plot(), pyplot draws on the current Axes, creating a Figure and an Axes first if none exist. When working with multiple subplots, calling subplot() with an index makes that subplot the current Axes, so subsequent plotting calls land there.
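Here is a small sketch of the current-Axes bookkeeping and the index order, assuming the headless Agg backend. The variable names are just illustrative labels for the three subplots.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")
plt.figure()

top_left = plt.subplot(2, 2, 1)      # index 1: first row, first column
top_right = plt.subplot(2, 2, 2)     # index 2: first row, second column
bottom_left = plt.subplot(2, 2, 3)   # index 3: second row, first column

# The most recent subplot() call set the current Axes, so plot() lands there.
plt.plot([1, 2, 3])
current = plt.gca()

# Positions in figure coordinates show the row-by-row index order.
tl, tr, bl = (a.get_position() for a in (top_left, top_right, bottom_left))
print(current is bottom_left)
```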

For convenience, you can also pass the three subplot() arguments as a single three-digit integer. For example, plt.subplot(223) works the same as plt.subplot(2, 2, 3), although it's arguably less readable, and it only works when each value is less than 10.
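The sketch below checks that the two spellings land in the same slot, using two separate figures of the default size. It assumes the headless Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

plt.figure()
long_form = plt.subplot(2, 2, 3)   # rows, columns, index as three arguments

plt.figure()
short_form = plt.subplot(223)      # the same three digits packed into one integer

# Both Axes occupy the same position within their identically sized figures.
same_slot = np.allclose(long_form.get_position().bounds,
                        short_form.get_position().bounds)
print(same_slot)  # True
```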

Now, let's plot our sine waves as two separate stacked plots. The process is to call the subplot() method and change its index argument to select the current subplot. For each current subplot, the plot() method then draws the data specific to that subplot, as follows:

plt.subplot(2, 1, 1)
plt.plot(time, amplitude, label='sine1')
plt.legend(loc='upper right')

plt.subplot(2, 1, 2)
plt.ylim(-1, 1)
plt.plot(time, amplitude_halved, label='sine2')
plt.legend(loc='best')
Output of pyplot sine subplot program (by the author)

Note that if you don't set the y limits on the second plot, pyplot will automatically scale each subplot to fit its own data, so the two waves will look nearly identical. Because we manually set the scale on the second subplot with the ylim() method, it's clear that the second sine wave has half the amplitude of the first.
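If you'd rather not hard-code the limits, an alternative (a different technique from the ylim() call above) is to link the y-axes with the sharey argument, which subplot() forwards when it creates the new Axes. A sketch assuming the headless Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

time = np.arange(-12.5, 12.5, 0.1)
amplitude = np.sin(time)
amplitude_halved = np.sin(time) / 2

plt.figure()
ax1 = plt.subplot(2, 1, 1)
plt.plot(time, amplitude, label='sine1')
plt.legend(loc='upper right')

# sharey=ax1 links this subplot's y-axis to the first one, so both
# autoscale to one common range and the smaller amplitude is visible.
ax2 = plt.subplot(2, 1, 2, sharey=ax1)
plt.plot(time, amplitude_halved, label='sine2')
plt.legend(loc='best')
```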

That’s a speedy look at some of the syntax for the pyplot approach. Now let’s look at the object-oriented style.


Using the Object-Oriented Style

The object-oriented plotting style generally requires a bit more code than the previously described pyplot approach, but it lets you get the absolute most out of Matplotlib. By explicitly creating Figure and Axes objects, you’ll more easily control your plots, better understand interactions with other libraries, create plots with multiple x- and y-axes, and more.

Creating and Manipulating Plots with the Object-oriented Style

To become familiar with the object-oriented style, let’s re-create the catenary plot from earlier in the article. To demonstrate some of the style’s enhanced functionality, we’ll force the y-axis to pass through the center of the plot.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.1)
y = np.cosh(x)

fig, ax = plt.subplots()

The previous code creates a single empty figure. To configure the plot, call the Axes object's set() method and pass it keyword arguments for a title, axis labels, and axis limits. The set() method is a convenience method that lets you set multiple properties at once rather than calling a specific method for each.

ax.set(title='A Catenary',
       xlabel='Horizontal Distance',
       ylabel='Height',
       xlim=(-8, 8.1),
       ylim=(0, 60))
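For comparison, here is a sketch showing that the single set() call is equivalent to calling the individual setter methods one at a time. It assumes the headless Agg backend and reuses the article's title, labels, and limits.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# One call with keyword arguments...
fig1, ax1 = plt.subplots()
ax1.set(title='A Catenary', xlabel='Horizontal Distance',
        ylabel='Height', xlim=(-8, 8.1), ylim=(0, 60))

# ...does the same job as the individual set_* methods.
fig2, ax2 = plt.subplots()
ax2.set_title('A Catenary')
ax2.set_xlabel('Horizontal Distance')
ax2.set_ylabel('Height')
ax2.set_xlim(-8, 8.1)
ax2.set_ylim(0, 60)
```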

Next, we’ll move the y-axis to the center of the chart instead of along the side. In Matplotlib, spines are the lines connecting the axis tick marks and marking the boundaries of the area containing the plotted data.

By default, the spines form a box around the plot, with the ticks and labels along the left and bottom margins. But spines can also be placed at arbitrary positions. With the object-oriented style, we can accomplish this using the set_position() method of the Spine class.

The following code first moves the left (y) axis to the 0 value on the x-axis. Then, the line width is set to 2 so that the axis stands out a bit from the background grid that we’re going to use later.

ax.spines.left.set_position('zero')
ax.spines.left.set_linewidth(2)

The following line turns off the right boundary of the plot by setting its color to none:

ax.spines.right.set_color('none')

The next three lines repeat this overall process for the bottom axis and top axis, respectively:

ax.spines.bottom.set_position('zero')
ax.spines.bottom.set_linewidth(2)
ax.spines.top.set_color('none')
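Two variations on the spine code may be useful. According to the set_position() documentation, 'zero' is shorthand for the tuple ('data', 0), and the tuple form also accepts 'axes' (a fraction of the Axes box) or 'outward' (an offset in points). And set_visible(False) hides a spine outright instead of coloring it 'none'. A sketch, assuming the headless Agg backend; attribute access such as ax.spines.left requires a reasonably recent Matplotlib, while ax.spines['left'] is the long-standing dictionary-style equivalent.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Explicit form of 'zero': put each spine where the other axis has data value 0.
ax.spines.left.set_position(('data', 0))
ax.spines.bottom.set_position(('data', 0))

# Hide the unused spines instead of setting their color to 'none'.
ax.spines.right.set_visible(False)
ax.spines.top.set_visible(False)

# Other options (not applied here):
# ax.spines.left.set_position(('axes', 0.5))     # middle of the Axes box
# ax.spines.left.set_position(('outward', 10))   # 10 points outside the box
```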

To finish the plot, we add a background grid and call the plot() method, passing it the x and y data and setting the line width to 3 and the color to firebrick:

ax.grid()
ax.plot(x, y, lw=3, color='firebrick')
The line plot of a catenary built using the object-oriented style (by the author)

If you omit the code related to the spines, you can reproduce the pyplot version of this figure with essentially the same amount of code. Thus, the verbosity of the object-oriented style has much to do with the fact that you can do more with it, and people generally take advantage of this.

Most methods available in the pyplot approach have an equivalent in the object-oriented style. Unfortunately, the method names are often different. For example, title() in pyplot becomes set_title(), and xticks() becomes set_xticks(). This is one reason why it's good to pick one approach and stick with it.
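The naming pattern is easiest to see side by side. This sketch builds the same simple plot both ways, assuming the headless Agg backend and arbitrary data; the pyplot calls act on the current Axes, while the object-oriented calls use set_* methods on an explicit Axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")

# pyplot: functions act on the current Axes.
plt.plot([0, 1, 2])
plt.title('pyplot')
plt.xlabel('x')
plt.xticks([0, 1, 2])
plt.xlim(0, 2)
pyplot_ax = plt.gca()

# Object-oriented: the same settings through set_* methods.
fig, ax = plt.subplots()
ax.plot([0, 1, 2])
ax.set_title('object-oriented')
ax.set_xlabel('x')
ax.set_xticks([0, 1, 2])
ax.set_xlim(0, 2)
```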

Some of the more common methods for making object-oriented plots are listed in the table that follows. You can find additional methods, such as for making box plots, violin plots, and more, in the index of plot types and in the Matplotlib gallery.

Some useful object-oriented methods for creating plots (Source: Python Tools for Scientists [1])

Common methods for working with Figure and Axes objects are listed in the following tables. In many cases, these work like the pyplot methods, though the method names might be different.

Some useful object-oriented methods for manipulating plots (Source: Python Tools for Scientists [1])
Some useful methods for working with Axes objects (Source: Python Tools for Scientists [1])

As mentioned in the pyplot section, the code examples in all these tables represent simple cases. Most methods take many arguments, letting you fine-tune your plots with respect to properties like font style and size, line widths and colors, rotation angles, exploded views, and so on. To learn more, visit the Matplotlib docs.

Working with Subplots

Like the pyplot approach, the object-oriented style supports the use of subplots. Although there are multiple ways to assign subplots to Figure and Axes objects, the plt.subplots() method is convenient: it returns a NumPy array of Axes objects that you can select with standard indexing, such as axs[0, 0], or unpack into individual names, such as ax1. Another benefit is that you can preview the subplots' geometry prior to plotting any data.

The object-oriented method for creating subplots is spelled subplots, whereas the pyplot approach uses subplot. You can remember this by associating the simplest technique, pyplot, with the shortest name.

Calling plt.subplots() with no arguments generates a single empty plot. Technically, this produces one Axes object on a 1×1 grid (older Matplotlib versions report its type as AxesSubplot).

fig, ax = plt.subplots()
An empty subplot (by the author)

Producing multiple subplots works like the plt.subplot() method, only without an index argument for the active subplot. The first argument indicates the number of rows; the second specifies the number of columns. By convention, multiple Axes are given the plural name, axs, rather than axes so as to avoid confusion with a single instance of Axes.

Passing the plt.subplots() method two arguments lets you control the number of subplots and their geometry. The following code generates the 2×2 grid of subplots shown below and stores a 2×2 NumPy array of four Axes objects in the axs variable.

fig, axs = plt.subplots(2, 2)
axs
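A short sketch of working with that array, assuming the headless Agg backend: indexing is row first, then column, and axs.flat walks the grid row by row, which is handy for applying the same setting to every subplot. The titles are arbitrary labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)
print(axs.shape)                 # (2, 2): rows first, then columns

axs[0, 1].plot([1, 2, 3])        # row 0, column 1: the top-right subplot

# Number the subplots in reading order.
for i, ax in enumerate(axs.flat, start=1):
    ax.set_title(f'Axes {i}')
```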
A 2x2 grid of subplots (by the author)

To plot on a particular subplot, select it by its index in the array. In this example, we plot on the second subplot in the first row:

fig, axs = plt.subplots(2, 2)
axs[0, 1].plot([1, 2, 3])
A 2x2 grid of subplots with the second subplot active (by the author)

Alternatively, you can name and store the subplots individually by using tuple unpacking for multiple Axes. Each row of subplots will need to be in its own tuple. You can then select a subplot using a name, versus a less-readable index:

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax3.plot([1, 2, 3])
A 2x2 grid of subplots with the third subplot active (by the author)

In both the pyplot approach and the object-oriented style, you can adjust the padding between and around the subplots, so that tick labels and titles don't overlap, by calling the tight_layout() method on the Figure object:

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax3.plot([1, 2, 3])
fig.tight_layout()
The effect of the tight_layout() method on subplot spacing (by the author)

Now the subplots don’t look so cramped. For the pyplot approach you would use plt.tight_layout().

Alternative Ways to Make Subplots

No matter which technique you use, there are higher-level alternatives to help you split a figure into a grid of subareas. This, in turn, helps you create subplots that have different widths and heights. The resulting multipaneled displays are useful for summarizing information in presentations and reports.

Among these paneling tools are Matplotlib's [GridSpec](https://matplotlib.org/stable/api/_as_gen/matplotlib.gridspec.GridSpec.html) class (in the matplotlib.gridspec module) and the subplot_mosaic() method. Here's an example built with GridSpec:

A multipanel display built with GridSpec (Source: Python Tools for Scientists [1])

To read more about these tools, visit "Working with Multiple Figures and Axes" and "Arranging Multiple Axes in a Figure" in the Matplotlib docs, and my GridSpec tutorial article in Better Programming.
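As a starting point, here is a minimal sketch of the same simple layout (one wide panel on top, two panels below) built both ways. It assumes the headless Agg backend and a reasonably recent Matplotlib for subplot_mosaic(); the panel labels are arbitrary names chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# GridSpec: define a 2x2 grid, then let one Axes span the whole top row.
fig1 = plt.figure()
gs = GridSpec(2, 2, figure=fig1)
wide = fig1.add_subplot(gs[0, :])    # top row, both columns
left = fig1.add_subplot(gs[1, 0])    # bottom left
right = fig1.add_subplot(gs[1, 1])   # bottom right

# subplot_mosaic: describe the layout with labels; a repeated label
# merges cells into one larger Axes. Returns a dict keyed by label.
fig2, axd = plt.subplot_mosaic([['top', 'top'],
                                ['left', 'right']])
axd['top'].set_title('spans both columns')
```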


Summary

If you program in Python, you need to know Matplotlib. To know Matplotlib, you need to understand its primary plotting nomenclature and its two plotting interfaces.

The Figure object represents the canvas on which you plot. It controls things like the size of the plot, its aspect ratio, the padding between subplots, supertitles, and the ability to save the plot.

Figure objects can hold multiple Axes objects that form what we generally think of as figures or diagrams. These include lines, points, text, titles, the plot’s coordinate system, and so on. Multiple Axes objects in the same Figure object constitute subplots.

Within an Axes object, the Axis element represents numerical values on the x, y, or z axis, including tick marks, labels, and limits.

Matplotlib comes with two main approaches to making plots. The pyplot approach is designed for quick and easy plotting, such as for exploratory data analysis. With this approach, Figure and Axes objects are created behind the scenes and most decisions, such as for axis scaling, colors, line styles, etc., are made for you (though you can override these to a point).

For more involved plots, such as for reports and presentations, the object-oriented style explicitly creates Figure and Axes objects (by convention labeled as fig, ax). This provides you with more control and makes it easier to understand interactions with other Python libraries.

If you’re not aware of these two paradigms for plotting, it’s easy to get confused when using code snippets you find online, such as on Stack Overflow. Because the methods used with each approach are similar but different, the Matplotlib developers recommend that you pick one approach and use it consistently.

Citations

  1. "Python Tools for Scientists: An Introduction to Using Anaconda, JupyterLab, and Python’s Scientific Libraries" (No Starch Press, 2023) by Lee Vaughan.

Thanks!

Thanks for reading and follow me for more Quick Success Data Science articles in the future. And if you want to learn more about Matplotlib and Python’s other plotting libraries, check out my book, Python Tools for Scientists, available online and in fine bookstores like Barnes and Noble.


