
3 Features of Python Matplotlib That Make Data Visualizations More Appealing and Informative

How you show is just as important as what you show.

Photo by Sebastian Svenson on Unsplash

Data visualization is an essential part of Data Science. From exploratory data analysis to delivering results, it is imperative to make use of data visualizations.

It would be very tough to convince stakeholders just by showing them plain numbers. They are usually more interested in the big picture, which can be shown clearly with the help of data visualization.

As the go-to language in the data science ecosystem, Python has highly capable data visualization libraries, and Matplotlib is one of them.

In this article, we will go over 3 features of Matplotlib that allow for customizing your plots to make them more informative and appealing.

At the end of the day, how you show is just as important as what you show.

I created a sample sales and discount dataset that you can download from my GitHub repo of datasets. Feel free to use any dataset in this repo, but we will use the one called "sales_vs_discount.csv" for this article.

Let’s start by importing the libraries and reading the dataset. We will first create a Pandas DataFrame by reading the CSV file.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("sales_vs_discount.csv")
df.head()
(image by author)

When it comes to analyzing time-series data, we almost always use line plots. Here is a simple line plot that shows how the sales amount changes over time.

df["date"] = df["date"].astype("datetime64[ns]")
plt.figure(figsize=(12,6))
plt.plot(df["date"], df["sales_amount"])
(image by author)

Note: When plotting a quantity against time, it is better to store the time information with an appropriate data type. In our file, the data type of the date column was "object". In the first line of the snippet above, we changed it to "datetime".
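As a minimal sketch of the conversion described in the note, the snippet below shows two common ways to end up with a datetime column. The three rows of data are made up for illustration and only reuse the column names from the article's file; pd.to_datetime and the parse_dates parameter of read_csv are standard Pandas features, offered here as alternatives to the astype call rather than as the method used above.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV file (assumption): same column names, fake values.
csv_text = """date,sales_amount,discount
2021-01-01,1200,0.0
2021-01-02,1350,0.0
2021-01-03,4100,0.3
"""

# Option 1: read the dates as text, then convert the column explicitly.
df_a = pd.read_csv(io.StringIO(csv_text))
df_a["date"] = pd.to_datetime(df_a["date"])

# Option 2: ask read_csv to parse the column while reading.
df_b = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Both columns now hold datetime values instead of plain text.
print(df_a["date"].dtype, df_b["date"].dtype)
```

Either option gives Matplotlib real dates to work with, so the x-axis is spaced by time rather than by row position.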


We observe peaks in daily sales amounts and suspect that these peaks might be due to some discount applied to this product.

We can check this by plotting the sales amounts and the discount rates together. If the peaks in sales line up with the discount periods, that supports the idea that discounts increased the sales amounts.

We can plot them on the same figure as below:

plt.figure(figsize=(12,6))
plt.plot(df["date"], df["sales_amount"])
plt.plot(df["date"], df["discount"])
(image by author)

We put both the sales amount and the discount on the same plot, but this visualization does not tell us anything. The reason is that the value ranges are very different.

Discounts change between 0 and 1 whereas sales amounts are on the level of thousands.
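A quick way to see this difference in scale before plotting is to compare the minimum and maximum of each column. The sketch below uses a small made-up DataFrame with the article's column names (the values are assumptions, not the real data).

```python
import pandas as pd

# Made-up values (assumption) that mimic the two scales described in the text.
df_demo = pd.DataFrame(
    {
        "sales_amount": [1200, 1350, 4100, 1500],
        "discount": [0.0, 0.0, 0.3, 0.0],
    }
)

# One row per statistic, one column per variable.
ranges = df_demo[["sales_amount", "discount"]].agg(["min", "max"])
print(ranges)
```

When the maxima differ by several orders of magnitude, as they do here, the smaller series is flattened against the bottom of a shared y-axis.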

We can solve this issue by adding a secondary axis.


1. Adding secondary y-axis

We can create multiple axes and add the secondary y-axis using the twinx method of an Axes object.

fig, ax1 = plt.subplots(figsize=(12,6))
ax2 = ax1.twinx()
ax1.plot(df["date"], df["sales_amount"])
ax2.plot(df["date"], df["discount"], color="r")
ax1.set_ylabel("sales amount", fontsize=18)
ax1.set_ylim([0,6000])
ax2.set_ylabel("discount rate", fontsize=18)
ax2.set_ylim([0,1])

Let’s go over this code line by line to understand what each step does.

The first line creates a Figure object and an Axes object. The Figure object is like a container that holds everything together. We can put multiple Axes objects in (or on) a Figure.

The second line creates the second Axes object with a secondary y-axis.

The third and fourth lines create the line plots on the Axes objects.

The remaining part of the code creates labels for the y-axis and adjusts the value ranges.

(image by author)

It looks much better now. We can clearly see that the discounts have a positive impact on the sales amounts.
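With two y-axes, it helps to show which line belongs to which axis. The following is one possible sketch, not part of the original example: it colors each axis label and its tick labels to match the line, and builds a single legend from both lines. The data is synthetic (an assumed 30-day series with a three-day discount), and the colors are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): 30 days with a discount on three days.
dates = pd.date_range("2021-01-01", periods=30, freq="D")
discount = np.zeros(30)
discount[10:13] = 0.3
sales = 1500 + 8000 * discount  # hypothetical relationship, for illustration only

fig, ax1 = plt.subplots(figsize=(12, 6))
ax2 = ax1.twinx()  # second Axes sharing the same x-axis

# Keep the line handles so both can go into one legend.
(line1,) = ax1.plot(dates, sales, color="tab:blue", label="sales amount")
(line2,) = ax2.plot(dates, discount, color="tab:red", label="discount rate")

# Match each y-axis label and its tick labels to the color of its line.
ax1.set_ylabel("sales amount", fontsize=18, color="tab:blue")
ax2.set_ylabel("discount rate", fontsize=18, color="tab:red")
ax1.tick_params(axis="y", labelcolor="tab:blue")
ax2.tick_params(axis="y", labelcolor="tab:red")
ax2.set_ylim(0, 1)

# A single legend that covers the lines from both Axes objects.
ax1.legend(handles=[line1, line2], loc="upper left")
```

Collecting the handles manually is needed because each Axes only knows about its own lines, so calling legend on ax1 alone would list only the sales line.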


2. Xticks and yticks

This feature is for making the visualizations more appealing. Matplotlib allows for changing the xticks and yticks values as well as how they appear.

plt.figure(figsize=(12,6))
plt.plot(df["date"], df["sales_amount"])
plt.xticks(fontsize=14, rotation=45)
plt.yticks(ticks=np.arange(0,6000,500), fontsize=14)
plt.ylabel("Sales Amount", fontsize=18)

In the first line, we create a Figure object. The size of the figure is specified with the figsize parameter. The second line creates the line plot.

The third line changes the font size of the x-axis tick labels and rotates them by 45 degrees. In the fourth line, we also change the tick values on the y-axis. We now have a tick at every 500 mark, whereas before there were ticks only at the thousand marks.

The last line adds a label for the y-axis. Here is the plot created by the code snippet above.

(image by author)

Note: You may have noticed that we used the set_ylabel function in the previous example.

When we work through the pyplot interface (the plt functions), these functions do not start with "set": ylabel, xlabel, ylim, and so on.

When we make adjustments on Axes objects, they start with "set": set_ylabel, set_xlabel, set_ylim, and so on.
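To make the naming difference concrete, here is a small side-by-side sketch that builds the same plot twice, once with pyplot functions and once with Axes methods. The data and the tick spacing are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data (assumption), just to have something to draw.
x = np.arange(10)
y = x * 500

# pyplot interface: functions act on the current figure and axes.
plt.figure(figsize=(6, 3))
plt.plot(x, y)
plt.ylabel("Sales Amount")
plt.ylim(0, 6000)
plt.yticks(np.arange(0, 6001, 1000))
ax_pyplot = plt.gca()  # grab the current Axes so we can compare later

# Axes interface: the same settings through "set_" methods.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y)
ax.set_ylabel("Sales Amount")
ax.set_ylim(0, 6000)
ax.set_yticks(np.arange(0, 6001, 1000))
```

Both versions end up with the same label, limits, and ticks. The Axes form tends to be easier to follow once a figure has more than one Axes, because each call names the object it changes.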


3. Creating a subplot grid

In some cases, we create visualizations that contain multiple plots. Each one carries a different piece of information.

Recall the plot we created earlier that shows both the discount rates and sales amounts on the same graph.

A different version of that visualization could be having two line plots on top of each other. They share the same x-axis so we can still see the effect of discount on the sales amount.

The subplots function can be used for creating a grid of subplots. The nrows and ncols parameters set the number of rows and columns in the grid, which together determine how many subplots there are and how they are arranged.

For instance, "nrows=2" and "ncols=2" create a grid that looks like this:

(image by author)
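As a minimal sketch of such a grid, the snippet below creates the 2 by 2 layout and labels each subplot with its position. With more than one row and more than one column, subplots returns the Axes objects in a two-dimensional NumPy array, so each subplot is addressed by a row index and a column index. The titles are only there to show the positions.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# axs is a 2x2 array of Axes objects.
print(axs.shape)

# Label every subplot with its (row, column) position in the grid.
for row in range(2):
    for col in range(2):
        axs[row, col].set_title(f"row {row}, col {col}")

fig.tight_layout()
```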

We will have two plots and put them on top of each other. Thus, we need to set "nrows=2" and "ncols=1".

fig, (ax1, ax2) = plt.subplots(
   nrows=2, ncols=1, sharex=True, figsize=(12,6)
)
fig.tight_layout(pad=2)
ax1.plot(df["date"], df["sales_amount"])
ax2.plot(df["date"], df["discount"])
ax1.set_ylabel("Sales", fontsize=18)
ax2.set_ylabel("Discount", fontsize=18)

If the x-axis is common, we can set the sharex parameter to True so that the subplots share one x-axis and its tick labels are shown at the bottom only. Otherwise, there will be x-axis tick labels below each subplot.

The tight_layout function adjusts the padding around and between the subplots. It prevents them from overlapping.

Here is our grid of two subplots.

(image by author)

Note: In this example, we explicitly define the names of subplots by passing them into a tuple.

  • (ax1, ax2)

We can also define subplots by using subscripts.

  • axs, axs[0], axs[1], and so on.

The code below creates the same plot as the one above.

fig, axs = plt.subplots(
   nrows=2, ncols=1, sharex=True, figsize=(12,6)
)
fig.tight_layout(pad=2)
axs[0].plot(df["date"], df["sales_amount"])
axs[1].plot(df["date"], df["discount"])
axs[0].set_ylabel("Sales", fontsize=18)
axs[1].set_ylabel("Discount", fontsize=18)
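One advantage of the subscript form is that the subplots can be filled in a loop, which helps when there are more than two of them. The sketch below is one way to do that, under the assumption of a synthetic DataFrame that reuses the article's column names; the panels list is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption) with the same column names as the article.
dates = pd.date_range("2021-01-01", periods=30, freq="D")
discount = np.zeros(30)
discount[10:13] = 0.3
df_demo = pd.DataFrame(
    {"date": dates, "sales_amount": 1500 + 8000 * discount, "discount": discount}
)

# Hypothetical helper: (column to plot, y-axis label) for each subplot.
panels = [("sales_amount", "Sales"), ("discount", "Discount")]

fig, axs = plt.subplots(
    nrows=len(panels), ncols=1, sharex=True, figsize=(12, 6)
)

# Pair each Axes with its column and label, then draw.
for ax, (column, label) in zip(axs, panels):
    ax.plot(df_demo["date"], df_demo[column])
    ax.set_ylabel(label, fontsize=18)

fig.tight_layout(pad=2)
```

Adding a third subplot then only requires one more entry in the panels list.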

Conclusion

We have seen 3 different features of Matplotlib that allow for making the visualizations more appealing and informative.

It is one thing to create and use data visualizations in your tasks. What is more important is to make them concise and easy to understand. The examples in this article will help you create such visualizations.


Last but not least, if you are not a Medium member yet and plan to become one, I kindly ask you to do so using the following link. I will receive a portion of your membership fee at no additional cost to you.

Join Medium with my referral link – Soner Yıldırım


Thank you for reading. Please let me know if you have any feedback.

