Creating Scientific Plots the Easy Way With scienceplots and matplotlib

Instantly Transform Your Matplotlib Figures With a Few Lines of Python Code

Photo by Braňo on Unsplash

When writing articles for publication in academic journals, the layout and style of the figures are expected to conform to a predefined format. This ensures consistency across all of that publication’s articles and that any included figures are high quality when printed.

Python is widely used within the scientific community and provides a great way to create scientific plots. However, when we use Matplotlib, one of the most popular plotting libraries within Python, the default styling rarely matches what journals ask for and needs adjusting to meet their requirements.

Changing the styles of matplotlib figures can be time-consuming, which is where the scienceplots library comes in handy. With just a few lines of code, we can instantly transform the way our figure looks without spending too much time working out how to change different parts of our figures.

The scienceplots library allows users to create simple, informative plots similar to those found in academic journals and research papers. Not only that, it also sets the DPI to 600 (for some styles), which is often required by publications to ensure high-quality printed figures.

The scienceplots library contains numerous styles and supports multiple languages, including Chinese and Japanese. You can explore the full range of styles within the scienceplots library at the link below.

Gallery

Within this article, we will explore how we can transform some basic and common data visualisations into something that can be included in a scientific publication.

Setting Up scienceplots

Before creating plots with the scienceplots library, you need to ensure that you have LaTeX installed on your computer. LaTeX is a typesetting system that is designed for the creation of technical and scientific documentation.

If you do not already have LaTeX installed on your machine, you can find more details about LaTeX and how to install it in the SciencePlots FAQ [here](https://github.com/garrettj403/SciencePlots/wiki/FAQ).

If you are running on Google Colab, you can run the following code in a cell to install LaTeX.

!sudo apt-get install dvipng texlive-latex-extra texlive-fonts-recommended texlive-latex-recommended cm-super

After setting up LaTeX, we can install the scienceplots library using pip:

pip install SciencePlots

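Once the library and LaTeX have been installed on your chosen platform, you can then import the scienceplots library along with matplotlib. We also import NumPy, which is used to generate the sample data in the next section.

import numpy as np
import scienceplots
import matplotlib.pyplot as plt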

Creating Dummy Data for Plotting

Before generating some plots, we first need to create some sample data. We will see how the scienceplots library works with real-world data later in the article.

For this part of the article, we are going to create some linearly spaced values using np.linspace and then apply a few simple mathematical functions to that data.

# Generate x values
x = np.linspace(0, 10, 20)

# Generate y values from sine and cosine functions
y = np.sin(x)
y2 = np.cos(x)
y3 = y2 * 1.5

Once we have created our data (or loaded it into pandas if we are reading from a CSV file), we can begin creating our plots.

Creating a Line Plot With Markers Using Matplotlib

The first plot we will work with is a line plot. This can easily be created by using matplotlib’s .plot() function and passing in the required data for the x and y parameters.

As we are dealing with variables derived from equations, it can sometimes be handy to include these in the plot’s legend for the reader to understand what they are.

One of the nice things about matplotlib is we can use LaTeX equations as labels. All we have to do is surround the equation with dollar signs ( $ ).

plt.figure(figsize = (6,6))
plt.plot(x, y, marker='o', label='$y=sin(x)$')
plt.plot(x, y2, marker='o', label='$y=cos(x)$')
plt.plot(x, y3, marker='o', label='$y=y2*1.5$')

plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

When we run the above code, we get the following very basic matplotlib figure with the standard colours.

Basic matplotlib line plot before applying scienceplots. Image by the author.

Even though the figure above looks usable, its quality (DPI and size) and its styling may not be entirely suitable for publication within a journal.
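One detail worth noting about the legend labels: inside dollar signs, a plain sin or cos is typeset as a product of italic variables rather than as a function name. A minimal sketch of an alternative is shown below, assuming matplotlib's built-in math rendering and using raw strings with \sin and \cos. The Agg backend line and the rewritten third label are assumptions made so the snippet runs on its own without a display; they are not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)

fig, ax = plt.subplots(figsize=(6, 6))

# Raw strings (r'...') stop Python from treating the backslashes as escapes,
# and \sin / \cos are typeset upright as function names.
ax.plot(x, np.sin(x), marker='o', label=r'$y=\sin(x)$')
ax.plot(x, np.cos(x), marker='o', label=r'$y=\cos(x)$')
ax.plot(x, 1.5 * np.cos(x), marker='o', label=r'$y=1.5\cos(x)$')

ax.set_xlabel('X')
ax.set_ylabel('Y')
legend = ax.legend()

# Drawing the canvas forces every label to be parsed and rendered.
fig.canvas.draw()
legend_labels = [text.get_text() for text in legend.get_texts()]
```

The same raw-string labels can be passed to plt.plot in the examples in this article if you prefer upright function names in the legend.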

Applying scienceplots Styling to a Line Plot

To instantly transform our figure, we can add a single line of code: a with statement, which calls upon matplotlib’s style.context function and allows us to pass in one of the many styles that are available from scienceplots.

with plt.style.context(['science', 'high-vis']):
    plt.figure(figsize = (6,6))
    plt.plot(x, y, marker='o', label='$y=sin(x)$')
    plt.plot(x, y2, marker='o', label='$y=cos(x)$')
    plt.plot(x, y3, marker='o', label='$y=y2*1.5$')
    plt.xlabel('X Variable (mm)')
    plt.ylabel('Y Variable')
    plt.legend()
    plt.show()

When we run the above code, we get the following plot, which is much more suitable for including in a journal publication.

Matplotlib lineplot after applying the scienceplots style. Image by the author.

The figure is simple (i.e. without chart junk), and it is easy to distinguish between the different lines. Additionally, when viewing this figure in a Jupyter Notebook, it may appear very large even though we have set a relatively small figure size. This is due to the figure’s DPI being set to 600, which is often a requirement of many publications and ensures that figures are as clear as possible.
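To see why a 600 DPI figure looks so large on screen, we can work out its size in pixels, which is the size in inches multiplied by the DPI. The sketch below uses plain matplotlib only; the helper name pixel_size is hypothetical and the Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def pixel_size(figsize, dpi):
    """Return (width, height) in pixels for a figure of `figsize` inches at `dpi`."""
    fig = plt.figure(figsize=figsize, dpi=dpi)
    width, height = fig.get_size_inches() * fig.dpi
    plt.close(fig)
    return int(round(width)), int(round(height))


# matplotlib's default figure DPI is 100
default_pixels = pixel_size((6, 6), 100)
# the same 6 x 6 inch figure at 600 DPI
print_pixels = pixel_size((6, 6), 600)

print(default_pixels, print_pixels)  # (600, 600) (3600, 3600)
```

A 6 x 6 inch figure therefore goes from 600 x 600 pixels to 3600 x 3600 pixels, which explains the large notebook display. If you only need the high resolution in the saved file, passing dpi=600 to plt.savefig is another option.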

Let’s try applying another style. This time we will use the styling from the Institute of Electrical and Electronics Engineers (IEEE).

To do this, all we have to do is swap out high-vis for ieee.

with plt.style.context(['science', 'ieee']):
    plt.figure(figsize = (6,6))
    plt.plot(x, y, marker='o', label='$y=sin(x)$')
    plt.plot(x, y2, marker='o', label='$y=cos(x)$')
    plt.plot(x, y3, marker='o', label='$y=y2*1.5$')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.legend()
    plt.show()

When we run the above code, we will get back the following plot in the style recommended by the IEEE.

Matplotlib lineplot after applying the scienceplots IEEE style. Image by the author.
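A useful property of the with statement used above is that the style only applies inside the block, so the rest of the notebook keeps its normal settings. The sketch below illustrates this with a small dictionary of rcParams, which plt.style.context also accepts. The dictionary demo_style is a hypothetical stand-in for a scienceplots style list such as ['science', 'ieee'], chosen so the snippet runs without scienceplots or LaTeX installed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for a scienceplots style list such as ['science', 'ieee'].
demo_style = {'lines.linewidth': 3.0, 'axes.grid': True}

linewidth_before = plt.rcParams['lines.linewidth']

with plt.style.context(demo_style):
    # Inside the block the temporary settings are active.
    linewidth_inside = plt.rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    line, = ax.plot([0, 1, 2], [0, 1, 4])
    styled_width = line.get_linewidth()
    plt.close(fig)

# Outside the block the previous settings are restored.
linewidth_after = plt.rcParams['lines.linewidth']
```

This is why each styled figure in this article is created inside its own with block: figures created outside it are unaffected.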

Histograms with Science Plots

In the previous examples, we explored how to apply styling to line plots.

But can we apply the same styling to other types of plots?

Of course we can!

Let’s see how we can apply that styling to histograms.
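The histogram and scatter plot examples in the rest of this article read from a pandas dataframe called df, whose loading is not shown here. If you do not have the data to hand, a rough stand-in could be built as follows. This is only a sketch: the values are synthetic, the numeric ranges and lithology names are assumptions for illustration, and only the column names (GR, NPHI, RHOB, LITH) come from the code in this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
n_samples = 500

# Synthetic values only: the ranges and category names below are
# illustrative assumptions, not real well-log measurements.
df = pd.DataFrame({
    'GR': rng.normal(loc=70, scale=20, size=n_samples).clip(0, 150),
    'NPHI': rng.uniform(0.05, 0.45, size=n_samples),
    'RHOB': rng.uniform(1.9, 2.7, size=n_samples),
    'LITH': rng.choice(['Sandstone', 'Shale', 'Limestone'], size=n_samples),
})

print(df.head())
```

With a dataframe like this in place, the plotting code will run, although the resulting figures will not match the ones shown in the article.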

First, let’s create a matplotlib figure using some Gamma Ray (a measurement of the natural radioactivity of geological formations) data using the code below. To show a second dataset, I have shifted the same data 20 API units to the right.

plt.figure(figsize = (6,6))
plt.hist(df['GR'], bins=100, label='GR1', alpha =0.5)
plt.hist(df['GR']+20, bins=100, label='GR2', alpha=0.5)
plt.xlim(0, 150)
plt.xlabel('Gamma Ray')
plt.ylabel('Frequency')
plt.legend()
plt.show()

When we run the above code, we get back the following figure.

Simple matplotlib histogram of gamma ray measurements. Image by the author.

We will notice it uses the standard styling from matplotlib and looks very basic with both data sets overlapping each other. This causes some of the information to be obscured.

Let’s see how the IEEE style changes things.

with plt.style.context(['science', 'ieee']):
    plt.figure(figsize = (6,6))
    plt.hist(df['GR'], bins=100, label='GR1')
    plt.hist(df['GR']+20, bins=100, label='GR2')
    plt.xlim(0, 150)
    plt.xlabel('Gamma Ray')
    plt.ylabel('Frequency')
    plt.legend()
    plt.show()

When we run the above code, we get back the following figure with the IEEE styling applied. However, the second GR dataset still obscures the first.

Matplotlib histogram of Gamma Ray measurements after applying scienceplots IEEE styling. Image by the author.

Perhaps I had high expectations that the scienceplots library would be able to handle any overlap and apply transparency automatically.

However, it is not too much effort to apply this ourselves. All we need to do is add the alpha parameter for each dataset.

with plt.style.context(['science', 'ieee']):
    plt.figure(figsize = (6,6))
    plt.hist(df['GR'], bins=100, label='GR1', alpha=0.5)
    plt.hist(df['GR']+20, bins=100, label='GR2', alpha=0.5)
    plt.xlim(0, 150)
    plt.xlabel('Gamma Ray')
    plt.ylabel('Frequency')
    plt.legend()
    plt.show()

We get the following figure when we run the above code with the alpha changes.

Histogram of Gamma Ray measurements after applying scienceplots and adding transparency to the datasets. Image by the author.

Now we can see the variation in the bars for both data sets.
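The transparency setting can also be tried on its own with synthetic values. The sketch below is not the gamma ray data from the article: gr1 and gr2 are randomly generated stand-ins. It also passes one shared set of bin edges to both histograms, which is an addition of mine rather than something the code above does, so that the bars of the two datasets line up exactly.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
gr1 = rng.normal(loc=70, scale=15, size=1000)  # synthetic stand-in for df['GR']
gr2 = gr1 + 20                                 # same data shifted 20 units to the right

# 101 edges give 100 bins that are identical for both datasets.
bin_edges = np.linspace(0, 150, 101)

fig, ax = plt.subplots(figsize=(6, 6))
counts1, _, bars1 = ax.hist(gr1, bins=bin_edges, label='GR1', alpha=0.5)
counts2, _, bars2 = ax.hist(gr2, bins=bin_edges, label='GR2', alpha=0.5)

ax.set_xlim(0, 150)
ax.set_xlabel('Gamma Ray')
ax.set_ylabel('Frequency')
ax.legend()
```

When bins=100 is passed as an integer instead, each call works out its own edges from the range of its data, so the two sets of bars may not align.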

It is recommended you check the style guidelines for your intended publication to make sure that using transparencies is acceptable. In most cases, it should be, but it is worth checking.

Applying Science Plots to Seaborn Figures

We are not restricted to just applying styles from the scienceplots library to matplotlib figures. We can also apply them to Seaborn figures. This is due to Seaborn being based upon matplotlib code.

Sometimes Seaborn provides an easier way to create a plot than matplotlib. For example, when we have a text-based categorical variable, we want to be able to colour the points by that variable without having to add a separate scatter plot for each category.

In this example, we have some neutron porosity and bulk density data, which are common well logging measurements. For each measurement, we also have a lithology category.

This data originates from the Force 2020 Xeek Machine Learning Competition dataset, details of which can be found at the end of the article.

To begin, we first need to import seaborn into our notebook.

import seaborn as sns

After importing the seaborn library, we can create our scatterplot using the following code.

plt.figure(figsize=(6, 6))
sns.scatterplot(data=df, x='NPHI', 
                y='RHOB', hue='LITH', s=10)

plt.ylabel('Bulk Density (RHOB) - g/cc')
plt.xlabel('Neutron Porosity (NPHI) - dec')
plt.ylim(2.8, 1.4)
plt.show()

When we run the above code, we get back the following scatter plot with our data coloured in by the different lithologies.

Basic neutron-density crossplot generated using Seaborn. Image by the author.

It looks ok. However, we need to ensure that the style is suitable for the intended journal and that the colours are accessible to all readers.

To apply our scienceplots styling, we can use the same syntax as before:

with plt.style.context(['science', 'ieee']):
    plt.figure(figsize=(10, 8))
    sns.scatterplot(data=df, x='NPHI', y='RHOB', hue='LITH', s=10)
    plt.ylim(2.8, 1.4)
    plt.title('RHOB vs NPHI Crossplot')
    plt.show()

When we run the above code, we get back the following plot with improved styling, including a new colour palette.

Seaborn Scatter plot, with scienceplots styling showing bulk density vs neutron porosity and coloured by lithology variations. Image by the author.

Choosing colour palettes for figures can be tricky and time-consuming; however, with some thought, it can make your figure more accessible to readers with vision-related problems.

If you want to find some tools to help you choose effective and accessible colour palettes check out the link below.

Additionally, some colours may not be distinguished easily when printing papers out in black and white. Therefore, it may be worth considering assigning different shapes to the categories. This is especially important when we have small datasets such as those obtained from laboratory processes.
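As a rough illustration of assigning a different shape to each category, the sketch below plots each category with its own marker using plain matplotlib. The data is synthetic and the lithology names and marker choices are hypothetical, so treat it as a starting point rather than a finished figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical categories, each mapped to a distinct marker shape.
markers = {'Sandstone': 'o', 'Shale': 's', 'Limestone': '^'}

fig, ax = plt.subplots(figsize=(6, 6))
for lith, marker in markers.items():
    # Synthetic values standing in for the NPHI and RHOB measurements.
    nphi = rng.uniform(0.05, 0.45, size=30)
    rhob = rng.uniform(1.9, 2.7, size=30)
    ax.scatter(nphi, rhob, marker=marker, s=20, label=lith)

ax.set_xlabel('Neutron Porosity (NPHI) - dec')
ax.set_ylabel('Bulk Density (RHOB) - g/cc')
ax.set_ylim(2.8, 1.4)  # inverted axis, as in the crossplots above
ax.legend(title='LITH')

legend_entries = [text.get_text() for text in ax.get_legend().get_texts()]
```

With Seaborn, a similar result should be possible without the loop by passing style='LITH' to sns.scatterplot alongside hue='LITH', which varies the marker by category as well as the colour.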

Summary

Within this article, we have explored how we can quickly transform basic matplotlib figures into something that could easily be added to an article for scientific publication. These figures may still need further tweaking, but by using the scienceplots library, we can get most of the way there. Additionally, checking your preferred journal’s author toolkit is always recommended to ensure that the plots you create meet the required standards.


Dataset Used in this Tutorial

The data used in this tutorial is the training dataset from a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). This dataset is licensed under Creative Commons Attribution 4.0 International.

The full dataset can be accessed at the following link: https://doi.org/10.5281/zenodo.4351155.


