
Plotting Time Series Boxplots

Learn how to plot time series boxplots using matplotlib and Seaborn

Photo by Agê Barros on Unsplash

A time series dataset is a collection of time-indexed data collected over a period of time. Using a time series, you can plot visualizations that illustrate how the values of the subject under study change over time. One particular type of time series plot is the time series boxplot.

A time series boxplot is a useful way to visualize your dataset when you have multiple data points in each time interval. For example, suppose you collect the temperature of a location every hour over a period of one month. You want to use a boxplot to see how the median temperature of each day changes, plus the dispersion of the temperatures within each day. This is where the time series boxplot helps.
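To make concrete what each box summarizes, here is a minimal sketch that computes the per-day quartiles and median a boxplot draws from hourly readings. The data is synthetic, and the names `hourly` and `daily_stats` are illustrative assumptions rather than part of any real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic hourly "temperatures" for three days (illustrative data only)
rng = np.random.default_rng(0)
index = pd.date_range("2022-01-01", periods=72, freq="h")
hourly = pd.Series(25 + rng.standard_normal(len(index)), index=index)

# Group the 24 readings of each day and compute the values a box encodes:
# first quartile, median, and third quartile
daily_stats = (
    hourly.groupby(hourly.index.date)
          .quantile([0.25, 0.5, 0.75])
          .unstack()
)
daily_stats.columns = ["q1", "median", "q3"]
print(daily_stats)
```

Each row of `daily_stats` corresponds to one box: the box spans `q1` to `q3`, and the line inside it marks the median.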

In this article, I will walk you through the basics of plotting a time series boxplot: setting up a simple dataset using a Pandas Series and DataFrame, loading a real-life dataset, and plotting time series boxplots to suit your requirements.

Plotting the Time Series Boxplot using a Pandas Series

The first simple example I want to illustrate is how to plot using a Pandas Series. First, let’s create a DatetimeIndex object containing a range of dates:

import pandas as pd
import numpy as np
date_range = pd.date_range(start = "2022-01-01", 
                           end   = "2022-02-28 23:59:00",
                           freq  = "H")

Here, date_range is a DatetimeIndex object running from 2022-01-01 00:00:00 to 2022-02-28 23:00:00. Notice the interval of 1 hour between items (freq='H'):

DatetimeIndex(['2022-01-01 00:00:00', '2022-01-01 01:00:00',
               '2022-01-01 02:00:00', '2022-01-01 03:00:00',
               '2022-01-01 04:00:00', '2022-01-01 05:00:00',
               '2022-01-01 06:00:00', '2022-01-01 07:00:00',
               '2022-01-01 08:00:00', '2022-01-01 09:00:00',
               ...
               '2022-02-28 14:00:00', '2022-02-28 15:00:00',
               '2022-02-28 16:00:00', '2022-02-28 17:00:00',
               '2022-02-28 18:00:00', '2022-02-28 19:00:00',
               '2022-02-28 20:00:00', '2022-02-28 21:00:00',
               '2022-02-28 22:00:00', '2022-02-28 23:00:00'],
              dtype='datetime64[ns]', length=1416, freq='H')
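As a quick sanity check on the length shown above, the following sketch recomputes it from the number of days. It uses the lowercase "h" alias, on the assumption that you may be running a recent pandas release where the uppercase "H" alias is deprecated; both request hourly frequency.

```python
import pandas as pd

date_range = pd.date_range(start="2022-01-01",
                           end="2022-02-28 23:59:00",
                           freq="h")

# January has 31 days and February 2022 has 28, so there are 59 days
# with 24 hourly timestamps each
expected = (31 + 28) * 24
print(len(date_range), expected)
print(date_range[-1])
```

The last timestamp is 23:00 rather than 23:59 because the range only contains whole-hour steps from the start.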

You can now create a Pandas Series using the date_range variable as the index. For the value, let’s use a random number generator:

ts = pd.Series(list(np.random.randn(len(date_range))),
               index = date_range)

Here is the content of ts now:

2022-01-01 00:00:00   -0.869078
2022-01-01 01:00:00    1.742324
2022-01-01 02:00:00    0.937706
2022-01-01 03:00:00    0.366969
2022-01-01 04:00:00    1.841110
                         ...   
2022-02-28 19:00:00    0.061070
2022-02-28 20:00:00    0.354997
2022-02-28 21:00:00    1.102489
2022-02-28 22:00:00   -1.299513
2022-02-28 23:00:00   -0.452864
Freq: H, Length: 1416, dtype: float64
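Because the values are random, your numbers will differ from the ones printed above. If you want the same series on every run, one option is a seeded generator. This is a minimal sketch; the seed value 42 is arbitrary and the variable names `rng` and `ts_again` are my own.

```python
import numpy as np
import pandas as pd

date_range = pd.date_range(start="2022-01-01",
                           end="2022-02-28 23:59:00",
                           freq="h")

# A seeded generator yields the same "random" values on every run
rng = np.random.default_rng(42)
ts = pd.Series(rng.standard_normal(len(date_range)), index=date_range)

# Re-creating the generator with the same seed reproduces the series
rng_again = np.random.default_rng(42)
ts_again = pd.Series(rng_again.standard_normal(len(date_range)),
                     index=date_range)
print(ts.equals(ts_again))
```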

You are now ready to plot the time series boxplot using matplotlib and Seaborn:

import matplotlib.pyplot as plt
import seaborn
fig, ax = plt.subplots(figsize=(20,5))
seaborn.boxplot(x = ts.index.dayofyear, 
                y = ts, 
                ax = ax)

You will see the time series boxplot below:

Image by author

Plotting the Time Series Boxplot using a Pandas DataFrame

The second example is to create a DataFrame with the date_range object set as the index:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
date_range = pd.date_range(start = "2022-01-01", 
                           end   = "2022-02-28 23:59:00",
                           freq  = "H")
df = pd.DataFrame(
    {
        'temp':np.random.randn(len(date_range))
    }, index = date_range)
df

The dataframe looks as follows:

To see the boxplots over each of the days in the 2 months, use the dayofyear attribute of the DatetimeIndex type (df.index):

fig, ax = plt.subplots(figsize=(20,5))
seaborn.boxplot(x = df.index.dayofyear, 
                y = df['temp'], 
                ax = ax)

You will see the plot as follows:

Image by author
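With a DataFrame, an alternative is to store the grouping key as a column and let Seaborn pick the columns by name through its data parameter. The sketch below assumes a hypothetical column name, day_of_year, and uses a non-interactive matplotlib backend so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn

date_range = pd.date_range(start="2022-01-01",
                           end="2022-02-28 23:59:00",
                           freq="h")
df = pd.DataFrame({"temp": np.random.randn(len(date_range))},
                  index=date_range)

# Keep the grouping key in the dataframe (hypothetical column name)
df["day_of_year"] = df.index.dayofyear

fig, ax = plt.subplots(figsize=(20, 5))
seaborn.boxplot(data=df, x="day_of_year", y="temp", ax=ax)
ax.set_xlabel("Day of year")
```

The resulting chart should match the one above; the main difference is that the axis labels come from the column names.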

Plotting the Time Series Boxplot for Each Month

For the following example, I am going to load the data from a CSV file that you can obtain from the following URL: https://data.gov.sg/dataset/wet-bulb-temperature-hourly.

Click on the Download button to download the dataset

This data is subject to the Singapore Open Data Licence, available for review at https://data.gov.sg/open-data-licence.

The data contains the hourly wet bulb temperature recorded at the Changi Climate Station.

The Wet Bulb temperature is the temperature of adiabatic saturation. This is the temperature indicated by a moistened thermometer bulb exposed to the air flow. Wet Bulb temperature can be measured by using a thermometer with the bulb wrapped in wet muslin. (Source: Temperatures – Dry Bulb/Web Bulb/Dew Point, https://www.weather.gov › zhu › dry_wet_bulb_definition)

Let’s load the CSV file into a dataframe:

df = pd.read_csv('wet-bulb-temperature-hourly.csv', 
                 parse_dates = ['wbt_date'], 
                 index_col='wbt_date')
df

This is what the dataframe looks like:

Image by author
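The parse_dates and index_col arguments matter because the month, year, and dayofyear attributes used in the rest of this article are only available on a DatetimeIndex. Here is a minimal sketch using a tiny made-up sample in place of the downloaded file; the values are invented, and the real CSV may contain additional columns.

```python
import io
import pandas as pd

# A made-up sample standing in for the downloaded file
sample_csv = io.StringIO(
    "wbt_date,wet_bulb_temperature\n"
    "1982-01-01,24.7\n"
    "1982-01-01,24.5\n"
    "1982-01-02,24.3\n"
)

df = pd.read_csv(sample_csv,
                 parse_dates=["wbt_date"],
                 index_col="wbt_date")

# The index is now a DatetimeIndex, so date attributes are available
print(type(df.index).__name__)
print(df.index.month.tolist())
```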

The data spans the years 1982 to 2022. Let’s see how the temperature varies for each month. For this, we shall use the month attribute of the DatetimeIndex type:

fig, ax = plt.subplots(figsize=(12,5))
seaborn.boxplot(x = df.index.month,
                y = df['wet_bulb_temperature'], 
                ax = ax)

The plot is as follows:

Image by author

As you can see, May (5) seems to be the hottest month overall. Note that each box pools the readings for that month across all the years in the dataset.
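If you prefer to confirm a reading of the chart numerically, you can group by the same key and compare the medians. The sketch below uses synthetic data that I constructed so that May comes out warmest; it only illustrates the technique and says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: hourly values over two years,
# built so that May is the warmest month (illustrative only)
index = pd.date_range("1982-01-01", "1983-12-31 23:00:00", freq="h")
rng = np.random.default_rng(1)
month = index.month.to_numpy()
values = 25 - 0.1 * np.abs(month - 5) + 0.01 * rng.standard_normal(len(index))
df = pd.DataFrame({"wet_bulb_temperature": values}, index=index)

# The same grouping key as the boxplot, reduced to one median per month
monthly_median = df["wet_bulb_temperature"].groupby(df.index.month).median()
print(monthly_median.round(2))
print("Month with the highest median:", monthly_median.idxmax())
```

Swapping df.index.month for df.index.year gives the equivalent per-year comparison.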

Plotting the Time Series Boxplot for Each Year

How about the hottest year? For this, we shall use the year attribute of the DatetimeIndex type:

fig, ax = plt.subplots(figsize=(24,10))
seaborn.boxplot(x = df.index.year,
                y = df['wet_bulb_temperature'], 
                ax = ax)

You can see that 1998 is the hottest year:

Image by author

Observe that the x-tick labels are a bit cramped. Let’s rotate them by 30 degrees:

fig, ax = plt.subplots(figsize=(24,10))
seaborn.boxplot(x = df.index.year,
                y = df['wet_bulb_temperature'], 
                ax = ax)
_ = ax.set_xticklabels(ax.get_xticklabels(), rotation = 30)

You can now see that the x-tick labels are easier to read:

Image by author
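If you only need to rotate the labels and not change their text, matplotlib's tick_params() method is another way to do it. This is a minimal sketch on synthetic data (ten made-up years with 100 readings each), not the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn

# Synthetic data (illustrative only): 100 readings for each of ten years
years = np.repeat(np.arange(1982, 1992), 100)
temps = np.random.default_rng(2).normal(25, 1, size=len(years))

fig, ax = plt.subplots(figsize=(12, 5))
seaborn.boxplot(x=years, y=temps, ax=ax)

# Rotate the existing x-tick labels without re-setting their text
ax.tick_params(axis="x", labelrotation=30)
```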

Plotting the Time Series Boxplot for Each Day in a Specific Month

If you want to know the temperature for a particular month and year, say January 1982, you can perform a filter on the dataframe first before plotting, like this:

fig, ax = plt.subplots(figsize=(24,10))
seaborn.boxplot(
    x = df['1982-01-01':'1982-01-31'].index.day,
    y = df['1982-01-01':'1982-01-31']['wet_bulb_temperature'], 
    ax = ax)

Looks like 1 January 1982 is the hottest day in the month:

Image by author
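The filter can also be written once and reused. With a DatetimeIndex, pandas supports partial string indexing through .loc, so a single month can be selected with a year-month string. The sketch below runs on synthetic data, and the variable name jan_1982 is my own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (illustrative values only)
index = pd.date_range("1982-01-01", "1982-03-31 23:00:00", freq="h")
df = pd.DataFrame(
    {"wet_bulb_temperature": np.random.default_rng(3).normal(25, 1, len(index))},
    index=index,
)

# Partial string indexing: "1982-01" selects every row in January 1982
jan_1982 = df.loc["1982-01"]
print(jan_1982.index.min(), jan_1982.index.max(), len(jan_1982))
```

You could then pass jan_1982.index.day as x and jan_1982['wet_bulb_temperature'] as y to seaborn.boxplot(), which avoids repeating the slice.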

Plotting the Time Series Boxplot for Each Day of the Year

Finally, if you want to see the temperature readings for a particular year, say 1982, you can use the dayofyear attribute of the DatetimeIndex type:

fig, ax = plt.subplots(figsize=(150,10))
seaborn.boxplot(
    x = df['1982-01-01':'1982-12-31'].index.dayofyear,
    y = df['1982-01-01':'1982-12-31']['wet_bulb_temperature'], 
    ax = ax)
fig.savefig('temp.jpg')

Because the chart is very wide, I have used the savefig() function to save it to a file. The chart looks like this:

If you zoom in, you will see that the x-tick labels are just running numbers (the day of the year):

Image by author

You can format the x-ticks using the set_xticklabels() function:

fig, ax = plt.subplots(figsize=(150,10))
seaborn.boxplot(
    x = df['1982-01-01':'1982-12-31'].index.dayofyear,
    y = df['1982-01-01':'1982-12-31']['wet_bulb_temperature'], 
    ax = ax)
ax.set_xticklabels(labels = 
    df['1982-01-01':'1982-12-31'].index.strftime(
        '%Y-%m-%d').sort_values().unique(), 
    rotation=45, ha='right')
fig.savefig('temp.jpg')

The x-ticks would now be the actual dates:

Image by author

If you want some fancy formatting for the dates, customize them using the strftime() function.
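As an illustration, the sketch below builds shorter day-and-month labels for every day of 1982. The format string is just one possibility, and the month abbreviations it produces depend on your locale.

```python
import pandas as pd

days = pd.date_range("1982-01-01", "1982-12-31", freq="D")

# "%d %b" produces a two-digit day followed by an abbreviated month name;
# "%a %d %b %Y" would add the weekday and the year
labels = days.strftime("%d %b")
print(labels[:3].tolist())
print(len(labels))
```

The resulting labels can be passed to set_xticklabels() in place of the '%Y-%m-%d' strings used earlier.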

Summary

This is a quick overview of how you can create time series boxplots for data that are time-related. Try out the various combinations to create charts that you need!

Join Medium with my referral link – Wei-Meng Lee

