
There have been a lot of climate change articles recently, and many feature distinctive stacked line charts that summarize data over many decades. Here’s an example from the _Climate Reanalyzer_ that shows how sea temperatures over the last year and a half have been well above average [1]:
![Global (60S-60N) Sea Surface Temperature (1981–2023) [1]](https://towardsdatascience.com/wp-content/uploads/2023/08/1RBKlyqOkul6l6GLAIvWsAA.png)
And here’s a similar chart from Dr. Zachary Labe’s site showing the extent of Antarctic sea ice over the last 40+ years [2]:
![Antarctic sea ice extent (1978–2023) [2]](https://towardsdatascience.com/wp-content/uploads/2023/08/1A8gTG93o1BIH140Rqw6dlA.png)
These charts have become a popular choice for infographics, such as in this article [3], but their popularity is a bit surprising. Because it’s difficult to follow individual lines through such dense, tangled displays, they’re generally shunned and disparaged as "spaghetti" plots.
But there’s a secret to using spaghetti plots successfully. You must emphasize one or two lines against a diminished background of all the other lines. This strategy lets you place the selected lines within an overall context. Do they represent normal outcomes or are they outliers? Are the results really good or really bad? By superimposing them on a background trend, the story can write itself.
In this Quick Success Data Science project, we’re going to produce a facsimile of the previous Antarctic Sea Ice chart with the Plotly Express graphing library. With this code example, you should be able to generate similar plots for your own datasets.
National Snow and Ice Data Center
For data we’ll use a comprehensive public dataset compiled by the National Snow and Ice Data Center, a part of the Cooperative Institute for Research in Environmental Sciences (CIRES) at the University of Colorado, Boulder [4]. This dataset utilizes satellite imagery to track and monitor changes in polar sea ice, such as the "halo" of ice around Antarctica.
![10 August 2023 sea ice extent based on satellite imagery [4]](https://towardsdatascience.com/wp-content/uploads/2023/08/1uZZnVxfPu_ekoFq8Acm8bQ.png)
The data comes in both monthly and daily increments. For the highest resolution possible, we’ll look at the daily data. For convenience, I’ve already downloaded the CSV file to this Gist. Additionally, a user guide can be found here.
Installing Libraries
Plotly Express is the high-level interface to the Plotly graphing library, which makes beautiful, highly interactive visualizations. It ships as part of the `plotly` package, which you can install with either conda or pip.
Here’s the conda installation:
conda install -c plotly plotly
And here’s the pip version:
pip install plotly
We’ll also need the pandas data analysis package. Install it with either:
conda install pandas
or:
pip install pandas
You may also need `nbformat` for Plotly Express. Install it with either:
conda install -c conda-forge nbformat
or:
pip install nbformat
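Before moving on, it can help to confirm that the three packages are visible to the Python environment your notebook is using. The following is a minimal sketch that relies only on the standard library; `installed_versions` is a hypothetical helper name, not part of any of these packages.

```python
from importlib import metadata


def installed_versions(packages=("plotly", "pandas", "nbformat")):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report what the current environment can see:
for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows up as not installed even though you installed it, the notebook kernel is probably running in a different environment than the one you installed into.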
The Code
The following code was written in JupyterLab and is presented by cell.
Importing Libraries
Here are the imports. We’re using aliases for easier typing:
import pandas as pd
import plotly.graph_objects as go
import plotly.io as plt_io
import plotly.express as px
Normally, importing Plotly Express would be sufficient. Including Plotly’s `graph_objects` module, however, gives us more customization options (think matplotlib vs. seaborn). The `plotly.io` module will let us import Plotly’s ready-made design templates, saving us work.
Loading and Preparing the Data
The following commented code uses the pandas library to load the data from the Gist and prepare it for plotting. Part of this involves creating a new DataFrame column for the day of the year (January 1st = 1 and December 31st = 365 for non-leap years, 366 for leap years). We’ll use this `Day of Year` column for the x-axis in our line plots.
# Read sea ice extent file:
URL = 'https://bit.ly/3OtPnnh'
df = pd.read_csv(URL, skiprows=[1])
df.columns = df.columns.str.strip() # Strip any leading white spaces.
df.drop(columns=['Missing', 'Source Data'], inplace=True)
# Combine date columns into a single datetime column:
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])
# Extract the day of the year from the 'Date' column:
df['Day of Year'] = df['Date'].dt.dayofyear
# Move Date column to the far left:
column_to_move = df.pop("Date")
df.insert(0, "Date", column_to_move)
df.head(3)
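To see what these preparation steps do without downloading anything, here is a small sketch that runs them on an inline stand-in for the file. The layout (a header row with padded column names, followed by a units row) is an assumption inferred from the `skiprows=[1]` and `str.strip()` calls above, and every value in it is a made-up placeholder rather than a real measurement.

```python
import io

import pandas as pd

# Made-up rows imitating the assumed layout: header, units row, then data.
SAMPLE_CSV = """Year, Month, Day, Extent, Missing, Source Data
YYYY, MM, DD, 10^6 sq km, 10^6 sq km, placeholder units row
2020, 3, 1, 3.0, 0.0, placeholder-a
2020, 12, 31, 6.0, 0.0, placeholder-b
2023, 3, 1, 2.0, 0.0, placeholder-c
2023, 12, 31, 5.0, 0.0, placeholder-d
"""

# skiprows=[1] drops the units row so the numeric columns parse as numbers:
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), skiprows=[1])
sample.columns = sample.columns.str.strip()  # ' Month' -> 'Month', etc.

# Build the datetime column, then derive the day of the year from it:
sample['Date'] = pd.to_datetime(sample[['Year', 'Month', 'Day']])
sample['Day of Year'] = sample['Date'].dt.dayofyear
print(sample[['Date', 'Day of Year']])
```

Note that March 1st is day 61 in 2020 (a leap year) but day 60 in 2023, so lines for leap years are offset by one day after February when plotted against `Day of Year`.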

Plot the Stacked Line Chart
The following commented code plots the stacked line chart for sea ice extent. The line for each year is first plotted in light gray. The years 2022 and 2023 are then plotted in black and red, respectively, and with a thicker line weight. The objective is to show how dramatically the ice has retreated in the last two years.
# Plot each year's extent data in a stacked line chart:
fig = px.line(df,
x='Day of Year',
y='Extent',
line_group='Year',
color='Year',
labels={'x': 'Month', 'y': 'Extent'},
title='Antarctic Sea Ice Extent January to December (1978-2023)',
template='plotly_white')
# Customize layout; tickvals represent starting 'day of year' of each month:
fig.update_layout(width=800,
height=650,
legend={'orientation': 'h'},
xaxis_title='',
yaxis_title='Sea Ice Extent (million sq km)',
xaxis={'tickmode':'array',
'tickvals': [1, 32, 60, 91, 121, 152,
182, 213, 244, 274, 305, 335],
'ticktext': ['Jan', 'Feb', 'Mar', 'Apr',
'May', 'Jun', 'Jul', 'Aug',
'Sep', 'Oct', 'Nov', 'Dec']})
# Draw border around the plot:
fig.update_xaxes(showline=True, linewidth=1, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=1, linecolor='black', mirror=True)
# Update trace styles to make all lines light gray:
fig.update_traces(line={'color': "lightgray", 'width': 0.75})
# Highlight selected years:
fig.update_traces(patch={'line': {'color': 'black', 'width': 2}},
selector={'legendgroup': '2022'})
fig.update_traces(patch={'line': {'color': 'red', 'width': 2}},
selector={'legendgroup': '2023'})
# Add annotation:
fig.add_annotation(dict(font=dict(color='darkgray',size=15),
x=85,
y=16,
showarrow=False,
text='All years 1978-2023----',
textangle=0,
xanchor='left'))
fig.show()
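The `tickvals` in the layout call are the day-of-year numbers on which each month begins. Rather than typing them by hand, they can be derived with the standard library. This is a minimal sketch; `month_start_days` is a hypothetical helper name.

```python
from datetime import date


def month_start_days(year=2023):
    """Day-of-year number for the first day of each month in `year`."""
    return [date(year, month, 1).timetuple().tm_yday for month in range(1, 13)]


print(month_start_days(2023))  # a non-leap year
print(month_start_days(2020))  # a leap year: months after February shift by one
```

In a non-leap year, December 1st falls on day 335. Because the dataset mixes leap and non-leap years, either set of positions is an approximation for month labels.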

Wow, what a gorgeous graph! And with Plotly Express, the legend is "live." That means you can click on a year, and it will disappear from the plot. Double-click a year and all the other lines will disappear. What an easy way to untangle a spaghetti plot!
If you want to use a dark theme, just replace the `template` argument with `plotly_dark` in the call to `px.line()`. You’ll also want to change the line color to `white` for the borders, in the `fig.update_xaxes()` and `fig.update_yaxes()` calls, and for the year 2022, in the `fig.update_traces()` call that highlights it. Here’s the result:

To turn these charts into persistent static images, just click the "camera" icon on the Plotly toolbar. This will save the chart as a PNG file.
Using a Fill Color
An alternative to showing most of the lines in a light color is to use a solid fill color between the lines for maximum and minimum extent. Here’s an attractive example from Zack Labe [2]:
![Antarctic sea ice anomalies (1979–2022) [2]](https://towardsdatascience.com/wp-content/uploads/2023/08/1E602XuxM9LFkn7XGh1AtXw.png)
Let’s try this approach with our previous plot. The first step is to calculate statistics on the `Extent` column grouped by the day of the year (`Day of Year`). For each daily grouping, pandas’ `agg()` (aggregate) method will let us find the minimum and maximum values. We’ll save the results in a new DataFrame named `bounds`.
# Calculate minimum and maximum bounds of "Extent" for each day of the year:
bounds = df.groupby('Day of Year')['Extent'].agg(['min', 'max']).reset_index()
bounds.rename(columns={'min': 'Min Extent', 'max': 'Max Extent'}, inplace=True)
bounds.head(3)
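To make the grouping concrete, here is the same aggregation applied to a toy table with three years and two days. The numbers are made up for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    'Year': [2021, 2021, 2022, 2022, 2023, 2023],
    'Day of Year': [1, 2, 1, 2, 1, 2],
    'Extent': [5.0, 5.2, 4.0, 4.4, 3.0, 3.6],
})

# One output row per day of the year, holding that day's min and max across years:
toy_bounds = (toy.groupby('Day of Year')['Extent']
                 .agg(['min', 'max'])
                 .reset_index()
                 .rename(columns={'min': 'Min Extent', 'max': 'Max Extent'}))
print(toy_bounds)
```

Each row of the result mixes values from different years: here the minimum for both days comes from 2023 and the maximum from 2021.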

If you look at the original CSV data file, you’ll see that the data is collected every other day. Over the years, every day of the year gets sampled, but not for any given year.

Because the extent value for adjacent days can come from different years, we can end up with "jittery" curves. This seems to be a bigger problem with the minimum extent values:

Smoothing the Minimum Extent Curve
To smooth out this jagged curve, we just need to take the 2-day _moving average_ of the data. This involves calling the pandas `rolling()` method on the column, passing it `2`, and then calling the `mean()` method. Because there’s no data before the first row, its average will be `NaN`, so we’ll drop that row from the DataFrame.
# Smooth the "Min Extent" using a 2-day simple moving average (SMA):
bounds['Min SMA2'] = bounds['Min Extent'].rolling(2).mean()
bounds = bounds.iloc[1:] # Remove first row with NaN for SMA2.
bounds.head(3)
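A short worked example shows where the `NaN` comes from. The values are arbitrary. The second variant uses the `min_periods` argument of `rolling()`, which is one way to keep the first row instead of dropping it (a design alternative, not what the code above does).

```python
import pandas as pd

values = pd.Series([4.0, 2.0, 6.0, 4.0])

# Window of two rows: each result averages the current and previous value.
sma2 = values.rolling(2).mean()

# Allow a window with a single value, so the first row keeps its own value.
sma2_keep_first = values.rolling(2, min_periods=1).mean()

print(sma2.tolist())
print(sma2_keep_first.tolist())
```

Keep in mind that the window counts rows, not calendar days. It works as a 2-day average here because `bounds` has one row per day of the year.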

Highlighting the Last Two Years
To easily highlight the sea ice extent for 2022 and 2023 on the new plot, we’ll filter our original DataFrame (`df`) to make two new DataFrames.
# Filter data for plotting specific years:
df_2022 = df[df['Year'] == 2022].copy().reset_index()
df_2023 = df[df['Year'] == 2023].copy().reset_index()
Plotting the Filled Chart
The following commented code generates the filled line chart. Because Plotly’s `plotly_dark` template isn’t truly black, the first steps create a custom template where the paper, plot, and gridline colors are all black. This type of control is more easily done using full Plotly, rather than the higher-level Plotly Express package.
Next, we’ll use Plotly’s `go.Scatter()` class and pass it arguments for how to fill beneath the curves. For the upper `Max Extent` curve, we’ll use the `tonexty` argument. Because no trace precedes it, this behaves like `tozeroy` and fills the whole area beneath the curve in dark gray. Then, for the smoothed `Min SMA2` curve, we’ll use the `tozeroy` argument to fill beneath it with black, overwriting the previous dark gray.
# Load the dark template:
plt_io.templates["custom_dark"] = plt_io.templates["plotly_dark"]
# Customize the template using all black background colors:
plt_io.templates["custom_dark"]['layout']['paper_bgcolor'] = '#000000'
plt_io.templates["custom_dark"]['layout']['plot_bgcolor'] = '#000000'
# Customize gridline colors:
plt_io.templates['custom_dark']['layout']['yaxis']['gridcolor'] = '#000000'
plt_io.templates['custom_dark']['layout']['xaxis']['gridcolor'] = '#000000'
# Create a figure:
fig = go.Figure()
# Add filled area traces for max and min extents:
fig.add_trace(go.Scatter(x=bounds['Day of Year'], y=bounds['Max Extent'],
fill='tonexty', fillcolor='darkgray',
line=dict(color='lightgrey', width=0.75)))
fig.add_trace(go.Scatter(x=bounds['Day of Year'], y=bounds['Min SMA2'],
fill='tozeroy', fillcolor='black',
line=dict(color='lightgrey', width=0.75)))
# Add traces for 2022 and 2023
fig.add_trace(go.Scatter(x=df_2022['Day of Year'], y=df_2022['Extent'],
mode='lines',
line=dict(color='white', width=2),
name='2022'))
fig.add_trace(go.Scatter(x=df_2023['Day of Year'], y=df_2023['Extent'],
mode='lines',
line=dict(color='red', width=2),
name='2023'))
# Update layout
fig.update_layout(
width=800,
height=650,
template='custom_dark',
title=dict(text='Antarctic Sea Ice Extent (1978-2023)',
font=dict(size=30)),
showlegend=False,
xaxis_title='Month',
yaxis_title='Sea Ice Extent (million sq km)',
xaxis=dict(tickmode='array',
tickvals=[1, 32, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335],
ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
'Sep', 'Oct', 'Nov', 'Dec']))
# Update x and y axes properties:
fig.update_xaxes(showgrid=False,
ticks="outside",
tickson="boundaries",
ticklen=5)
fig.update_yaxes(showgrid=False,
ticks="outside",
tickson="boundaries",
ticklen=20)
# Add annotations for 2022 and 2023:
fig.add_annotation(dict(font=dict(color='white', size=15),
x=368, y=5.0,
showarrow=False,
text="2022",
textangle=0,
xanchor='left'))
fig.add_annotation(dict(font=dict(color='red', size=15),
x=220, y=15,
showarrow=False,
text="2023",
textangle=0,
xanchor='left'))
fig.show()

Notice how far the gray fill is "pulled down" to connect with the final 2023 data point. This demonstrates what a huge impact 2023 is having on the sea ice record.
Plotting Standard Deviations with the Mean
Many published sea ice charts include the mean and 2x the standard deviation for the 30-year period 1981–2010. With the pandas `agg()` method, it’s easy to go back, filter the original DataFrame to these years, and then regenerate the `bounds` DataFrame to include columns for the mean and 2x the standard deviation. Here’s the result:

For Gaussian distributions, two standard deviations encompass over 95% of all samples. This really emphasizes the extreme nature of the last two years.
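The code for this last chart isn’t shown, but the statistics step might look something like the following sketch. It assumes a DataFrame with the same `Year`, `Day of Year`, and `Extent` columns used earlier. The function name `climatology`, the output column names, and the toy data are all hypothetical.

```python
import pandas as pd


def climatology(frame, start=1981, end=2010):
    """Per-day mean and mean +/- 2 standard deviations for a baseline period."""
    # between() is inclusive at both ends, so 1981-2010 covers 30 years.
    baseline = frame[frame['Year'].between(start, end)]
    stats = (baseline.groupby('Day of Year')['Extent']
                     .agg(['mean', 'std'])
                     .reset_index())
    stats['Upper 2SD'] = stats['mean'] + 2 * stats['std']
    stats['Lower 2SD'] = stats['mean'] - 2 * stats['std']
    return stats


# Made-up data: one value per year for a single day of the year.
# 1980 and 2011 get an extreme value to show that they are excluded.
years = list(range(1980, 2012))
toy = pd.DataFrame({
    'Year': years,
    'Day of Year': [1] * len(years),
    'Extent': [100.0 if y in (1980, 2011) else 10.0 + (y % 2) for y in years],
})
print(climatology(toy))
```

The `Upper 2SD` and `Lower 2SD` columns could then be drawn as a filled band and the `mean` column as a line, in the same way as the minimum and maximum curves. Note that pandas computes the sample standard deviation (`ddof=1`) by default.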
Summary
In this article, we reproduced the stacked line chart technique popular for climate data visualizations. Not only did the Plotly library make this easy, but it also produced interactive digital graphs that can be saved as static images.
While spaghetti plots are often disparaged for their busy nature, if used correctly, they can tell a strong story. In this case, we used the emphasis technique to highlight a few lines while diminishing all the others.
For other strategies for making sense of spaghetti plots, visit the story-telling-with-data site. And to make similar plots using the matplotlib library, check out the README in the zmlabe/IceVarFigs repository on GitHub.
Citations
1. Birkel, S.D., "Daily Sea Surface Temperature," Climate Reanalyzer (https://ClimateReanalyzer.org), Climate Change Institute, University of Maine, USA. Accessed on August 13, 2023. (Climate Reanalyzer content is licensed under a Creative Commons Attribution 4.0 International License).
2. Labe, Zachary, 2023, "Antarctic Sea Ice," Climate Visualizations, (https://zacklabe.com/), Princeton University and NOAA GFDL. Accessed on August 13, 2023. (Content is licensed under a Creative Commons Attribution 4.0 International License).
3. Readfern, Graham, July 28, 2023, "‘Something Weird is Going On’: Search for Answers as Antarctic Sea Ice Stays at Historic Lows," (https://theguardian.com).
4. Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. Sea Ice Index, Version 3. 2017, Distributed by National Snow and Ice Data Center. https://doi.org/10.7265/N5K072F8. Date Accessed 08–09–2023.
Thanks!
Thanks for reading and please follow me for more Quick Success Data Science projects in the future.