Climate Data Science

Climate is What You Expect

Investigating the regularities in the climate system

Published in

Towards Data Science

6 min read · Mar 9, 2020

The climate system is the product of the complex interactions between its components — Atmosphere, Biosphere, Cryosphere, Hydrosphere and Lithosphere — driven by a considerable number of forcing mechanisms, like solar radiation and the concentration of Greenhouse Gases. Even though chaos is an inherent characteristic of this system, there are several regularities and levels of organization in the behavior of the climate variables.

A typical view in the Amazon. Photo by Rodrigo Kugnharski on Unsplash

Take, for example, the seasons of the year. If you live in the Northern Hemisphere, you’d certainly expect to see snow in the winter and a sunny summer. In the tropics, where the temperature does not fluctuate so wildly, the summers are rainy and hot, and the winters are dry and milder. All of this is summed up in a phrase that Mark Twain supposedly said:

‘Climate is what you expect, weather is what you get’.

A simple phrase, but enough to give you a profound insight into how things work in the climate system. Would you expect heavy snowfall in the middle of the Amazon rainforest? Not in this world. An analytical way to describe climate is to average its variables, such as rainfall and temperature, over a period of at least 30 years. These averages are the famous climatological normals, and they tell you a great deal about what you should expect of the climate.
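To make the idea of a normal concrete, here is a minimal sketch that averages 30 years of monthly values into 12 calendar-month means. It uses NumPy only, the data are synthetic, and the function name `monthly_normals` is a hypothetical helper made up for this illustration, not part of any package used in this tutorial.

```python
import numpy as np

def monthly_normals(series, n_years=30):
    """Return the 12 calendar-month means of a monthly series.

    `series` must hold n_years * 12 values ordered January, February, ...
    """
    values = np.asarray(series, dtype=float)
    if values.size != n_years * 12:
        raise ValueError("expected n_years * 12 monthly values")
    # One row per year, one column per calendar month.
    by_year = values.reshape(n_years, 12)
    # Averaging down the rows gives one mean per calendar month.
    return by_year.mean(axis=0)

# Synthetic data (an assumption for illustration only): a fixed seasonal
# cycle plus random year-to-year noise standing in for "weather".
rng = np.random.default_rng(0)
seasonal_cycle = 150 + 100 * np.cos(2 * np.pi * np.arange(12) / 12)
synthetic = np.tile(seasonal_cycle, 30) + rng.normal(0, 20, size=360)

normals = monthly_normals(synthetic)
print(np.round(normals))
```

The noise mostly averages out over the 30 years, so the normals land close to the underlying seasonal cycle: the climate is what is left once the weather is averaged away.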

The Annual March of Precipitation

This tutorial focuses on the steps you can take to compute your own climatological normal for any region of interest. For this, you’ll use the gridded precipitation dataset from the Global Precipitation Climatology Centre (GPCC), which is widely used in climatological studies because of its high quality and long time span, from January 1901 to December 2013.

First, as usual, you need to import the packages:

import xarray as xr
import proplot as plot
import matplotlib.pyplot as plt
from esmtools.stats import *

You already know that Matplotlib is the gold standard of visualization packages in Python, and in the Quick Introduction to CMIP6 you were introduced to Xarray, for n-dimensional gridded data, and Proplot, the next big thing in data visualization. The new member is Esmtools, a package designed mostly for statistical analysis with complex climate models. The easiest way to install it is with pip:

pip install esmtools

One of the many nice features of Xarray is that it can load many datasets directly through OPeNDAP, without the need to download the files first. Since all the data in the NOAA/PSD catalog can be accessed with OPeNDAP, you can easily get the GPCC product for this tutorial.

# OPeNDAP URL
url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/gpcc/full_v7/precip.mon.total.2.5x2.5.v7.nc'
# load dataset
dset = xr.open_dataset(url)
>>> dset

The GPCC monthly gridded precipitation dataset metadata.

How does this precipitation look on a map?

fig, ax = plot.subplots(axwidth=4.5, tight=True,
                        proj='robin', proj_kw={'lon_0': 180},)
# format options
ax.format(land=False, coast=True, innerborders=True, borders=True,
          labels=True, geogridlinewidth=0,)
# filled contours of the first time step
map1 = ax.contourf(dset['lon'], dset['lat'],
                   dset['precip'][0, :, :],
                   cmap='Dusk',
                   levels=plot.arange(0, 400, 50),
                   extend='both')
ax.colorbar(map1, loc='b', shrink=0.5, extendrect=True)
plt.show()

The first global Precipitation field of the GPCC dataset.

Setting the Area (and time) of Interest

In this tutorial, you’ll investigate the annual march of precipitation in the Amazon Basin. This cycle is regulated by the ups and downs of the South American Monsoon System (SAMS), which gives the region its two very distinct wet and dry seasons. To focus on this region, you can use Xarray’s flexible .sel() method. You’ll set a box around the basin by selecting a range of latitudes and longitudes, and limit the time axis to the period from 1981 to 2010, which is the most recent climatological period according to the World Meteorological Organization (WMO).

amazon_area = dset.sel(lat=slice(5, -22), lon=slice(285, 315),
                       time=slice('1981-01-01', '2010-12-01'))

For every dataset you come across, take extra care to know exactly how its dimensions are named. For example, if the GPCC dataset named the longitude dimension longitude instead of lon, you’d have to pass longitude to .sel() instead of lon. These small details matter for your code, so always check the metadata if you’re not familiar with a particular dataset.
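Another detail worth checking in the metadata is the longitude convention. The slice from 285 to 315 used above suggests that this dataset stores longitudes as degrees east from 0 to 360, rather than from -180 to 180. If you think of the Amazon box as roughly 75°W to 45°W, a small helper can do the conversion. This is only a sketch under that assumption, and `to_0_360` is a hypothetical name, not an Xarray or Esmtools function.

```python
def to_0_360(lon_deg):
    """Convert a longitude in degrees from the -180..180 convention to 0..360."""
    # Python's modulo returns a non-negative result for a positive divisor,
    # so western (negative) longitudes wrap around to values above 180.
    return lon_deg % 360

# Approximate western and eastern edges of the box used in this tutorial.
west, east = to_0_360(-75), to_0_360(-45)
print(west, east)  # 285 315
```

Eastern longitudes are left untouched, so the same helper works for boxes anywhere on the globe.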

A common practice in climatological studies is the creation of an index, which is a time series that characterizes the behavior of a particular phenomenon or variable in a particular region. Now that you have selected the area around the Amazon Basin, Esmtools lets you build an index from the gridded data in a rigorous way, by applying a cosine-of-latitude weighting on the regular grid so that each cell counts in proportion to its area. Putting the climatology phrasing aside, make the precipitation index with:

amazon_index = cos_weight(amazon_area['precip'])
>>> amazon_index.dims
('time',)
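If you are curious about what cosine weighting does, the idea can be sketched with plain NumPy. On a regular latitude-longitude grid, cells shrink towards the poles, and their area is proportional to the cosine of the latitude. The function below is an illustration of that principle for a single time step, not the Esmtools implementation. The name `cos_weighted_mean` is hypothetical, and the sketch does not handle missing values.

```python
import numpy as np

def cos_weighted_mean(field, lat_deg):
    """Area-weighted mean of a (lat, lon) field on a regular grid."""
    field = np.asarray(field, dtype=float)
    # Each latitude row gets a weight proportional to cos(latitude).
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    # Repeat the row weights across every longitude column.
    weights_2d = np.broadcast_to(weights[:, np.newaxis], field.shape)
    return np.average(field, weights=weights_2d)

lat = np.array([60.0, 0.0])
field = np.array([[10.0, 10.0],   # row at 60 degrees north
                  [40.0, 40.0]])  # row at the equator
# Approximately 30.0, while the unweighted mean would be 25.0.
print(cos_weighted_mean(field, lat))
```

The row at 60°N counts half as much as the row at the equator, because cos(60°) is 0.5. For a box as close to the equator as the Amazon the correction is small, but it becomes important for regions that extend into higher latitudes.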

What does the annual cycle of rainfall in the Amazon look like?

fig, ax = plot.subplots(figsize=(5, 5), tight=True)
ax.plot(amazon_index['time'], amazon_index, color='blue')
# format options
ax.format(xlabel='Time', ylabel='Precipitation (mm/month)')
plt.show()

The ups and downs of the precipitation cycle in the Amazon.

While this does justice to what an annual cycle should look like, it hides some important details about the behavior of rainfall in the Amazon. A good way to improve it is to make a simple monthly plot or something more refined, like a boxplot. The good news is that Proplot wraps Matplotlib and makes a boxplot easy:

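import numpy as np  # needed for the reshape below
# month labels for the x-axis of the boxplot
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# numpy trick: reshape the 360 monthly values into a (years, months) matrix
amazon_index = np.reshape(np.array(amazon_index), (30, 12),
                          order='C')
fig, ax = plot.subplots(figsize=(5, 5), tight=True, sharex=False)
# format options
ax.format(ylim=(0, 300))
ax[0].boxplot(amazon_index, marker='x', fillcolor='azure',
              labels=months)
plt.show()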

The annual march of rainfall, neatly visualized.
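The reshape is what makes the boxplot work, so it is worth seeing what it does. The sketch below uses a made-up series in which every value encodes its own year and month, which makes the layout easy to check. It then computes the quartiles of each month, which are the numbers that define the box in a boxplot. The stand-in data and variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical stand-in for the 360-value index: year y, month m -> 100*y + m.
flat = np.array([100 * y + m for y in range(30) for m in range(12)])

# Row-major (order='C') reshape: each row is a year, each column a month.
table = flat.reshape(30, 12)
assert table[0, 11] == 11    # December of the first year
assert table[1, 0] == 100    # January of the second year

# The box in a boxplot spans the first to the third quartile of each column.
q1, median, q3 = np.percentile(table, [25, 50, 75], axis=0)
print(median[0])  # 1450.0, the median of 0, 100, ..., 2900
```

Because the monthly series starts in January and has no gaps, every column of the matrix collects the 30 values of a single calendar month. If your own series starts in another month or has missing time steps, this simple reshape would mix months together.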

The boxplots allow you to see how rainfall really behaves in the Amazon Basin. From January to mid-May there’s a long wet season across the entire basin, driven mainly by the active phase of the SAMS and the southward migration of the Intertropical Convergence Zone (ITCZ). However, a surprise for most people is that the rainforest regularly goes through periods of little rainfall during its very distinct dry season, from June to mid-September. During this pronounced dry season, the fire season also starts, and the agricultural frontier advances a little further into the Amazon as the years go by.

Final Words

The annual cycle of any climatological variable of interest is, most of the time, the first step towards more detailed and profound studies. However, this somewhat simple analysis is already enough to give you a good view of the natural behavior of temperature, precipitation and other variables. In a changing climate, it becomes extraordinarily important to know how climate fluctuates in a particular region or even across the entire globe.

A good thing is that a wealth of reliable datasets is available for a quick download, and Xarray goes even further by allowing remote access to many of them. Other great packages, like Proplot and the new Esmtools, facilitate the visualization and statistical analysis of these often complicated gridded datasets. As usual, the Jupyter Notebook for this tutorial is freely available in my Climate Data Science repository.