Introduction
As the El Niño phenomenon intensifies in 2023, climatological and precipitation data have become fundamental to deciphering its impact on weather patterns and climate dynamics at global and regional scales. For precipitation, two globally recognized datasets come to the forefront: CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) by USGS and IMERG (Integrated Multi-satellitE Retrievals for GPM) developed by NASA, where GPM denotes the Global Precipitation Measurement mission. This mission employs a network of satellites to deliver comprehensive global rainfall estimates. Though these products are suitable for global models, they aren’t specifically tailored to South America.
In this context, the Brazilian National Institute for Space Research (INPE) offers daily precipitation raster data specifically calibrated for South America. This product, known as MERGE, relies on the IMERG/GPM model but benefits from calibration against thousands of in-situ rain gauges to ensure unbiased results (Rozante et al. 2010, Rozante et al. 2020). INPE also provides additional climatological data, including monthly averages, daily averages, and more.
Figure 1 depicts the total precipitation in South America for 2015 (left), a year with a strong El Niño phenomenon, and the precipitation anomaly in comparison to the previous year when no El Niño was present (right).
The figure shows a large area with a negative anomaly, especially in the Amazon biome, with up to 2,000 mm less rain than in the previous year.
These resources are of immense value for diverse applications, including watershed and reservoir management, monitoring of critical events, and precision agriculture. Nevertheless, the intricacies involved in downloading and manipulating these datasets often hinder their effective use, limiting them mostly to meteorologists and leaving other professionals, such as hydrologists and agricultural specialists, under-equipped. This is a challenge I have observed within my organization (ANA), where hydrologists and engineers often struggle to access rainfall data for specific basins.
Addressing this challenge, this article aims to guide readers on how to efficiently download and manipulate these data using the merge-downloader package, opening the door for broader interdisciplinary usage and insights.
Installation
The merge-downloader package is an unofficial library developed to make it easier to access data from INPE. The source code is available at: https://github.com/cordmaur/merge-downloader.
Installing the Python libraries required for geospatial applications can be daunting, so I strongly suggest using Docker instead. I’ve already covered this topic in previous stories published here in TDS:
- Configuring a Minimal Docker Image for Spatial Analysis with Python
- Why You Should Use Devcontainers for Your Geospatial Development
A Docker image is already available on Docker Hub, and it can be pulled and started with the following commands in a shell prompt.
> docker pull cordmaur/merge-downloader:v1
> docker run -it -p 8888:8888 cordmaur/merge-downloader:v1 bash
Once inside the container, you can install the package and start Jupyter, which will be accessible through your web browser at http://127.0.0.1:8888.
root@89fd8c332f98:/# pip install merge-downloader
root@89fd8c332f98:/# jupyter notebook --ip=0.0.0.0 --allow-root --no-browser
Another, even more straightforward option is to install merge-downloader on Google Colab, which is the path followed here.
# from a code cell
%pip install merge-downloader
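Before moving on, it can be worth confirming that the package is importable in the current environment. The sketch below uses only the standard library. The helper name is_installed is my own, and the module name mergedownloader is taken from the import statements used later in this article.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# `mergedownloader` is the import name used throughout this article
if is_installed("mergedownloader"):
    print("merge-downloader is ready to use")
else:
    print("merge-downloader not found; run the pip install cell first")
```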
Downloading Assets
The first thing we need to cover is how to download precipitation and climatological assets from INPE. The list of assets available for download with merge-downloader can be obtained with the following commands:
from mergedownloader.inpeparser import INPETypes
INPETypes.types()
result:
DAILY_RAIN,
MONTHLY_ACCUM_YEARLY,
DAILY_AVERAGE,
MONTHLY_ACCUM,
MONTHLY_ACCUM_MANUAL,
YEARLY_ACCUM,
HOURLY_WRF,
DAILY_WRF
The meaning of each type is available in the GitHub documentation and summarized in the following table:
To download any asset, the first step is to create a Downloader instance, pointing it to INPE’s FTP server and setting a local folder in which to store the files.
from mergedownloader.downloader import Downloader
from mergedownloader.inpeparser import INPETypes, INPEParsers
# create a temporary folder to store the files
!mkdir ./tmp
downloader = Downloader(
    server=INPEParsers.FTPurl,
    parsers=INPEParsers.parsers,
    local_folder='./tmp'
)
Once a downloader instance is created, let’s download the rain for one specific day. We can use the get_file method for that, like so:
import xarray as xr
file = downloader.get_file(date='20230601', datatype=INPETypes.DAILY_RAIN)
file
result:
PosixPath('tmp/DAILY_RAIN/MERGE_CPTEC_20230601.grib2')
The file can now be opened with the xarray library:
rain = xr.load_dataset(file)
rain['prec'].plot(vmax=150)
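Since get_file fetches a single date, downloading several days is a matter of looping over date strings. A minimal sketch of building such a list with the standard library follows. Note that daily_dates is a hypothetical helper (not part of merge-downloader), and the commented-out loop assumes the downloader instance created above.

```python
from datetime import date, timedelta


def daily_dates(start: date, end: date) -> list[str]:
    """Return 'YYYYMMDD' strings for every day from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(n_days)]


dates = daily_dates(date(2023, 6, 1), date(2023, 6, 5))
print(dates)

# With the downloader created earlier, each day could then be fetched in turn:
# files = [downloader.get_file(date=d, datatype=INPETypes.DAILY_RAIN) for d in dates]
```

The string format matches the one passed to get_file in the example above.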
Opening Multiple Assets
Note that in the previous example, the longitude ranges from 240 to 340 degrees east. That’s not the usual convention, in which longitudes east of Greenwich are positive and those to the west are negative. This correction and other minor ones, such as defining the correct CRS, are done automatically when we open the assets through the Downloader instance. That can be achieved by using open_file instead of get_file. As an example, let’s open multiple files representing the rain that fell in the first four months of 2023. Additionally, we are going to plot the South American countries as a spatial reference.
import geopandas as gpd
import matplotlib.pyplot as plt
# open the countries dataset
# (gpd.datasets was removed in geopandas 1.0; with newer versions, read a Natural Earth file directly)
countries = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
south_america = countries.query("continent == 'South America'")
# select the months to download
dates = ['2023-01', '2023-02', '2023-03', '2023-04']
monthly_rains = [downloader.open_file(date, datatype=INPETypes.MONTHLY_ACCUM_YEARLY) for date in dates]
# create a figure with the monthly precipitation
fig, axs = plt.subplots(2, 2, figsize=(12, 11))
for i, rain in enumerate(monthly_rains):
    ax = axs.reshape(-1)[i]
    rain.plot(ax=ax, vmax=1200)
    south_america.plot(ax=ax, facecolor='none', edgecolor='white')
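The longitude correction discussed in this section boils down to simple modular arithmetic. The sketch below illustrates the idea only; it is not the package’s actual implementation, and wrap_longitude is a hypothetical helper.

```python
def wrap_longitude(lon_east: float) -> float:
    """Convert a longitude from the 0..360 (degrees east) convention to -180..180."""
    return (lon_east + 180.0) % 360.0 - 180.0


# The raw MERGE grid spans roughly 240 to 340 degrees east
for lon in (240.0, 300.0, 340.0):
    print(f"{lon:.0f} degrees east -> {wrap_longitude(lon):.0f}")
```

With this formula, 240 degrees east maps to -120 and 340 degrees east maps to -20, which is the familiar range for South America. With xarray, the same expression could be applied to the longitude coordinate through assign_coords followed by sortby, assuming the coordinate is named longitude.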
Creating a Data Cube
Now, suppose we need to assess the accumulated precipitation in the first half of June 2023 over a specific area (e.g., the Amazon biome). In such scenarios, instead of opening each file individually, clipping the area, stacking the results, and so on, it’s much easier to create a data cube and operate directly on it. The cube consists of several rasters stacked along the time dimension.
So, first, let’s create the cube. The Downloader class can automatically create a cube for a given date range.
# create a cube for the first half of June
cube = downloader.create_cube(
    start_date='20230601',
    end_date='20230615',
    datatype=INPETypes.DAILY_RAIN
)
cube
Next, we have to perform two operations: clipping, to limit the data to the desired area, and summation, to accumulate the precipitation over the desired days. In the first step, we cut the cube to the extent of the Amazon biome using the GISUtil.cut_cube_by_geoms() method. Then we sum along the time axis, so we end up with a single 2-D layer. Let’s see it step by step.
from mergedownloader.utils import GISUtil
# open the amazon geometry
amazon = gpd.read_file('https://raw.githubusercontent.com/cordmaur/Fastai2-Medium/master/Data/amazon.geojson')
# cut the cube by the given geometry
amazon_cube = GISUtil.cut_cube_by_geoms(
    cube=cube,
    geometries=amazon.geometry
)
# accumulate the rain along the time axis
amazon_rain = amazon_cube.sum(dim='time', skipna=False)
# plot the figure
fig, ax = plt.subplots(figsize=(8, 5))
amazon_rain.plot(ax=ax)
south_america.plot(ax=ax, facecolor='none', edgecolor='firebrick')
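To see why skipna=False matters in the summation, consider a tiny cube in plain Python. I’m assuming here, as is common for clipping operations, that pixels outside the geometry are set to NaN. Under that assumption, a NaN-propagating sum keeps those pixels masked, while a NaN-skipping sum would turn them into misleading zeros. The data and the helper name sum_over_time are invented for illustration.

```python
import math

nan = math.nan

# A toy cube: 3 days x 2 rows x 2 columns of daily rain (mm).
# The top-left pixel lies outside the clipping geometry, so it is NaN every day.
toy_cube = [
    [[nan, 5.0], [2.0, 0.0]],
    [[nan, 1.0], [3.0, 0.0]],
    [[nan, 4.0], [0.0, 7.0]],
]


def sum_over_time(cube, skipna):
    """Accumulate a (time, row, col) nested list along the time axis."""
    n_rows, n_cols = len(cube[0]), len(cube[0][0])
    total = [[0.0] * n_cols for _ in range(n_rows)]
    for layer in cube:
        for r in range(n_rows):
            for c in range(n_cols):
                value = layer[r][c]
                if skipna and math.isnan(value):
                    continue  # ignore missing values
                total[r][c] += value  # NaN propagates when it is not skipped
    return total


propagated = sum_over_time(toy_cube, skipna=False)
skipped = sum_over_time(toy_cube, skipna=True)
print(propagated)  # the masked pixel stays NaN
print(skipped)     # the masked pixel becomes 0.0
```

Keeping the masked pixels as NaN is what lets the plot show only the clipped area instead of a rectangle padded with zero rainfall.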
Creating a Time Series
Creating a time series for a particular region can provide valuable insights, especially when considering the rainfall or historical climatology data. For instance, you might want to plot the monthly rain in the Amazon during the El Niño phenomenon in 2015 and compare it to the long-term average precipitation expected in the region for each month.
To get started, we are going to create two cubes: one with the monthly precipitation from January to December 2015, and the other with the long-term averages. The long-term average provided by INPE is calculated from 2000 to 2022 (23 years of data) and, in this case, we can pass any year as a reference.
Note in the following code that we pass reducer=xr.DataArray.mean, the method used to aggregate the values of all pixels in the region, leaving only the time dimension.
# Create the cubes
cube_2015 = downloader.create_cube(
    start_date='2015-01',
    end_date='2015-12',
    datatype=INPETypes.MONTHLY_ACCUM_YEARLY
)
cube_lta = downloader.create_cube(
    start_date='2015-01',
    end_date='2015-12',
    datatype=INPETypes.MONTHLY_ACCUM
)
# Create the series
series_2015 = downloader.get_time_series(
    cube=cube_2015,
    shp=amazon,
    reducer=xr.DataArray.mean
)
series_lta = downloader.get_time_series(
    cube=cube_lta,
    shp=amazon,
    reducer=xr.DataArray.mean
)
# create a string index with just year and month
series_lta.index = series_2015.index = series_2015.index.astype('str').str[:7]
# plot the graph
fig, ax = plt.subplots(figsize=(12,6))
series_lta.plot(ax=ax, kind='line', color='orange', marker='x')
series_2015.plot(ax=ax, kind='bar')
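Conceptually, the reducer collapses the pixels of each time step into a single number, and comparing the two resulting series gives the monthly anomaly. The sketch below mimics this in plain Python. The pixel values and long-term averages are made up for illustration and are not INPE data, and regional_mean is a hypothetical stand-in for the reducer.

```python
import math
from statistics import fmean

nan = math.nan


def regional_mean(layer):
    """Mean of all valid (non-NaN) pixels in one 2-D layer."""
    valid = [v for row in layer for v in row if not math.isnan(v)]
    return fmean(valid) if valid else nan


# Toy monthly cube keyed by 'YYYY-MM'; NaN marks pixels outside the region
toy_cube = {
    "2015-01": [[nan, 200.0], [300.0, 250.0]],
    "2015-02": [[nan, 180.0], [240.0, 210.0]],
}
# Made-up long-term averages for the same months (mm)
toy_lta = {"2015-01": 280.0, "2015-02": 230.0}

# One value per time step: only the time dimension is left
toy_series = {month: regional_mean(layer) for month, layer in toy_cube.items()}
anomaly = {month: toy_series[month] - toy_lta[month] for month in toy_series}

for month, value in anomaly.items():
    print(f"{month}: {value:+.1f} mm relative to the long-term average")
```

A negative value means the month was drier than the long-term average, which is the pattern the bar-versus-line plot above is meant to reveal.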
Conclusion
The merge-downloader package and INPE’s precipitation and climatological data provide an effective resource for environmental analysis applications. The package’s compatibility with well-established libraries like geopandas and xarray further enhances its applicability.
Illustrated through various case examples, the package’s functionalities range from simple tasks, such as downloading and plotting precipitation data, to more advanced operations, including the generation of data cubes, spatial clipping, and time-series analysis. Users can apply these tools according to their specific requirements, facilitating tasks such as environmental change tracking, climatic event monitoring, or comprehensive regional studies. Figure 2 shows an example report built entirely with merge-downloader and other Python geospatial tools.
The presented methodology allows precipitation data to be evaluated and compared with climatological references in any spatially defined area, and it can serve multiple domains.