Automated mapping of sea surface temperature with shell scripting, R and Python

Sam McClatchie
Towards Data Science
7 min read · Jul 20, 2022


Sea surface temperature off New Zealand. Source: JPL MUR MEaSUREs Project. 2015. GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis. Ver. 4.1. PO.DAAC, CA, USA. Dataset accessed [16 July 2022] at https://doi.org/10.5067/GHGMR-4FJ04 via NOAA ERDDAP https://upwell.pfeg.noaa.gov/erddap/griddap/nasa_jpl_28d8_bd66_24b1.html

There are many ways to map sea surface temperature (SST), which is produced by a range of satellite sensors and served on many different platforms. It can be difficult to decide which method to use, or which product to select from the wealth of options.

When I was a graduate student in oceanography in the early 1980s, if you wanted SST data you had to send a request to NASA, and they would send you a packet of CDs loaded with the data in multiple files. Those days are long gone. The latest trend is to avoid downloading the data at all. Now you can select and subset analysis-ready, cloud-optimised (ARCO) data on a cloud-based server, process and visualise the data remotely across multiple processors, driving everything with code from a laptop, and download only the finished product (see https://tinyurl.com/2p8s6y9u and https://tinyurl.com/yc2jne6d). These methods are extremely powerful, but they require more expertise to implement.

In this contribution, I will illustrate a method that still downloads the data in NetCDF format and processes the maps locally. It is not an analysis-ready cloud-optimised method, but for many purposes that do not involve processing terabytes of data, this is still quite adequate and fast. It does require a fast Internet connection, a data plan where data downloads won’t exceed your quota, and a computer with adequate processing capacity.

The initial questions are which SST dataset to use, and where to obtain it conveniently. For the purpose of illustration, I selected a gap-free product merged from several satellite sensors that has been mapped to coordinates ready for projection. The Group for High Resolution Sea Surface Temperature Multi-scale Ultra-high Resolution product, version 4 (GHRSST-MUR), is a near-real-time (1 day latency), level 4 (mapped), gap-free product merged from several radiometers and spectroradiometers as well as in situ data. It provides almost global coverage of cloud-free SST at 0.01 degree resolution (Chin et al. 2017).

These data are available in NetCDF-4 format from the NASA Jet Propulsion Laboratory Physical Oceanography Distributed Active Archive Center (PO.DAAC), or in ARCO Zarr format on Amazon Web Services. To access the NetCDF data, it is more convenient to use the National Oceanic and Atmospheric Administration Environmental Research Division Data Access Program (NOAA ERDDAP). On ERDDAP the GHRSST-MUR final product can be accessed and subset through the ERDDAP Graphical User Interface (GUI), but a much better option is to use the OPeNDAP-like facility built into ERDDAP to retrieve subsets of the dataset in the desired format. The ERDDAP GUI is helpful for constructing the required URL in the correct format.

ERDDAP provides many examples of using the OPeNDAP-like facility to obtain the required data from within different analysis programs such as R, Python, or Matlab. I find it simplest to use R to access and download the data through ERDDAP, but I prefer to plot the SST maps with Python. I use a bash shell script to keep everything together and to automate downloading the data and creating the maps.

There are a few tricks to getting the automation to work: (1) modify the shell environment so that the Conda environment needed for the Python scripts can be activated from a shell script, and (2) use a shell script to run the R script and the Python scripts, and to set a timer for execution.

Control code execution timing and sequence with shell scripting

I run the Python code in a project environment called ‘geo_env’ on my computer, which needs to be activated before running the scripts. I installed Python using the Conda package manager, so in order to activate an environment from a shell script running on Linux it is necessary to modify ~/.bashrc by adding an edit after the “# <<< conda initialize <<<” marker, as shown below:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/smcc/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/smcc/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/smcc/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/smcc/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
# edit to permit conda activate to work in a shell script
export -f conda
export -f __conda_activate
export -f __conda_reactivate
export -f __conda_hashr
export -f __conda_exe
# end smcc edit

Activation of the project environment from the shell script can then be done using the following command:

conda activate geo_env

Running the shell script every day, or at whatever interval you choose, requires timed job execution. You can use a cron job, but I prefer to use “at” to run the script once without cron, and then call “at” recursively to reset the job each time the shell script runs. This command sets the run time for the shell script to noon each day:

at -f "./shell/download_latest_GHRSST_data_from_ERDdap.sh" 12:00 + 24 hours

You can check the timing of the next run using:

at -l
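If you would rather use cron, as mentioned above, the equivalent of the recursive “at” call is a single crontab entry (the path shown is a placeholder for wherever you keep the script):

```
# crontab -e: run the download/plot script at noon every day
# (placeholder path -- substitute the real location of the script)
0 12 * * * /path/to/shell/download_latest_GHRSST_data_from_ERDdap.sh
```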

Here is the full shell script:

#!/bin/bash
###########################################
# download_latest_GHRSST_data_from_ERDdap.sh
###########################################
# activate the conda environment for python
# NOTE: ~/.bashrc was modified so conda activate will work in a shell script
conda activate geo_env
###########################################
# Download the latest GHRSST data for each of the regions from ERDDAP.
# NOTE: Edit the R script to add more regions
###########################################
Rscript "../R/download_GHRSST_MUR-JPL-L4-GLOB-v4.1_daily.R"
###########################################
# Process the downloaded data and draw maps
###########################################
python "../python/batch_process_sst_and_plots.py"
###########################################
# Use "at" to run the script ONE TIME without cron,
# and then call "at" recursively to reset the job each time
# the shell script runs.
# NOTE: before resetting the at commands, use
#   atq | cut -f 1 | xargs atrm
# to remove all previous jobs (which can be shown with: at -l)
###########################################
at -f "../shell/download_latest_GHRSST_data_from_ERDdap.sh" 12:00 + 24 hours

Download the data from ERDDAP with R

Downloading the data is straightforward with a simple R script:

#####################################################
# download GHRSST, MUR-JPL-L4-GLOB-v4.1 daily
#####################################################
setwd("/mnt/data/dynamic_data/projects/projects2022/GHRSST_and_anomalies/R")
start_date = paste(Sys.Date()-2, "T12:00:00Z", sep='') # today minus 2 days, to allow for latency
end_date = paste(Sys.Date()-2, "T12:00:00Z", sep='')   # same time step, so just one day of data
#####################################################
# New Zealand
url <- paste("https://upwell.pfeg.noaa.gov/erddap/griddap/nasa_jpl_dde5_3be1_897b.nc?analysed_sst%5B(",start_date,"):(",end_date,")%5D%5B(-49.0):(-31.0)%5D%5B(160.5):(179.5)%5D&.draw=surface&.vars=longitude%7Clatitude%7Canalysed_sst&.colorBar=KT_thermal%7CD%7CLinear%7C15%7C18%7C6&.land=over&.bgColor=0xffccccff", sep="")
filename <- "../data/latest_MUR_SST_New_Zealand.nc"
download.file(url, filename)
# end New Zealand
#####################################################
# California
url <-paste("https://upwell.pfeg.noaa.gov/erddap/griddap/nasa_jpl_dde5_3be1_897b.nc?analysed_sst%5B(",start_date,"):(",end_date,")%5D%5B(30.0):(45.0)%5D%5B(-130.0):(-115.0)%5D&.draw=surface&.vars=longitude%7Clatitude%7Canalysed_sst&.colorBar=KT_thermal%7CD%7CLinear%7C15%7C18%7C6&.land=over&.bgColor=0xffccccff", sep="")
filename <-"../data/latest_MUR_SST_SoCal.nc"
download.file(url, filename)
# end California

I’ve just used New Zealand and California for this example. More regions can be added by modifying the coordinates in additional URL commands. I find it convenient to use the ERDDAP GUI to generate the URL used in the R script.
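Because the regional URLs differ only in their time and coordinate bounds, the query string can also be generated programmatically. Here is a minimal Python sketch of that idea (the function name and percent-encoding choices are mine, not part of the article's code; the dataset ID is the one used in the R script above). As far as I can tell, the trailing graphics parameters (&.draw=…, &.colorBar=…) in the GUI-generated URLs only control ERDDAP's preview plot, so I have omitted them here:

```python
# Assemble an ERDDAP griddap subset URL from time and coordinate bounds.
from urllib.parse import quote

BASE = "https://upwell.pfeg.noaa.gov/erddap/griddap/nasa_jpl_dde5_3be1_897b.nc"

def griddap_sst_url(start, end, lat_min, lat_max, lon_min, lon_max):
    """Subset analysed_sst over [time][latitude][longitude]."""
    query = (f"analysed_sst[({start}):({end})]"
             f"[({lat_min}):({lat_max})]"
             f"[({lon_min}):({lon_max})]")
    # percent-encode the square brackets, as in the GUI-generated URLs,
    # but leave parentheses and colons readable
    return BASE + "?" + quote(query, safe="():")

# New Zealand bounds from the R script above
url = griddap_sst_url("2022-07-14T12:00:00Z", "2022-07-14T12:00:00Z",
                      -49.0, -31.0, 160.5, 179.5)
```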

Plotting the maps with Python

I use a simple Python script to define the regions of interest and then plot them.

# batch_process_sst_and_plots.py
################################################################
# All regions are processed in a loop
################################################################
import plot_SST_map

# define regions to process
# region_list = ["New_Zealand_Northland", "New_Zealand_Taranaki",
#                "New_Zealand_East_Cape", "Australia_QLD_Brisbane",
#                "Australia_QLD_central", "Australia_QLD_northern",
#                "Australia_NSW_Sydney", "Australia_Tasmania",
#                "Costa_Rica", "Baja_Sur", "California",
#                "Gulf_of_Maine", "US_Central_Bight", "Florida"]
# to process one or more only ...
region_list = ["New_Zealand", "California"]
for region in region_list:
    plot_SST_map.plot_map(region)
Sea surface temperature off California. Source: JPL MUR MEaSUREs Project. 2015. GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis. Ver. 4.1. PO.DAAC, CA, USA. Dataset accessed [17 July 2022] at https://doi.org/10.5067/GHGMR-4FJ04 via NOAA ERDDAP https://upwell.pfeg.noaa.gov/erddap/griddap/nasa_jpl_28d8_bd66_24b1.html

Here is the plotting script:

def plot_map(region):
    #########################################################
    # Import packages:
    #########################################################
    import matplotlib.pyplot as plt
    import matplotlib.ticker as mticker
    from matplotlib.pyplot import tick_params
    # from matplotlib.font_manager import get_fontconfig_fonts
    from datetime import datetime
    import cartopy.crs as ccrs
    import cartopy.feature as cfeature
    import cmocean as cmo
    import numpy as np
    import pandas as pd
    import xarray as xr
    import os
    from pyproj import transform
    ########################################################
    # Load data
    # Open a netCDF data file using xarray
    ########################################################
    f = "latest_MUR_SST_" + region + ".nc"
    datadir = "/mnt/data/dynamic_data/projects/projects2022/GHRSST_and_anomalies/data"
    d = datadir + "/" + f
    # open netcdf file as an xarray dataset
    ds = xr.open_dataset(d)
    # open single variables in the xarray dataset
    # sst_3d = ds["analysed_sst"] + 273.15
    sst_3d = ds["analysed_sst"]
    # reduce dimensions to 2d
    sst = sst_3d.squeeze()
    # Generate figure (set its size (width, height) in inches).
    fig = plt.figure(figsize=(5, 8))
    ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())
    ax.coastlines(linewidths=0.5)
    ax.add_feature(cfeature.LAND, facecolor="lightgray")
    ax.add_feature(cfeature.BORDERS)
    ax.add_feature(cfeature.RIVERS)
    # get the min, max for SST, convert to a scalar, and round off
    varmin = sst.min()
    sstmin = int(round(varmin.item(), 0))
    varmax = sst.max()
    sstmax = int(round(varmax.item(), 0))
    sst_int = int(sstmax - sstmin)
    # use wider contour spacing when the SST range is large
    if sst_int >= 6:
        contour_levels = np.array(range(sstmin, sstmax, 2))
    else:
        contour_levels = np.array(range(sstmin, sstmax, 1))
    ################################################
    # Plot data
    ################################################
    pt_sst = sst.plot.contourf(ax=ax,
                               transform=ccrs.PlateCarree(),
                               levels=contour_levels,
                               cmap=cmo.cm.ice,
                               add_colorbar=False,
                               zorder=0,
                               )
    # color bar ################
    from mpl_toolkits.axes_grid1.inset_locator import inset_axes
    cax = inset_axes(ax,
                     width="80%",  # width = 80% of parent_bbox width
                     height="5%",  # height : 5%
                     loc='upper right',
                     )
    cbar = plt.colorbar(pt_sst,
                        cax=cax,
                        orientation="horizontal",
                        extendrect=True,
                        )
    cbar.ax.tick_params(labelsize=7, colors='salmon')
    gl = ax.gridlines(draw_labels=True,
                      dms=True,
                      x_inline=False,
                      y_inline=False,
                      linewidth=0.25,
                      color="black",
                      alpha=0.25,
                      )
    # Manipulate latitude and longitude gridline numbers and spacing
    gl.top_labels = False
    gl.right_labels = False
    if region == "New_Zealand":
        gl.xlocator = mticker.FixedLocator([163, 167, 171, 175, 179])
        gl.ylocator = mticker.FixedLocator([-48, -46, -44, -42, -40,
                                            -38, -36, -34, -32])
    elif region == "California":
        gl.xlocator = mticker.FixedLocator([-130, -128, -126,
                                            -124, -122, -120, -118, -116])
        gl.ylocator = mticker.FixedLocator([31, 33, 35, 37, 39, 41, 43])
    gl.xlabel_style = {"rotation": 0, "size": 12}
    gl.ylabel_style = {"rotation": 0, "size": 12}
    ########################################################
    # Save the plot
    ########################################################
    # tidy by removing any previous file
    os.system("rm /mnt/data/dynamic_data/projects/projects2022/GHRSST_and_anomalies/figures/latest_MUR_SST_" + region + ".png")
    plt.savefig("/mnt/data/dynamic_data/projects/projects2022/GHRSST_and_anomalies/figures/latest_MUR_SST_" + region + ".png",
                format="png",
                bbox_inches='tight',
                pad_inches=0,
                dpi=300,
                )
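One detail worth pointing out is the contour-level logic in the middle of the script: when the SST range is 6 °C or more, the contour interval widens from 1 °C to 2 °C so the map does not become cluttered. Isolated as a small function (my own refactoring, for illustration only), the same logic looks like this:

```python
import numpy as np

def contour_levels(sstmin: int, sstmax: int) -> np.ndarray:
    """Contour levels in 2 degree steps for a wide SST range, 1 degree otherwise."""
    step = 2 if (sstmax - sstmin) >= 6 else 1
    # like range(sstmin, sstmax, step) in the plotting script,
    # this excludes the upper bound itself
    return np.arange(sstmin, sstmax, step)

print(contour_levels(12, 20))  # 8 degree range -> 2 degree steps
print(contour_levels(15, 19))  # 4 degree range -> 1 degree steps
```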

This article just shows one example of what can be done with these SST data, namely producing a daily map of ocean surface temperatures for regions of interest. Combining several complementary tools, such as shell scripting, R, and Python, makes this task easier. Using an IDE such as VS Code that can edit and debug both R and Python in the same project facilitates combining the languages to better effect.

GitHub

Code for this article is available in a GitHub repository.

References

Chin, T.M., J. Vazquez-Cuervo, and E.M. Armstrong. 2017. A multi-scale high-resolution analysis of global sea surface temperature. Remote Sensing of Environment 200. https://doi.org/10.1016/j.rse.2017.07.029


Fisheries oceanographer. Former lead for the California Cooperative Oceanic Fisheries Investigations program at NOAA (2007-2018). https://www.fishocean.info