The world’s leading publication for data science, AI, and ML professionals.

Looking for patterns in satellite image time series with python

A quick guide for handling time-varying imagery with open python libraries

image by author
image by author

A while ago I wrote an article here covering some of the basics of remote sensing using python. Last time we covered the basics of Geopandas, rasterio and google earth engine to manipulate satellite imagery and geospatial data.

This time around, we’ll be expanding on the concept of calculating vegetation indexes (NDVI) by looking at how these behave over time and what visualizations we can make with it. We’ll start in a similar fashion, with a vector layer containing land parcels in which we are interested.

A couple of rural parcels to begin with

Here we’re looking at Capivari do Sul, in the southernmost portion of Brazil, a municipality known for its exports of rice cultures. We acquired a shapefile with all land parcels from Brazil’s CAR – a national environmental and agricultural registry – which we’ll in turn read with geopandas.

That blank spot in the middle is the urban settlement. Everything else is basically rice culture. (image by author)
That blank spot in the middle is the urban settlement. Everything else is basically rice culture. (image by author)

Before we can actually go fetch the satellite imagery, we’ll need to refine this vector dataset. The reason for this is that although the parcels reflect only the rural landscape of the municipality, each rural parcel actually has more than just crops: there are protected areas within each of them (thanks to the environmental legislation in place) and the houses in which people live, so we don’t want to calculate NDVI for these.

Accessing land-cover data from the cloud

To filter out everything that isn’t a cropfield, we’ll use the amazing MapBiomas project. MapBiomas is a Google Earth Engine based project that, among other things, provides land-cover classification derived from Landsat imagery for Brazil as a whole. This means we can access the files using just Earth Engine’s python API!

Mapbiomas is amazing, to be honest. (image by author)
Mapbiomas is amazing, to be honest. (image by author)

The tricky part here is finding the documentation and realizing that the MapBiomas project took a few liberties in the way they store their imagery. Every time the project’s team revise the land-cover classification algorithms, a new collection is made available (we’re currently at collection 6). The collection in turn has only one single image, with several bands, one for each year. So we need to use a band filter to be able to get the image for the year 2020 (the latest one available).

Values range from 0 to 33 because this is the number of land-cover classes designated by MapBiomas. (image by author)
Values range from 0 to 33 because this is the number of land-cover classes designated by MapBiomas. (image by author)

From raster to vector with rasterio

Once we have the image exported to our local environment (see the previous article for the code snippet), we need then to transform them into vectors by using rasterio features and isolate all the classes that belong to the agriculture category, like perennial crops and annual crops. Next we’ll want to find the intersection between these newly acquired shapes and the land parcels by using geopanda’s overlay function.

The result is the shape of the parcels minus everything that Mapbiomas didn't classify as agriculture. (image by author)
The result is the shape of the parcels minus everything that Mapbiomas didn’t classify as agriculture. (image by author)

Accessing the raw imagery for multiple timestamps

Great. Now that we have our parcels filtered out for just the crop fields, we can start working on the actual Remote Sensing bit. We’ll start by writing a function for filtering imagery by the metadata based on a calendar range, which will in turn be used with the Landsat 8 collection. The neat thing about this is that the function could be used with any other sensor, like Sentinel 2 for instance. The function basically filters by year, then month and finally computes the median value for each pixel in each timestamp. Once we get the Image Collection, we can take the list of images and export them to our drive. This way we can continue working on it locally.

All of the images we just downloaded, one for each month of the year. (image by author)
All of the images we just downloaded, one for each month of the year. (image by author)

Rasters to arrays and back again

With all of the images at hand, we can finally calculate the vegetation index, or NDVI. Last time we did so using Earth Engine – which is definitely simpler – , but here we’ll be using rasterio and numpy.

We’ll use a function that uses rasterio to isolate the bands we need – 4 for Red and 5 for NIR – and then convert the whole thing into a numpy array. The calculation for the normalized difference (NIR-Red/NIR+Red) is performed with numpy built-in functions and the result is written back into a new geoTiff.

The function can then be applied to all of the images in the folder to which Earth Engine exported them by using a super simple loop.

All resulting images with the vegetation index rendered in singleband pseudocolor. (image by author)
All resulting images with the vegetation index rendered in singleband pseudocolor. (image by author)

Zonal stats all throughout the year

Beautiful. Next we’ll just use rasterstats – much like last time – to compute the zonal statistics for each parcel. What we’re most interested in is the mean value of the vegetation index accross parcels, so we’re able to look at them separetely and statistically.

The dataframe with its new columns. (image by author)
The dataframe with its new columns. (image by author)

These new columns can now be used to plot the geodataframe much like before, but with the color-ramp ranging from 0 to 1. Let us also take it one step further and put all of the plots together in a gif using imageio.

Sort of like a beating heart, but it's rice. (image by author)
Sort of like a beating heart, but it’s rice. (image by author)

The plots are certainly interesting, but probably aren’t the best way to understand the dynamics of what is going on in these crop fields. So let’s take advantage of having the data stored in a dataframe and make a box plot for each month.

This tells a more complete story. (image by author)
This tells a more complete story. (image by author)

Now we get to see more clearly what’s happening. Since rice crops can be harvested every ~150 days, it makes sense that we’d see two "valleys" over the period of one year, when most of the lively green vegetation is removed from the field in order to be separated from the grains.

That’s a wrap

This is it! We managed to go from just a bunch of parcels to some pretty neat temporal charts.

There’s of course stuff that could have been included here – but weren’t for simplicity’s sake – , like masking the clouds or picking more than just one sensor, but this is the clean version. So far!

If you have questions or suggestions, don’t hesitate to drop me a line anytime. If you enjoyed this article, consider buying me a coffee so I keep writing more of those!

If you are not yet a Medium member and want to support writers like me, feel free to sign-up through my referral link:

Join Medium with my referral link – Guilherme M. Iablonovski


Related Articles