The world’s leading publication for data science, AI, and ML professionals.

Access Satellite Imagery with AWS and Google Colab

Load, Experiment, and Download Cloud Optimized Geotiffs (COG) using Python with Google Colab.

Photo by Dallas Reedy on Unsplash
Photo by Dallas Reedy on Unsplash

Accessing satellite imagery requires a lot of storage when you download it locally. The process can take up all your storage space if you need multiple plots over an extended period. Worse even is that you might need only a subset area of the whole image.

To deal with such problems, the recent advances in Geospatial Data use what we Cloud Optimized Geotiffs (COG).

A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing ​HTTP GET range requests to ask for just the parts of a file they need. – https://www.cogeo.org/

Now imagine being able to rewind, forward and stop a big video to watch the portion of the video you need. COG precisely allows you to do that with Satellite imagery. You can get a hole tile, visualise it with low resolution, subset and mask an area and even carry out any processing without even downloading it.

This tutorial will show you how you can access Landsat images stored in AWS s3 storage right in Google Colab using Python. The first part covers how you can find the right image for your area of interest, while the second part shows you how to access, visualise and process satellite image in Python.

Finding satellite Images for your Area of Interest

I receive a lot of inquiries on how to get the right images for an area of interest. In this case, we use Landsat Imagery, so I will guide you on how to obtain the path and row of the Satellite image. Go to USGS explorer website and find your place of interest by providing the Address or place ( Highlighted red in below picture). Once you get the results, click on the address. You can filter out by date if you want.

The next step is to click dataset and select your datasets. In this tutorial, we are using Landsat 8, so we choose it ( Highlighted red below)

Click on Additional criteria to filter out cloud coverage. As shown below image, we filter out cloud coverage to be less than 10%.

Finally, Click on results to find out all images available in your area. Once you know the image you would like to use for your analysis, you can just copy the ID or note it somewhere (As shown below).

You also need to note down the path and row of the satellite imagery. You can click on the metadata button ( Red rectangle) or simply note down the third part of the ID: 0030065.

In the next section, we see how we can access data directly in Google Colab using Python. The Landsat is stored in both AWS and Google Cloud Platform, but in this tutorial, we obtain the data with AWS.

Access Landsat with AWS and Google Colab

Let us first import the libraries we are going to use. Our primary tool is Rasterio which provides an easy to use API for processing satellite imagery.

import numpy as np
import matplotlib.pyplot as plt
import rasterio as rio
from rasterio.plot import show

You can construct your URL Path like this.

fpath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/003/065/LC08_L1TP_003065_20190925_20191017_01_T1/LC08_L1TP_003065_20190925_20191017_01_T1_B4.TIF'

We can divide the above URL into different parts:

http://landsat-pds.s3.amazonaws.com/c1/

The first part of the URL is always the same and shows the AWS storage URL.

L8/003/065/

L8 means Landsat 8. The path: 003. The row: 065. This part of the URL will change depending on the dataset you are accessing and the area of interest.

LC08_L1TP_003065_20190925_20191017_01_T1

This part is the ID of the image which you can obtain as the image above shows.

LC08_L1TP_003065_20190925_20191017_01_T1_B4.TIF

Finally, you repeat the ID of the image but this time with Band you want to access (B4 here) and also the extension of the image (.TIFF)

Now, that we set up the access URL, we can write a simple function to open the image using Rasterio.

def rasterio_open(f):
return rio.open(f)
src_image = rasterio_open(fpath)

Once, we open the raster image; you can plot it using any Visualization library or only Rasterio’s data visualisation methods.

fig, ax = plt.subplots(1, figsize=(12, 10))
show(src_image, ax=ax)
plt.show()

Here is the whole image we have accessed displayed. We obtained only Band 4. We later see how we can combine different bands to construct an RGB image.

As you can see, we have NaN values which are displayed as black. We can get rid of those by using Numpy functionality. We first convert the image as Numpy Arrays by simply reading it with Rasterio

src_image_array = src_image.read(1)
src_image_array = src_image_array.astype("f4")
src_image_array

The image is converted to arrays, as shown below.

array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

Now, you can easily remove those null arrays by assigning them np.nan.

src_image_array[src_image_array==0] = np.nan
fig, ax = plt.subplots(1, figsize=(12, 10))
show(src_image_array, ax=ax)
plt.show()

We have no more null values at the edges now.

As it is often, you might be interested in only a subset part of the image. In the next section, we cover how to subset images using Rasterio’s Window functionality.

Subsetting Images

To only access a particular part of the image, you can filter out with rows, columns, width and height of the picture. Let us say, we do not want the whole image but an image with 750 X 850 (width and height) cutting off at 1200 cols and 1200 rows.

# Window(col_off, row_off, width, height)
window = rio.windows.Window(1200, 1200, 750, 850)
subset = src_image.read(1, window=window)
fig, ax = plt.subplots(1, figsize=(12, 10))
show(subset, ax=ax)
ax.set_axis_off()
plt.show()

Now the image is reduced to a subset, near the city of Eirunepe as shown below.

Finally, the next section shows how you can create an RGB image and download it locally.

Create RGB and Download

We access first each band by providing each with a separate URL. Note that URLs are all the same except the last part before the.TIF extension. These are the Bands, and in this case, since we want to create an RGB image, we access Band 4(Red), Band 3(Green) and Band 2 (Blue).

rpath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/003/065/LC08_L1TP_003065_20190925_20191017_01_T1/LC08_L1TP_003065_20190925_20191017_01_T1_B4.TIF'
gpath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/003/065/LC08_L1TP_003065_20190925_20191017_01_T1/LC08_L1TP_003065_20190925_20191017_01_T1_B3.TIF'
bpath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/003/065/LC08_L1TP_003065_20190925_20191017_01_T1/LC08_L1TP_003065_20190925_20191017_01_T1_B2.TIF'
red = rio.open(rpath)
green = rio.open(gpath)
blue = rio.open(bpath)

We create an RGB image by opening an empty image and populating it with the three bands we opened above.

# Create an RGB image
with rio.open('RGB.tiff','w',driver='Gtiff', width=red.width, height=red.height,count=3,crs=red.crs,transform=red.transform, dtype=red.dtypes[0]) as rgb:
rgb.write(blue.read(1),1)
rgb.write(green.read(1),2)
rgb.write(red.read(1),3)
rgb.close()

The image is stored now. And you can open it with Rasterio or download it locally.

A subset of the image visualised with QGIS
A subset of the image visualised with QGIS

Conclusion

In this tutorial, we covered how you can look and find the right data for your area of interest. We have also seen how to access Landsat Images using Python and AWS storage.

The code for this tutorial is available at this Colab Notebook.

shakasom/rs-python-tutorials


Related Articles