Intermediate Guide

Exploratory Data Analysis (EDA) on Satellite Imagery Using EarthPy

In this article, we are going to use EarthPy to handle the satellite imagery and to do Exploratory Data Analysis(EDA) effectively

Syam Kakarla
Towards Data Science
7 min readSep 23, 2020

--

Landsat Image by Jesse Allen and Robert Simmon, using data from the USGS Earth Explorer.

Table of Contents

  • Introduction to Satellite Imagery
  • Installation
  • How to Download Satellite Images
  • Exploratory Data Analysis(EDA) on Satellite Images
  • Final Thoughts
  • References

Let’s Get Started ✨

Introduction to Satellite Imagery

Satellite imagery has a wide range of applications which is incorporated in every aspect of human life. Especially remote sensing has evolved over the years to solve a lot of problems in different areas. In Remote Sensing, Hyperspectral remote sensors are widely used for monitoring the earth’s surface with the high spectral resolution.

Hyperspectral Image(HSI) data often contains hundreds of spectral bands over the same spatial area which provide valuable information to identify the various materials. In HSI, each pixel can be regarded as a high dimensional vector whose entries correspond to the spectral reflectance from visible to infrared.

Some of the best applications of remote sensing are Mineral Exploration, Defense Research, Bio-Chemical Composition, Forest Heath Status, Agriculture e.t.c. Use the below research paper to get better intuition on applications of Hyperspectral remote sensing.

Use the below article for the hands-on experience in the basic analysis of Hyperspectral Image Analysis using python.

Installation

Let’s look at the EarthPy, It is an open-source python package that makes it easier to plot and work with spatial raster and vector data using open source tools. Earthpy depends upon geopandas which has a focus on vector data and rasterio with facilitates input and output of raster data files. It also requires matplotlib for plotting operations. Use the below command to install EarthPy.

How to Download Satellite Images

There are 14 datasets available in the EarthpPy Package, let us see the datasets available to download.

Datasets — Image by Author

Let’s see how to download the available datasets. The method ‘get_data’ is used to download the data using the name of the dataset.

The output will be:

Image bt Author

EDA on Satellite Images

In this article, we use the ‘vignette Landsat’ dataset. This dataset contains Landsat 8 data for February 21, 2017, for an area surrounding the Cold Springs Fire boundary near Nederland, Colorado. It also contains the Cold Springs Fire boundary provided by GeoMAC. The Landsat bands have been cropped to both cover the Cold Springs Fire Boundary and include no data values at the southeast edge of the original image as well as cloud cover.

Let’s take a look at reading the dataset:

The bands are selected and stacked using the ‘stack’ method from the spatial module of EarthPy. The above code reads the data and prints the metadata. The output is shown below.

Meta Data of Landsat8 Dataset — Image by Author

The dataset has the shape (2158, 1941), 7 bands and contains 41,88,678 pixels.

Plot Bands

As we discussed, the Landsat8 dataset has 7 bands. Let’s plot the bands using the inbuilt method ‘plot_bands’ from the earthpy package. The plot_bands method takes the stack of the bands and plots along with custom titles which can be done by passing unique titles for each image as a list of titles using the title= parameter.

Let’s see the code to plot the bands of the Landsat8 dataset.

The resulted plot is shown below:

The plot of all Bands in landsat8 Dataset — Image by Author

RGB Composite Image

These hyperspectral images have multiple numbers of bands that contain the data ranging from visible to infrared. So it is hard to visualize the data for humans. So the creating an RGB Composite Image makes it easier to understand the data effectively. To plot RGB composite images, you will plot the red, green, and blue bands, which are bands 4, 3, and 2, respectively, in the image stack we created from the Landsat8 data. Python uses a zero-based index system, so you need to subtract a value of 1 from each index. Therefore, the index for the red band is 3, green is 2, and blue is 1. Let’s see the code to plot the RGB composite image.

The generated RGB composite image is shown below.

Image by Author

Stretching the Composite Image

The Composite images that we created can sometimes be dark if the pixel brightness values are skewed toward the value of zero. This type of problem can be solved by stretching the pixel brightness values in an image using the argument stretch=True to extend the values to the full 0-255 range of potential values to increase the visual contrast of the image. Also, the str_clip argument allows you to specify how much of the tails of the data that you want to clip off. The larger the number, the more the data will be stretched or brightened.

Let’s see how this can be done suing earthpy.

The RGB composite image after applying stretch is shown below:

Landsat8 Dataset RGB Composite Image After Applying Strech — Image by Author

Plotting Spectra

Let’s see how to plot spectra which helps us to understand the nature of the pixels. The below code to used to plot the 7 spectra of the first row of the Landsast8 dataset.

The output is shown below:

Spectra Plot — Image by Author

Histograms

Visualizing the bands of the hyperspectral image dataset helps us to understand the distribution of values of the bands. The hist method from the ‘eathpy.plot’ does the work by plotting the histograms for the bands of the dataset/stack that we created previously. We can also modify the column size, titles, colors of the individual histograms. Let’s see the code to plot the histograms.

The resulted plot is

Histogram Plot of Bands — Image by Author

Normalized Difference Vegetation Index (NDVI)

To determine the density of green on a patch of land, researchers must observe the distinct colors (wavelengths) of visible(VIS) and near-infrared (NIR)sunlight reflected by the plants. The Normalized Difference Vegetation Index (NDVI) quantifies vegetation by measuring the difference between near-infrared which vegetation strongly reflects and red light (which vegetation absorbs). NDVI always ranges from -1 to +1.

NDVI = (NIR — VIS)/(NIR + VIS)

For example, when you have negative values, it’s highly likely that it’s water. On the other hand, if you have an NDVI value close to +1, there’s a high possibility that it’s dense green leaves. But when NDVI is close to zero, there aren’t green leaves and it could even be an urbanized area.

The above code is used to calculate the Normalized Difference Vegetation Index(NDVI) and also display the generated image.

Landsat8 NDVI — Image by Author

Classification of NDVI

The Normalized Difference Vegetation Index(NDVI) results are categorized into useful classes based on the Hyperspectral Image data. The values under 0 will be classified together as no vegetation. Additional classes will be created for the bare area and low, moderate, and high vegetation areas. Let’s look at the code:

The above code classifies the NDVI and also plots the resulted data. The resulted image after classification is shown below.

Normalized Difference Vegetation Index (NDVI) Classes — Image by Author

Not but not the least, The code that I have written during the blog can be accessed from the below GitHub link or use the google collaboratory notebook.

Final Thoughts

This article covers the different ways to analyze the satellite/Hyperspectral Imagery using EarthPy but there is a lot more such as Dimensionality Reduction(DR) and Classification e.t.c. Use the below articles for detailed explanation and hands-on using python.

References

--

--