
Autoencoders & their Application in Remote Sensing

Autoencoders are one of the simplest and most popular neural networks. There are many versions of them, but the underlying principle and essence of the network remain the same. In this article, we first touch upon a vanilla autoencoder: its architecture, equations and implementation. We then follow this up with an interesting application of autoencoders in the field of remote sensing for feature extraction.

Photo by NASA on Unsplash

Understanding Autoencoders

An Autoencoder is a neural network that consists of an input layer, one or more hidden layer(s) and an output layer, pretty much like any other neural network that you may have heard of or worked with.

But an autoencoder possesses some very specific characteristics that make it unique:

  1. The input and output layers of an autoencoder always have the same dimension. The hidden layer can have any dimension (preferably smaller than that of the input and output layers), but the input and output dimensions must always match! Why is that so? Read on to find out!
  2. Another distinct feature of an autoencoder is that, unlike other popularly used neural networks, it is unsupervised. This means an autoencoder does not need label information, which makes it particularly useful for tasks that involve learning features or extracting representations.

Architecture of an Autoencoder

A Vanilla Autoencoder. Image by author

The figure above represents the simplest autoencoder architecture possible. These autoencoders are called vanilla autoencoders and are well suited to understanding the underlying principles.

As already mentioned in the characteristics above, the network contains an input, a hidden and an output layer, and the input and output dimensions are the same, i.e., 9 in this case. The reason these dimensions must be strictly the same becomes clear once we understand how an autoencoder works.

Encoder:

Encoder in an Autoencoder. Image by author

The input portion of an autoencoder behaves like an encoder: it applies a non-linear activation function to a weighted combination of the inputs, and the resulting encoded representation is stored in the hidden layer.

In the form of an equation, the encoding step can be written as:

y = f(Wx + b1)

where:

x : input data

y : encoded values

b1 : encoder bias (bias of the hidden layer)

W : input-to-hidden layer weights

f() : non-linear activation function, sigmoid in this case
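As a minimal sketch of this step (assuming NumPy and a sigmoid activation, consistent with the definitions above), the encoder could look like this:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W, b1):
    # x: (9, 1) input vector, W: (5, 9) input-to-hidden weights, b1: (5, 1) hidden-layer bias
    return sigmoid(W @ x + b1)   # y = f(Wx + b1), shape (5, 1)
```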

Decoder:

Decoder in an Autoencoder. Image by author

The output portion of the autoencoder decodes the information that has been stored in encoded form in the hidden layer. An autoencoder reconstructs an output that is as close as possible to the input (ideally identical to it). This is why the output and input dimensions must match.

In the form of equations, the decoding step and the cost function can be written as:

z = f(Wᵀy + b2)

E(W, b) = ½ Σᵢ (xᵢ − zᵢ)²

where:

y : encoded value stored in the hidden layer

z : output of autoencoder

b2 : decoder bias (bias of the output layer)

Wᵀ : transpose of the input-to-hidden layer weights

f() : non-linear activation function

E(W,b) : mean square error cost function
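Continuing the sketch from the encoder section, the decoding step and the reconstruction error could be written as follows (again assuming NumPy, a sigmoid activation, and decoder weights tied to the transpose of W):

```python
def decode(y, W, b2):
    # y: (5, 1) encoded vector, W: (5, 9) input-to-hidden weights, b2: (9, 1) output-layer bias
    return sigmoid(W.T @ y + b2)          # z = f(W^T y + b2), shape (9, 1)

def reconstruction_error(x, z):
    # E(W, b): squared reconstruction error for one sample (the 1/2 simplifies the gradient)
    return 0.5 * np.sum((x - z) ** 2)
```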

The cost function E(W, b) captures how different the reconstruction at the output is from the input data. The aim of the autoencoder is to minimize this reconstruction error. At the outset it might seem that an autoencoder is a trivial operation that just matches the output to the input. However, it is during this process that the autoencoder can end up learning interesting features or representations from the input data, which get stored in the hidden layer. The choice of an appropriate non-linear activation function plays an important role here. Also, since the dimension of the hidden layer is typically smaller than that of the input layer, the encoding also performs dimensionality reduction.

The backpropagation algorithm is used to minimize the cost function. Whether mean square error or binary cross-entropy is chosen depends on the type of data and the application being implemented; the same applies to the choice of the non-linear activation function.

Implementation Details:

In this section, I attach some snippets of an autoencoder implementation. The important details of implementing an autoencoder, or of any neural network for that matter, revolve around forward and backward propagation.

Dimensions of the vectors:

It is important to note that all the data we deal with is always a vector or a matrix. For the autoencoder implemented here, the dimensions of the data are as follows:

input data = [9×1]

output data = [9×1]

hidden layer = [5×1]

hidden layer bias = [5×1]

output layer bias = [9×1]

input-to-hidden layer weights = [5×9]

hidden-to-output layer weights = [9×5]

As a first step for implementation, we start by initializing the variables. We initialize the weight matrix and the bias vectors to random initial values.
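A minimal initialization sketch in NumPy, using the dimensions listed above (the random scale of 0.1 and the fixed seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed for reproducibility
n_in, n_hidden = 9, 5

W  = rng.normal(0, 0.1, size=(n_hidden, n_in))   # input-to-hidden weights, (5, 9)
b1 = rng.normal(0, 0.1, size=(n_hidden, 1))      # hidden layer bias, (5, 1)
b2 = rng.normal(0, 0.1, size=(n_in, 1))          # output layer bias, (9, 1)
# the hidden-to-output weights (9, 5) are taken as W.T, i.e. tied to W
```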

To implement the feedforward flow of the autoencoder, i.e., the calculation of the encoded values, we simply multiply the input values by the randomly initialized weight matrix and add the bias vector, which has also been randomly initialized.
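Continuing the sketch, a feedforward pass consistent with the equations above could look like this (assuming sigmoid activations and decoder weights tied to Wᵀ):

```python
def forward(x, W, b1, b2):
    y = sigmoid(W @ x + b1)     # encode: hidden activations, (5, 1)
    z = sigmoid(W.T @ y + b2)   # decode: reconstruction, (9, 1)
    return y, z
```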

In order to reconstruct an output that is a faithful representation of the input, the error function must be reduced, and the backpropagation algorithm is used for this purpose. The main gist of backpropagation is to adjust the weights and biases individually. This is done using gradient descent and can be represented as:

W ← W − η ∂E/∂W (gradient descent update for the weights)

b ← b − η ∂E/∂b (gradient descent update for the biases)

where η (eta) is the learning rate, one of the hyperparameters that has to be determined experimentally.
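Below is a sketch of one backpropagation step for the tied-weight autoencoder above. The gradient expressions follow from the chain rule with the sigmoid derivative s(1 − s); since the original snippets are not reproduced here, treat this as an illustrative implementation rather than the exact code used:

```python
def backprop_step(x, W, b1, b2, eta):
    # forward pass
    y = sigmoid(W @ x + b1)                      # hidden activations, (5, 1)
    z = sigmoid(W.T @ y + b2)                    # reconstruction, (9, 1)

    # backward pass (derivative of sigmoid s is s * (1 - s))
    delta_out = (z - x) * z * (1 - z)            # error at the output layer, (9, 1)
    delta_hid = (W @ delta_out) * y * (1 - y)    # error at the hidden layer, (5, 1)

    # gradients: W appears in both the encoder and the (tied) decoder
    grad_W  = delta_hid @ x.T + y @ delta_out.T  # (5, 9)
    grad_b1 = delta_hid                          # (5, 1)
    grad_b2 = delta_out                          # (9, 1)

    # gradient descent updates
    W  -= eta * grad_W
    b1 -= eta * grad_b1
    b2 -= eta * grad_b2
    return W, b1, b2
```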

A discussion of a neural network cannot be complete without mentioning the hyperparameters. Here our hyperparameters are the dimension of the hidden layer, the learning rate, and the momentum (if used). The number of epochs over which the model is trained is also important. These values can be determined experimentally such that the error keeps decreasing.
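Putting the pieces together, a minimal training loop might look like this (a sketch only; the placeholder data, epoch count, and learning rate are assumptions, not the actual settings used later in this article):

```python
n_epochs = 100   # placeholder values; tune experimentally
eta = 0.1

# Placeholder data standing in for the real 9-dimensional feature vectors (one per pixel)
X = rng.random((1000, 9))

for epoch in range(n_epochs):
    total_error = 0.0
    for row in X:
        x = row.reshape(-1, 1)                        # column vector, (9, 1)
        W, b1, b2 = backprop_step(x, W, b1, b2, eta)  # one gradient descent step per sample
        _, z = forward(x, W, b1, b2)
        total_error += 0.5 * np.sum((x - z) ** 2)
    print(f"epoch {epoch}: mean reconstruction error = {total_error / len(X):.4f}")
```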

In the next part of this article I will touch upon the application of this simple autoencoder in the field of Remote Sensing and discuss the results achieved.


Application in Remote Sensing

While there are ample examples of autoencoders being used for denoising or dimensionality reduction, in this article I would like to demonstrate how they can also be used in applications like remote sensing to extract features from satellite images.

In this example, I used the vanilla autoencoder defined in the first part of this article to extract features from a fully polarimetric SAR (PolSAR) image taken over Oberpfaffenhofen, Wessling, Germany. The Oberpfaffenhofen PolSAR image of size 6640 x 1390 pixels has a resolution of 1.5 m per pixel and was captured by the E-SAR sensor (DLR, L-band). The ground truth of the dataset is annotated with five classes, namely City (red), Field (yellow), Forest (dark green), Grassland (light green), and Streets (blue).

The false color image and the ground truth images look like the ones shown here:

False Color Image, Oberpfaffenhofen dataset. Image from opensource European Space Agency Sample Datasets
Annotated Ground Truth with 5 labels – Field (yellow), Forest (dark green), Grassland (light green), Streets (blue), City (red) and Black (unclassified). Image from opensource European Space Agency Sample Datasets

Skipping the PolSAR jargon, as it is beyond the scope of this article, the coherency matrix was used as the nine-dimensional input feature vector. The hidden layer encoded the input data and stored it in its 5 neurons, and in the process it ended up learning some important features from the input vector. The encoded data was then decoded and reconstructed at the output, as is expected from an autoencoder. The result was classified using the k-NN algorithm.
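As a rough sketch of this pipeline (the data loading is omitted; the placeholder labels and the use of scikit-learn's KNeighborsClassifier and train_test_split are my assumptions, since the article does not state which k-NN implementation was used):

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X: (n_pixels, 9) coherency-matrix features; labels: (n_pixels,) ground-truth classes.
# The random labels below are placeholders for the five annotated classes.
labels = rng.integers(0, 5, size=len(X))

# After training, the hidden-layer activations serve as the extracted features.
feats = sigmoid(W @ X.T + b1).T                      # (n_pixels, 5)

X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)            # k = 5 is a hypothetical choice
knn.fit(X_tr, y_tr)
overall_accuracy = knn.score(X_te, y_te)             # OA on the held-out pixels
```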

I trained the model using stochastic gradient descent (SGD) for 100 epochs with a learning rate of 0.1 and a momentum of 0.9, and used mean square error (MSE) as the cost function.

Each neuron in the hidden layer encodes and learns some features from the input which can be plotted as a feature map.

Feature Map – Neuron 1. Image by author
Feature Map – Neuron 2. Image by author
Feature Map – Neuron 3. Image by author
Feature Map – Neuron 4. Image by author
Feature Map – Neuron 5. Image by author

After reconstruction and classification at the output layer, the result obtained looked something like this:

Reconstructed features from Autoencoder using k-NN classifier. Image by author

The vanilla autoencoder implementation achieved an overall accuracy (OA) of 70%. This might not seem very impressive, but the model used was very simple, and the result can certainly be improved with a more complex network architecture. Some parallel work has shown that multi-layer autoencoder networks perform better than their vanilla counterparts.

It is also not enough to comment only on the OA of the whole image; the individual accuracy of each class should be considered as well. As is evident from the classification result, field (yellow) and forest (dark green) are classified most accurately, while grassland (light green), streets (blue) and city (red) show varied classification errors. A possible reason for this, besides the nature of the PolSAR data captured, is that forest and field have the highest number of training samples, which reduces their error. In fact, the class-wise accuracy results are comparable even when different conventional techniques are applied for feature extraction. On reducing the dimensionality of the data using t-SNE, the result clearly shows how fields, forest and grassland are classified better than city and streets.

Dimensionality reduction by t-SNE. Image by author
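For reference, such a t-SNE projection can be produced with scikit-learn along these lines (a sketch; the perplexity, subset size, and plotting details are assumptions):

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Project the 5-dimensional hidden features of a subset of pixels down to 2-D
idx = rng.choice(len(feats), size=min(5000, len(feats)), replace=False)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats[idx])

plt.scatter(emb[:, 0], emb[:, 1], c=labels[idx], s=2, cmap="tab10")
plt.title("t-SNE of autoencoder hidden features")
plt.show()
```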

One might also ask: why use an autoencoder when you could use a CNN and achieve better results? A CNN would have made sense if the data were much larger than what was available for training. For such a small dataset, a CNN seemed like overkill and might have led to overfitting. The autoencoder provided a better solution, and it is almost certain that with a slightly more complex network the model would extract features more accurately than it currently does (this also remains a future enhancement!).

Autoencoders are not widely used in real-world applications, and when they are used, it is mostly for data denoising, dimensionality reduction, or as variational autoencoders. However, they are very simple, they can be used efficiently for feature extraction, and the fact that they are unsupervised makes them an attractive choice for applications that do not have high-quality labels available, as in remote sensing. It's time they got their due.

Until then, feel free to reach out to me if something wasn’t clear enough or needs improvement. And, happy coding!!

