Identifying the physical aspect of the earth’s surface (Land cover), as well as how we exploit the land (Land use), is a challenging problem in environmental monitoring and many other subdomains. This can be done through field surveys or by analyzing satellite images (Remote Sensing). While field surveys are more comprehensive and authoritative, they are expensive and often take a long time to update.
With recent developments in the space industry and the increased availability of satellite images (both free and commercial), Deep Learning and Convolutional Neural Networks have shown promising results in land use classification.
In this project, I used the freely available Sentinel-2 satellite images to classify 9 land use classes from 24,000 labeled images (Figure 2). The original dataset contains 10 classes and 27,000 labeled images and is available here.

Here are some images for visualization of the different Land use classes.

Sentinel-2 data is multispectral with 13 bands in the visible, near-infrared and shortwave-infrared spectrum. These bands come in different spatial resolutions ranging from 10 m to 60 m, so the images can be categorized as high-to-medium resolution. While other, higher-resolution satellites are available (1 m to 0.5 m), Sentinel-2 data is free and has a short revisit time (5 days), which makes it an excellent option for monitoring land use.

Data Preprocessing
Although some Deep learning architectures can take all 13 bands as input, it was still necessary to preprocess the data. The images came in TIFF format, which some of the architectures I tried could not accommodate. I opted to use GDAL and Rasterio, my favorite tools and the ones I am most familiar with, to select bands and convert the images to JPG format. gdal_translate did the trick.
gdal_translate -of GTiff -b 1 -b 10 -b 13 input_sentinel_13_band.tif output_RGB.tif
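Since each band combination needs its own gdal_translate call, a small helper can generate those commands. This is a sketch only; the band-index mapping below is illustrative (Sentinel-2 true colour is B4/B3/B2), not an exact record of the commands I ran:

```python
# Build gdal_translate commands for a few Sentinel-2 band combinations.
# Band indices are 1-based, as gdal_translate expects.

BAND_COMBOS = {
    "RGB": [4, 3, 2],    # Sentinel-2 true colour: B4 (red), B3 (green), B2 (blue)
    "SWIR": [13, 12, 4], # illustrative SWIR composite: B12, B11, B4 by 1-based position
}

def translate_command(combo, src, dst, out_format="GTiff"):
    """Return a gdal_translate command selecting the bands of one combination."""
    band_flags = " ".join(f"-b {b}" for b in BAND_COMBOS[combo])
    return f"gdal_translate -of {out_format} {band_flags} {src} {dst}"

print(translate_command("RGB", "input_sentinel_13_band.tif", "output_RGB.tif"))
# gdal_translate -of GTiff -b 4 -b 3 -b 2 input_sentinel_13_band.tif output_RGB.tif
```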
These are some of the band combinations I experimented with:
- All 13 bands
- Red, Green, Blue (RGB) Bands
- Shortwave Infrared (SWIR) Bands
- High-resolution Bands (bands with 10–20 m resolution)
- Special Band Combinations – here, domain knowledge in Remote Sensing helps a lot. Some band combinations can highlight agriculture, vegetation, water or bare land.
- Data Augmentation with different combinations (e.g., RGB with Special bands)
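Special band combinations of the kind mentioned above often boil down to simple band arithmetic. As an illustration (not my exact preprocessing), a normalized-difference index such as NDVI highlights vegetation from the red and near-infrared bands:

```python
def normalized_difference(band_a, band_b, eps=1e-9):
    """Pixel-wise (a - b) / (a + b); for NDVI, band_a = NIR (B8), band_b = red (B4)."""
    return [
        [(a - b) / (a + b + eps) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]

# Toy 2x2 reflectance values: vegetation reflects strongly in NIR, weakly in red.
nir = [[0.6, 0.5], [0.1, 0.1]]
red = [[0.1, 0.1], [0.1, 0.1]]
ndvi = normalized_difference(nir, red)
# High values (~0.7) flag vegetation; values near 0 flag bare surfaces or water.
```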
Modeling
I used Transfer learning (ResNet50) with the Fastai library to train my model. Thanks to the amazing deep learning courses by the Fastai team, the techniques used here come from their course materials. The procedure I followed to train the model was:
- Training the last layer
- Try data augmentation
- Unfreeze all layers and fine-tune the entire network
Among the techniques used in modelling are the Learning rate finder, Stochastic Gradient Descent with Restarts, and learning rate annealing.
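Stochastic Gradient Descent with Restarts pairs cosine annealing with periodic learning-rate resets. A minimal sketch of that schedule (the lr_max, lr_min and cycle length here are made up for illustration, not the values I trained with):

```python
import math

def sgdr_lr(step, cycle_len, lr_max=0.01, lr_min=0.0001):
    """Cosine-annealed learning rate that restarts at lr_max every cycle_len steps."""
    t = step % cycle_len  # position within the current cycle
    cos_decay = 0.5 * (1 + math.cos(math.pi * t / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_decay

# The rate decays from lr_max toward lr_min, then jumps back to lr_max at each restart.
schedule = [sgdr_lr(s, cycle_len=10) for s in range(20)]
```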
Results
As mentioned in the preprocessing section, I experimented with different band combinations. The highest accuracy of my model is 0.94; while this is less than the accuracy reported in the original paper on the dataset (0.98), it is relatively high for my project and its objectives. The results of all my experiments are in the table below (discussed in the Key Takeaways section):
| Bands                 | Accuracy |
|-----------------------|----------|
| All Bands             | 0.83     |
| RGB                   | 0.84     |
| High Resolution Bands | 0.81     |
| Special Bands         | 0.94     |
| RGB + Special Bands   | 0.80     |
Some of the classes the model found challenging to distinguish are Forest and SeaLake, as shown in the confusion matrix (Figure 3). Looking closely at images from these two classes, one can see that even the human eye has difficulty telling them apart.
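For readers who want to reproduce such a confusion matrix outside of Fastai's plotting helpers, counting it by hand is straightforward. A sketch over hypothetical predictions (the labels below are invented for illustration, not my validation set):

```python
from collections import Counter

def confusion_matrix(true_labels, pred_labels, classes):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

classes = ["Forest", "SeaLake", "Residential"]
y_true = ["Forest", "Forest", "SeaLake", "Residential"]
y_pred = ["Forest", "SeaLake", "SeaLake", "Residential"]
cm = confusion_matrix(y_true, y_pred, classes)
# cm[0] == [1, 1, 0]: one Forest tile correct, one confused with SeaLake.
```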

Key Takeaways:
Domain knowledge of band combinations helped improve this particular model. All the literature I have seen on Deep learning applications in Land use / Land cover classification uses the same bands for all classes (e.g., RGB or SWIR). My method improved accuracy by almost 10 percentage points.
While I had assumed that more bands would definitely improve my model, that turned out not to be the case: using all 13 bands did not perform well. This can be attributed to the inclusion of low-resolution bands. Then again, using only the high-resolution bands gave one of the lowest accuracies (0.81).
Another experiment was to enlarge the dataset by putting the RGB images and the Special band combination images into the same folder, thus doubling the number of images available for training. This gave the lowest accuracy (0.80). The notebook is available here.