Classification on Hyperspectral Data

A step-by-step tutorial on performing feature reduction and then classifying hyperspectral data using Support Vector Machines

Richa Dutt
Towards Data Science


Introduction

The goal of this tutorial is to apply PCA to hyperspectral data (to learn about PCA, read the article “PCA on Hyperspectral Data”). After reducing the dimensionality of the data with PCA, we classify the different materials in the image using a Support Vector Machine (SVM).

Steps

We are using the Hyperspectral Gulfport Dataset in this tutorial. You can download the data from the following link.

The MUUFL Gulfport data contains a pixel-based ground truth map, produced by manually labeling the pixels in the scene. The following classes were labeled: trees, mostly grass, ground surface, mixed ground surface, dirt and sand, road, water, buildings, shadows of buildings, sidewalk, yellow curb, cloth panels (targets), and unlabeled points.

Step 1: Importing the libraries
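As a sketch of this step, the libraries used in the later steps can be imported in one place (the exact list is an assumption based on what those steps need, not the article's original code):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat                  # to read the .mat data file
from sklearn.decomposition import PCA         # Step 6: dimensionality reduction
from sklearn.svm import SVC                   # Steps 5-6: the SVM classifier
from sklearn.model_selection import (
    train_test_split,                         # Step 4: train/test split
    cross_val_score,
    StratifiedKFold,                          # Steps 5-6: K-fold cross-validation
)
from sklearn.metrics import accuracy_score    # evaluating predictions
```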

Step 2: Loading the data

There are 65 bands in the original data.
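Loading can be sketched as follows. A random cube stands in for the MUUFL scene here, since the real data ships as a MATLAB .mat file whose field names vary between releases; the spatial sizes below are placeholders (only the band count of 65 comes from the article):

```python
import numpy as np
# from scipy.io import loadmat  # used in practice to read the .mat file

# Placeholder cube: height x width x bands (H and W are assumptions)
H, W, B = 100, 80, 65
cube = np.random.default_rng(0).random((H, W, B))

# Flatten the spatial dimensions so each row is one pixel's spectrum
X = cube.reshape(-1, B)
print(X.shape)  # (8000, 65)
```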

RGB Image

Step 3: Get rid of unlabeled and some classes data

Remove the unlabeled data points and merge some similar classes into one. For example, water and building shadows are merged into a single class because they have similar spectra. The cloth panel and yellow curb classes are dropped entirely, since they contain too few pixels to provide enough training samples.

Since the ground truth labels start from one, the last line subtracts one from all labels so that they start from zero.
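A minimal sketch of this cleanup, using stand-in arrays; the class indices used below (-1 = unlabeled, 6 = water, 7 = building shadow, 10 = yellow curb, 11 = cloth panels) are assumptions about the MUUFL label encoding:

```python
import numpy as np

# Stand-in pixel matrix and labels; replace with the real flattened data
rng = np.random.default_rng(0)
X = rng.random((2000, 65))
y = rng.choice([-1] + list(range(1, 12)), 2000)

# Drop unlabeled pixels and the two under-represented classes
keep = (y != -1) & (y != 10) & (y != 11)
X, y = X[keep], y[keep]

# Merge building shadow into water (similar spectra)
y[y == 7] = 6

# Labels start at one, so shift them to start at zero, then remap to a
# contiguous range since one class index was merged away
y = y - 1
y = np.searchsorted(np.unique(y), y)
```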

Step 4: Split the data into training and testing
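A sketch of the split; the 70/30 ratio is an assumption, not necessarily the article's setting. Stratifying keeps the class proportions the same in both subsets, which matters for the rare classes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays; in the tutorial X and y come from Step 3
rng = np.random.default_rng(0)
X = rng.random((1000, 65))
y = rng.integers(0, 9, 1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (700, 65) (300, 65)
```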

Step 5: Applying SVM classifier on the Original Data

I have applied K-fold cross-validation on the training data.
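This step can be sketched as below; the RBF kernel and the hyperparameters are assumptions rather than the article's exact settings, and feature scaling is added because the RBF kernel is sensitive to the raw band magnitudes:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Stand-in training set; in the tutorial X_train/y_train come from Step 4
rng = np.random.default_rng(0)
X_train = rng.random((300, 65))
y_train = rng.integers(0, 9, 300)

# SVM on the original (65-band) features, with scaling
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))

# Stratified K-fold cross-validation on the training data
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X_train, y_train, cv=cv)
print(scores.mean())
```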

Step 6: Apply SVM classifier on the PCA data

I have applied an ensemble of 3 models and used K-fold cross-validation on the training data.

The final prediction is obtained by taking a majority vote over the predictions of the three models.
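A sketch of this step on stand-in data: PCA is fit on the training set only, and the ensemble is formed here by varying C, which is an assumption about how the article's three models differ:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Stand-in data; in the tutorial these come from Step 4
rng = np.random.default_rng(0)
X_train, y_train = rng.random((300, 65)), rng.integers(0, 9, 300)
X_test = rng.random((100, 65))

# Project onto the first 3 principal components (fit on training data only)
pca = PCA(n_components=3)
X_train_p = pca.fit_transform(X_train)
X_test_p = pca.transform(X_test)

# Train an ensemble of 3 SVMs and collect their test predictions
preds = []
for C in (1, 10, 100):
    model = SVC(kernel="rbf", C=C).fit(X_train_p, y_train)
    preds.append(model.predict(X_test_p))
preds = np.stack(preds)  # shape: (3 models, n_test pixels)

# Majority vote across the three models for each test pixel
y_pred = np.array([np.bincount(col).argmax() for col in preds.T])
```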

Step 7: Plot the final image after PCA
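Plotting can be sketched as follows, assuming the per-pixel predictions have been reshaped back to the scene's height × width (the sizes and colormap here are placeholders):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Stand-in prediction map; in the tutorial this is y_pred reshaped to H x W
H, W = 100, 80
pred_map = np.random.default_rng(0).integers(0, 9, (H, W))

plt.figure(figsize=(5, 6))
plt.imshow(pred_map, cmap="tab10")
plt.colorbar(label="class index")
plt.title("SVM classification after PCA")
plt.savefig("classification_map.png", dpi=150)
```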

Image after applying PCA

Conclusion

The dimensionality of the data before PCA is 65; after PCA it is 3. PCA reduced the dimensionality of the data by a factor of almost 22.

We can conclude from the above results that SVM on the original data achieves an accuracy of around 88.7%, while SVM on the PCA-reduced data achieves 88.9%, so the accuracy is nearly the same in both cases.

That is why we apply classifiers to the reduced data: it lowers both time and space complexity. Depending on the problem, the accuracy with PCA can even be higher than with the original data.

Thanks for reading! I hope you found this article useful. Feel free to ask, if you have any questions.
