Hunting Naval Mines with Deep Learning

A model to improve naval mine detection, using an autoencoder for feature selection and a deep neural network for binary classification

Augusto de Nevrezé
Towards Data Science


Robotic Submarines for naval mine defense | Image by US Naval Institute

A Brief Introduction to The Problem

There is no doubt that naval mines pose a serious threat to ocean navigability, and many have been lying in place for decades. The cost of producing and laying a mine is usually between 0.5% and 10% of the cost of removing it, and it can take up to 200 times as long to clear a minefield as to lay it. Some minefields dating back to World War II still exist, and they will remain dangerous for many years, since they are too extensive and expensive to clear.

In the following paragraphs, several deep learning implementations will be discussed. To keep the description friendly, it is advisable to read the article alongside the notebooks provided in the following repo. Two notebooks are provided: in the first, an autoencoder is trained; in the second, the encoder section of the autoencoder is used to train a neural network binary classifier. The notebooks are also available on Kaggle.

The dataset used in this article contains patterns obtained by bouncing sonar signals off a metal cylinder and off rocks at various angles and under various conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. Each pattern is a set of 60 frequency bins scaled to the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time.
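As a sketch, loading and label-encoding such a file might look like the following (the file name and the "label in the last column" layout are assumptions matching the common UCI sonar CSV format):

```python
import numpy as np
import pandas as pd

def load_sonar(path="sonar.all-data.csv"):
    """Load the sonar patterns: 60 feature columns plus an 'R'/'M' label."""
    df = pd.read_csv(path, header=None)
    # The first 60 columns are the frequency-bin energies, already in [0, 1]
    X = df.iloc[:, :60].to_numpy(dtype=np.float32)
    # Encode the last column as 1 for mines ("M") and 0 for rocks ("R")
    y = (df.iloc[:, 60] == "M").astype(int).to_numpy()
    return X, y
```

The helper name and path are illustrative; the notebooks contain the actual loading code.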

A power spectral density example for each category | Image by author

The label associated with each record contains the letter “R” if the object is a rock and “M” if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly. The dataset is almost balanced between categories, as can be seen in the following picture.

Categories Distribution in the Dataset | Image by author

Devising a Solution

It is widely accepted that high dimensionality in data is a curse: models trained on many dimensions tend to underperform. Deep neural networks are no exception, as this article will try to show; there is more information in the notebooks provided. Even though the dataset has only 60 dimensions, the problem is present. Bear in mind that the issue can be orders of magnitude worse with images, for example.

Since we have many dimensions, we can train an autoencoder network, an unsupervised deep learning technique, to reduce collinearity between variables and hopefully use a smaller-dimension classifier that performs better. Alternatively, traditional PCA could have been used, but in general terms autoencoders tend to perform better.
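For reference, the PCA alternative is only a few lines with scikit-learn. This sketch uses random placeholder data standing in for the 208 sonar patterns, and targets the same 10 dimensions as the autoencoder bottleneck:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((208, 60))  # placeholder for the 60-bin sonar patterns

# Project the 60 inputs onto their 10 leading principal components
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (208, 10)
```

Unlike PCA, which is restricted to linear projections, the autoencoder can learn a non-linear compression, which is one reason it tends to perform better here.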

Correlation between variables | Image by author

As can be seen in the previous plot, there is some correlation between variables; this information can be compressed with the proposed method.

Autoencoder Structure

An autoencoder consists of two major parts, the encoder and the decoder networks. The encoder network is used during both training and deployment, while the decoder network is only used during training. The purpose of the encoder network is to discover a compressed representation of the given input. In this project, a 10-dimensional representation is generated from a 60-dimensional input. This occurs right in the middle of the autoencoder, which is also known as the bottleneck. The purpose of the decoder network, which is just a reflection of the encoder network, is to reconstruct the original input as closely as possible.

Autoencoder structure | Chervinskii — Own work, CC BY-SA 4.0

The network also includes two optimizations. Batch normalization is used to stabilize the learning process and significantly reduce the number of epochs required to reach the minimum loss. In addition, a Leaky ReLU activation has been chosen instead of a regular ReLU, since the former tends to improve network performance, achieving smaller loss values.
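A minimal Keras sketch of such an autoencoder, including the batch normalization and Leaky ReLU described above (the intermediate layer width of 30 is an illustrative assumption; the exact architecture is in the first notebook):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(input_dim=60, bottleneck=10):
    # Encoder: compress the 60 frequency bins down to the 10-dim bottleneck
    inputs = keras.Input(shape=(input_dim,))
    x = layers.Dense(30)(inputs)
    x = layers.BatchNormalization()(x)  # stabilizes training, fewer epochs
    x = layers.LeakyReLU()(x)           # avoids dead units vs. plain ReLU
    code = layers.Dense(bottleneck, name="bottleneck")(x)

    # Decoder: a reflection of the encoder, reconstructs the original input
    x = layers.Dense(30)(code)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    outputs = layers.Dense(input_dim, activation="sigmoid")(x)

    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, code)  # reused later by the classifier
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder
```

The sigmoid output matches the inputs being scaled to the 0.0–1.0 range, and mean squared error measures reconstruction quality.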

Autoencoder training and testing loss | Image by author

As a baseline comparison, a Logistic Regression model was chosen. It was trained with the 60 raw inputs in one case and with the compressed version of 10 inputs in the other, and the results were later verified on a test dataset. The model with fewer dimensions slightly outperforms the other, which indicates that there is a real benefit in reducing the number of variables.
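The baseline comparison can be sketched as follows. Random placeholder data stands in for the real patterns and their encoded version, so the accuracies printed here are not meaningful; the actual evaluation is in the notebooks:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X_full = rng.random((208, 60))    # stand-in for the 60 raw inputs
X_small = X_full[:, :10]          # stand-in for the 10 encoded features
y = rng.integers(0, 2, size=208)  # stand-in labels (mine=1, rock=0)

for name, X in [("60 raw inputs", X_full), ("10 encoded inputs", X_small)]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: test accuracy {acc:.2f}")
```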

Deep Learning Classifier

To create a reduced-dimension classifier, only the first part of the trained autoencoder is used, namely the encoder section. These layers are connected to the input of the classifier. A set of fully connected layers follows the encoder section until the output is reached; this last layer consists of a single neuron. The output of the network described here differs slightly from the one presented in the original paper, where the authors defined two outputs, one per class.

Encoder connected as the input of the binary classifier | Image by author

To train this network and avoid overfitting, the EarlyStopping callback has been used; training stops when the monitored metric is no longer improving. The test loss has been used as the monitored parameter for EarlyStopping. The results can be seen below: the blue plots correspond to the loss and accuracy on the training dataset, while the results on the test dataset are represented by the yellow curves.
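A sketch of this setup in Keras. The encoder below is an untrained stand-in for the one taken from the autoencoder, the head width is an illustrative assumption, and freezing the encoder is one possible design choice; monitoring `val_loss` corresponds to passing the test split as validation data:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

# Stand-in encoder (in the notebook this comes from the trained autoencoder)
enc_in = keras.Input(shape=(60,))
code = layers.Dense(10, activation="relu")(enc_in)
encoder = keras.Model(enc_in, code)
encoder.trainable = False  # reuse the learned compression as-is

# Fully connected head ending in a single sigmoid neuron
x = layers.Dense(8, activation="relu")(encoder.output)
out = layers.Dense(1, activation="sigmoid")(x)
classifier = keras.Model(encoder.input, out)
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])

# Stop training once the monitored validation loss stops improving
stopper = EarlyStopping(monitor="val_loss", patience=10,
                        restore_best_weights=True)

rng = np.random.default_rng(0)
X, y = rng.random((208, 60)), rng.integers(0, 2, 208)  # placeholder data
classifier.fit(X, y, validation_split=0.2, epochs=5,
               callbacks=[stopper], verbose=0)
```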

Loss and accuracy for training and testing sets for the compressed neural net | Image by author

The results can be compared with those obtained after training a network on the uncompressed inputs. The network model proposed in this case is:

inputs --> 30 --> 20 --> 10 --> 5 --> 3 --> output

Each number represents the number of neurons per layer. In the picture below, it can be observed that the accuracy obtained on the test dataset is no more than 80% before the model starts to overfit.

Loss and accuracy for not compressed neural network | Image by author

Finally, the confusion matrix obtained for the reduced model is presented. Most of the cases were correctly classified, which indicates a near-optimal generalization capability. Since the model tends to favor the detection of mines over rocks, its false positives introduce a safer bias.
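With scikit-learn, such a confusion matrix is computed as below. The labels here are illustrative, chosen so that the single error is a rock flagged as a mine, i.e. the safer kind of mistake:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative values only; the real ones come from the trained classifier
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # 1 = mine, 0 = rock
y_pred = np.array([1, 1, 1, 0, 1, 0, 1, 0])  # one rock misread as a mine

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# Rows are the true class (rock, mine); columns the predicted class.
# The off-diagonal error sits in the "rock predicted as mine" cell.
```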

So What?

After analyzing the results, it can be observed that the feature extraction performed with the autoencoder helped the classifier network achieve higher accuracy. The results are in line with those obtained in the original paper by Gorman and Sejnowski with much larger networks: to obtain the same result, they trained a model with about 12 hidden layers! Along the same lines, the results obtained here for a network with 3 hidden layers and no encoding are similar to those of the aforementioned paper.

Other efforts were made to increase accuracy by reducing the number of neurons in the hidden layers (pruning), without success. Another publication focuses on comparing several optimized classical Machine Learning (ML) models and achieves results similar to those presented here, after applying feature selection techniques with the WEKA software. However, it fails to deliver good performance with neural networks.

Finally, there is an excellent article analyzing different neural network architectures, which also includes a comparison with classical ML models. The author proposes a feature removal technique: after analyzing the contributions of each variable to a Random Forest model, those that contribute the least are simply deleted.

Final Words

Autoencoders show good performance in feature selection processes, even when handling structured data, as in this case. The main objective of this project was to analyze their use for automatic feature selection before training models, in particular deep neural networks. Other applications include data compression, which is currently used broadly in mobile phones, and picture denoising. There is an excellent entry on autoencoder applications in the official Keras blog, written by François Chollet, creator of Keras and author of a great deep learning book.

Despite all their benefits, autoencoders are data-specific, which makes them impractical for general real-world data compression problems: you can only use them on data similar to what they were trained on. Using them in general-purpose applications would require enormous amounts of training data.

Thanks for reading this far. You can get the notebooks used in this article from this repo. I’m always happy to talk about Data Science, so don’t hesitate to reach out to me on Twitter.

References

Blogs

Building Autoencoders in Keras

Autoencoder Feature Extraction for Classification

Build your first binary classifier using neural networks and keras

Sonar data — Mines vs Rocks — on cAInvas (too big neural network)

Revisiting Machine Learning Datasets — Sonar, Mines vs. Rocks

Papers

Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets

Prediction of Underwater Surface Target through SONAR: A Case Study of Machine Learning

Neural Network in Classifying Sonar Targets: Mines and Rocks (bad results)


I write about Data Science, AI, ML & DL. I’m an electronics engineer. My motto: “Per Aspera Ad Astra”. Follow me on Twitter: @augusto_dn