Nothing illustrates the ubiquity of satellite imagery like Google Maps. A completely free service gives anyone with internet access an entire planet’s worth of satellite imagery. While Google Maps is free, paid alternatives photograph the earth’s surface more frequently for commercial use. Governments also use their own satellites for many domestic purposes.
As the availability of satellite imagery outpaces the ability of humans to look through it manually, an automated means of classifying it must be developed. Image classification is a fundamental problem in computer vision, and neural networks provide an interesting solution.
Classifying Ships in a Bay
Available on Kaggle, the Ships in Satellite Imagery dataset contains 4000 annotated images of ships and non-ships. Taken from the Planet API, the 1000 ship images are uniformly 80px x 80px and show a ship in various orientations, but always near the center.

Of the 3000 non-ship images, about 1000 depict random features, such as the ocean or a building, about 1000 depict a partial, but not complete, image of a ship, and the last 1000 depict images which have been mislabeled by other machine learning models.

The accompanying JSON file includes an image’s ID and a label indicating whether or not the image contains a ship, designated by a 1 or 0, respectively. In addition, it contains the actual pixel values for each image.
Each image is stored as 19,200 integer values: the first third contains all pixel values in the red channel, followed by the pixel values in the green channel, and the final third holds the blue channel. Since the pixel values are explicitly stated in the dataset, the actual images themselves don’t technically need to be downloaded, but they are a nice reference.
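To make this storage layout concrete, here is a minimal sketch that uses a tiny made-up 2 x 2 image instead of a real 80 x 80 one. The `flat` list and its values are hypothetical; only the channel-by-channel ordering follows the description above, and rows are assumed to be stored one after another within each channel.

```python
import numpy as np

# Hypothetical 2 x 2 image stored the way the dataset stores pixels:
# all red values first, then all green, then all blue.
flat = [10, 11, 12, 13,      # red channel, row by row
        20, 21, 22, 23,      # green channel
        30, 31, 32, 33]      # blue channel

height, width = 2, 2

# Split the list into three channel planes, then move the channel axis
# to the end to get the usual height x width x channel picture layout.
planes = np.array(flat).reshape(3, height, width)
image = planes.transpose(1, 2, 0)

top_left_pixel = image[0, 0].tolist()      # [10, 20, 30]
bottom_right_pixel = image[1, 1].tolist()  # [13, 23, 33]

# For comparison: reshaping straight to (height, width, 3) groups
# three consecutive red values into the first "pixel".
naive_first_pixel = np.array(flat).reshape(height, width, 3)[0, 0].tolist()
```

Each pixel of `image` now holds its own red, green, and blue value, whereas the direct reshape produces `[10, 11, 12]` for the first pixel, which is three red values.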
To classify these images, a convolutional neural network (CNN) will be trained. As a type of artificial neural network, CNNs loosely mimic the neurons in the brain, particularly those used for vision. Each filter in the network produces a feature map that responds to a particular feature in an image. In the early layers, a feature map might recognize a vertical or horizontal line. As layers are added to the network, however, feature maps may recognize more complex structures, such as an eye or, in this case, a ship.
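As a rough illustration of how a feature map can pick out a vertical line, the sketch below slides a hand-written 3 x 3 filter over a tiny made-up grayscale image. In a real CNN the filter values are learned during training; the image, the kernel, and the `feature_map` helper here are all hypothetical.

```python
import numpy as np

# Hypothetical 5 x 5 grayscale image: dark on the left, bright on the right,
# so there is a vertical edge between columns 1 and 2.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written 3 x 3 filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

def feature_map(img, k):
    """Slide the filter over the image (no padding) and sum the products."""
    kh, kw = k.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

response = feature_map(image, kernel)
print(response)
```

The response is large wherever the window straddles the edge and zero over the flat bright region, which is the sense in which a feature map "recognizes" a line.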
Various libraries exist to write CNNs, but this tutorial will cover TensorFlow with Keras.
Convolutional Neural Networks in Python
import tensorflow as tf
import pandas as pd
import numpy as np
Before training anything, basic imports are used. TensorFlow is a machine learning library that places a large focus on neural networks. Pandas is a spreadsheet-type library to help parse data. Finally, NumPy helps crunch numbers quickly and efficiently.
# Read the data
df = pd.read_json("shipsnet.json")
This line simply imports the JSON file and reads it as a data frame.
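# Normalize and reshape the image data
# The flat list is stored channel by channel (all red, then green, then blue),
# so reshape to (3, 80, 80) first and then move the channel axis to the end
df["normalized_data"] = df["data"].apply(lambda x: (np.array(x) / 255).reshape(3, 80, 80).transpose(1, 2, 0))
The pixel values are stored in a column of the data frame titled "data". As is, these pixel values aren’t ready to be processed by a CNN. Each list is converted to a NumPy array and divided by 255 so that all 19,200 values fall between 0 and 1. Because the values are stored one channel at a time, the array is first reshaped to 3 x 80 x 80 and then transposed to 80 x 80 x 3, the height x width x channel layout Keras expects for a picture. Reshaping directly to 80 x 80 x 3 would run without error but would scramble the pixels.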
# Define X and Y
X = df["normalized_data"]
Y = df["labels"]
# Split the data into training and testing sets. Use a 75/25 split
from sklearn.model_selection import train_test_split
(X_train, X_test, Y_train, Y_test) = train_test_split(X, Y, test_size=0.25, random_state=42)
The X and Y values are defined. Y is predictably the column titled "labels" and contains the array of 1s and 0s that define whether an image contains a ship. X is the normalized image data derived from the pixel values.
With X and Y defined, they are split into training and testing sets along a 75/25 split. As a result, the model will train on 3000 images and validate its results on 1000 other images.
# Transform the training and testing data into arrays
X_train = np.array([x for x in X_train])
X_test = np.array([x for x in X_test])
Y_train = np.array([y for y in Y_train])
Y_test = np.array([y for y in Y_test])
Unfortunately, a Pandas Series whose elements are themselves arrays can’t be passed to TensorFlow directly, so the training and testing data are converted into regular NumPy arrays.
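The difference between the two containers can be seen in a small sketch. It uses a NumPy object array as a stand-in for a Series of per-image arrays; the three blank images are hypothetical placeholders for rows of the training data.

```python
import numpy as np

# Three hypothetical 80 x 80 x 3 images, standing in for rows of X_train.
images = [np.zeros((80, 80, 3)) for _ in range(3)]

# A container of separate arrays, similar to what a Series of arrays holds:
# a 1-D object array whose elements are themselves arrays.
boxed = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    boxed[i] = img

# Stacking produces one numeric 4-D array: (samples, height, width, channels).
stacked = np.stack(list(boxed))

boxed_shape = boxed.shape      # (3,)
stacked_shape = stacked.shape  # (3, 80, 80, 3)
print(boxed.dtype, boxed_shape)
print(stacked.dtype, stacked_shape)
```

Only the stacked version has a numeric dtype and the four dimensions a convolution layer expects for a batch of images.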
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.layers import Activation
# Starts the model with a sequential ANN
model = models.Sequential()
After a couple of imports, the actual CNN is ready to be built. The CNN is initialized as a sequential model, a plain stack of layers in which each layer has exactly one input and one output.
# Adds the first convolution layer and follows up with max pooling
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(80, 80, 3)))
model.add(layers.MaxPooling2D((2, 2)))
The first two layers are added to the model. The first layer is a convolution layer which uses a ReLU activation function and expects an input tensor of 80 x 80 x 3, the exact dimensions of the training images. The 32 is the number of filters the layer learns, and therefore the number of feature maps it outputs, and the (3, 3) is the size of the convolution window, 3px x 3px in this case.
The next layer added is for max pooling with a pool size of 2 x 2. It keeps only the largest value in each 2 x 2 window, which halves the width and height of every feature map.
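Max pooling is easy to reproduce by hand. The sketch below applies 2 x 2 pooling to a made-up 4 x 4 feature map, assuming non-overlapping windows (the Keras default when no stride is given); the values are hypothetical.

```python
import numpy as np

# Hypothetical 4 x 4 feature map.
fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])

# Group the map into non-overlapping 2 x 2 windows and keep the
# largest value of each window.
h, w = fmap.shape
pooled = fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)
```

The 4 x 4 map shrinks to 2 x 2, and each output value is the strongest response within its window.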
# Add additional hidden layers
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
Much like the previous bit of code, these three lines add more hidden layers to the network. While the number of filters has changed, these layers follow the same basic pattern as the first two.
# Flattens the input into a 1D tensor
model.add(layers.Flatten())
# Fully connected layer that combines the extracted features
model.add(layers.Dense(64, activation='relu'))
# Output layer - a single unit is enough for binary classification
model.add(layers.Dense(1))
# Final activation function
model.add(Activation('sigmoid'))
The first line simply flattens the tensor into one dimension so that it can be fed to the dense layers. The next line, the first dense layer, combines the extracted features into 64 values.
The final dense layer handles classification. Its only argument is the number of output units. Since there are only two classes in this example, ship or not-ship, a single unit is enough: its output represents how likely the image is to contain a ship.
Finally, a sigmoid activation layer is added, which squashes the output into a value between 0 and 1.
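The sigmoid used as the final activation is a one-line formula. The sketch below applies it to a few hypothetical raw outputs of the last dense layer and then thresholds the result at 0.5 to obtain a class label.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw outputs of the final Dense(1) layer.
raw_outputs = [-4.0, 0.5, 4.0]
probabilities = [sigmoid(z) for z in raw_outputs]

# Thresholding at 0.5 turns each probability into a class label
# (1 = ship, 0 = not-ship).
labels = [1 if p >= 0.5 else 0 for p in probabilities]
print(probabilities, labels)
```

Strongly negative outputs end up close to 0, strongly positive ones close to 1, and an output of exactly 0 maps to 0.5.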
model.summary()
Before moving forward, take the time to review the model using the summary method. The output should look like this:

Note how each line describes a layer built into the CNN, as well as its output shape, and number of parameters.
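The output shapes and parameter counts follow directly from the layer definitions, so they can be checked by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults used above (no padding, stride 1 for convolutions, non-overlapping 2 x 2 pooling). The helper functions and the layer names are illustrative; the names Keras prints may differ.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of an unpadded, stride-1 convolution."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters  # weights plus one bias per filter
    return (h - kernel + 1, w - kernel + 1, filters), params

def max_pool(shape, pool=2):
    """2 x 2 pooling halves height and width (rounding down); no parameters."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(units_in, units_out):
    """Fully connected layer: one weight per input-output pair plus biases."""
    return (units_in + 1) * units_out

shape = (80, 80, 3)
total = 0
rows = []
for name, step in [("conv2d", lambda s: conv2d(s, 32)),
                   ("max_pooling2d", max_pool),
                   ("conv2d_1", lambda s: conv2d(s, 64)),
                   ("max_pooling2d_1", max_pool),
                   ("conv2d_2", lambda s: conv2d(s, 64))]:
    shape, params = step(shape)
    rows.append((name, shape, params))
    total += params

flat = shape[0] * shape[1] * shape[2]  # Flatten: height * width * channels
rows.append(("flatten", (flat,), 0))
for name, units_in, units_out in [("dense", flat, 64), ("dense_1", 64, 1)]:
    params = dense(units_in, units_out)
    rows.append((name, (units_out,), params))
    total += params

for name, out_shape, params in rows:
    print(f"{name:16s} {str(out_shape):16s} {params:>9,d}")
print("total parameters:", f"{total:,d}")
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the flattened values to each of its 64 units.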
# Compile the model
# Use binary_crossentropy because there are only 2 classes present
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
This line simply compiles the model. If there were issues with input/output dimensionality while adding layers, the program will let you know at this step.
The loss function is binary cross-entropy, since the model distinguishes only two classes. Using more classes would require a different loss, such as categorical cross-entropy. The full list of loss functions is in the Keras documentation. The optimizer is the RMSprop algorithm, but others are available [here](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers). The metrics argument lists what is reported during training and evaluation, which in this case is simply accuracy; the optimizer itself minimizes the loss.
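To give a feel for what binary cross-entropy measures, here is a minimal sketch of the standard formula in plain Python. It is not the Keras implementation, which may differ in details such as clipping; the labels and predictions are hypothetical.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and predictions (1 = ship, 0 = not-ship).
y_true = [1, 0, 1, 0]
confident_right = [0.95, 0.05, 0.90, 0.10]
confident_wrong = [0.05, 0.95, 0.10, 0.90]

low_loss = binary_crossentropy(y_true, confident_right)
high_loss = binary_crossentropy(y_true, confident_wrong)
print(f"confident and right: {low_loss:.3f}")
print(f"confident and wrong: {high_loss:.3f}")
```

Predictions that are confident and correct give a loss near zero, while confident mistakes are penalized heavily, which is what pushes the model toward well-calibrated probabilities during training.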
# Train the model
gen_model = model.fit(X_train, Y_train, epochs=10, validation_data=(X_test, Y_test))
It’s finally time to fit the model. Passing the training and testing data is straightforward. The epochs argument tells the model how many full passes to make over the training data.
Additional epochs bring diminishing returns. A higher number will generally return better accuracy, but each additional gain shrinks as the model approaches the maximum accuracy the dataset can support. Additionally, more epochs take longer to run. For this dataset, 10 returns good results.
When the model trains, output will be given that looks like the below for each epoch:

After returning the number of samples and giving the time to train the epoch, the line will also return the loss and accuracy of the model on its own training set of images as well as its loss and accuracy for the validation set.
In this case, on the 10th epoch, the model achieved 99.27% accuracy on its own training images and 98.5% accuracy on images the model had never seen before.
Evaluation
Before celebrating too much, a deeper analysis of the results is in order.
# Evaluate the model
from sklearn.metrics import classification_report, confusion_matrix
predictions = model.predict(X_test)
print(classification_report(Y_test, predictions.round()))
print(confusion_matrix(Y_test, predictions.round()))
These lines import a few functions to help evaluate the accuracy of the model, apply the model to the testing data, and print a classification report and a confusion matrix.
The classification report:

The testing data was composed of 1000 images, 733 of which were non-ships and 267 of which were ships. The precision of 99% for the non-ships is slightly higher than the 97% for the ships. Essentially, of all the images the model classified as ships, 97% were actual ships. The model was able to recall 99% of the non-ship images and 98% of the ship images, respectively.
Overall, these are great results for a simple CNN.
Now, looking at the confusion matrix:

Of the 733 non-ship images, 725 were correctly identified and 8 were mislabeled as a ship. These are the false positives.
Of the 267 ship images, 262 were correctly identified and 5 were mislabeled as a non-ship. These are the false negatives.
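The figures in the classification report can be recomputed from these four counts. The sketch below does so in plain Python, using only the numbers quoted in the text above; the variable names are illustrative.

```python
# Counts taken from the text above.
true_negatives = 725   # non-ships labeled as non-ships
false_positives = 8    # non-ships labeled as ships
false_negatives = 5    # ships labeled as non-ships
true_positives = 262   # ships labeled as ships

total = true_negatives + false_positives + false_negatives + true_positives

# Precision: of everything predicted as a class, how much really was that class.
# Recall: of everything that really was a class, how much was found.
ship_precision = true_positives / (true_positives + false_positives)
ship_recall = true_positives / (true_positives + false_negatives)
non_ship_precision = true_negatives / (true_negatives + false_negatives)
non_ship_recall = true_negatives / (true_negatives + false_positives)
accuracy = (true_positives + true_negatives) / total

print(f"ship:     precision {ship_precision:.2f}, recall {ship_recall:.2f}")
print(f"non-ship: precision {non_ship_precision:.2f}, recall {non_ship_recall:.2f}")
print(f"accuracy: {accuracy:.3f} on {total} images")
```

Rounded to two decimals, these reproduce the 97% and 98% reported for ships and the 99% figures reported for non-ships.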
Saving and Loading the Model
A CNN wouldn’t be very useful if it had to be retrained every time it was needed. In this case, training took only a few minutes, but on deeper networks with more epochs, training can take hours or even days. Fortunately, a single method call saves the whole model.
# Save the model for later use
model.save("ShipCNN.h5")
The save method simply takes the path of the H5 file to write.
# Load a model
new_model = tf.keras.models.load_model("ShipCNN.h5")
Loading a saved model is also fairly simple; the path just needs to match the one used when saving. It’s also a good idea to call the summary method to check that the model’s architecture matches expectations.
Conclusions
Using satellite imagery to train a CNN provided the perfect dataset. All images were of the same size, taken at essentially the same angle and distance, and every ship retained a top-down view. While changing any of these parameters would make the problem of classification more difficult, the ship images demonstrated the power of neural networks applied to computer vision problems.
The CNN used was relatively simple, but still returned a high accuracy for only a few minutes of training. Only 13 of 1000 images were incorrectly labeled, which approaches human-level performance. While the application was idealized, the potential is plain.