Making an Autoencoder

Using Keras and training on MNIST

Arvin Singh Kushwaha
Towards Data Science


Understanding Autoencoders:

Image of Autoencoder Architecture created by Arvin Singh Kushwaha

Autoencoders are a class of Unsupervised Networks that consist of two major networks: Encoders and Decoders.

An Unsupervised Network is a network that learns patterns from data without any training labels. The network finds patterns in the data on its own, without being told what those patterns should be.

In contrast, there are Supervised Networks wherein the network is trained to return specific outputs when given specific inputs.

The Encoder generally uses a series of Dense and/or Convolutional layers to encode an image into a fixed-length vector that represents the image in a compact form. The Decoder then uses Dense and/or Convolutional layers to convert that latent representation vector back into the same image or a modified one.

The image above shows an example of a simple autoencoder. In this autoencoder, you can see that the input of size X is compressed into a latent vector of size Z and then decompressed into the same image of size X.

To generate an image, a random input vector is given to the Decoder network. The Decoder network will convert the input vector into a full image.
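
For instance, a minimal sketch of generation, assuming the trained decoder model and the LATENT_SIZE constant defined later in this article:

import numpy as np

# Sample a random latent vector and decode it into an image.
# Assumes `decoder` and LATENT_SIZE as defined below, after training.
random_vector = np.random.normal(size = (1, LATENT_SIZE))
generated_image = decoder.predict(random_vector)  # shape: (1, 28, 28)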

Creating the Autoencoder:

I recommend using Google Colab to run and train the Autoencoder model.

Installing TensorFlow 2.0

# If you have a GPU that supports CUDA
$ pip3 install tensorflow-gpu==2.0.0b1
# Otherwise
$ pip3 install tensorflow==2.0.0b1

TensorFlow 2.0 has Keras built in as its high-level API. Keras is accessible through this import:

import tensorflow.keras as keras

Importing Necessary Modules/Packages

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Input, Flatten, \
    Reshape, LeakyReLU as LR, \
    Activation, Dropout
from tensorflow.keras.models import Model, Sequential
from matplotlib import pyplot as plt
from IPython import display # If using IPython, Colab or Jupyter
import numpy as np

Loading MNIST Data

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train/255.0
x_test = x_test/255.0

The MNIST dataset comprises 70,000 grayscale images of handwritten digits, each 28 by 28 pixels, along with 70,000 labels indicating which digit each image shows.

The image data is scaled from [0, 255] to [0, 1] to match the output range of the sigmoid activation function used at the end of the decoder.

Data from x_train[0]

To check our data, we’ll plot the first image in the training dataset.

# Plot image data from x_train
plt.imshow(x_train[0], cmap = "gray")
plt.show()

Deciding the Latent Size

Latent size is the size of the latent space: the vector holding the information after compression. This value is a crucial hyperparameter. If it is too small, there won't be enough information for good reconstructions; if it is too large, overfitting can occur.

I found that a latent size of 32 values worked well.

LATENT_SIZE = 32

Creating the Encoder

encoder = Sequential([
    Flatten(input_shape = (28, 28)),
    Dense(512),
    LR(),
    Dropout(0.5),
    Dense(256),
    LR(),
    Dropout(0.5),
    Dense(128),
    LR(),
    Dropout(0.5),
    Dense(64),
    LR(),
    Dropout(0.5),
    Dense(LATENT_SIZE),
    LR()
])

The encoder consists of a series of Dense layers with interstitial Dropout and LeakyReLU layers. The Dense layers compress the flattened 28x28 input (784 values) step by step down to the latent vector of size 32. The Dropout layers help prevent overfitting, and LeakyReLU, being the activation layer, introduces non-linearity into the mix. Dense(LATENT_SIZE) creates the final vector of size 32.
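
As a quick sanity check (my addition, not part of the original code), you can pass a dummy batch through the untrained encoder and confirm the output shape:

import numpy as np

# One all-zero 28x28 "image" in, one length-32 latent vector out.
dummy = np.zeros((1, 28, 28))
print(encoder.predict(dummy).shape)  # (1, 32)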

Creating the Decoder

decoder = Sequential([
    Dense(64, input_shape = (LATENT_SIZE,)),
    LR(),
    Dropout(0.5),
    Dense(128),
    LR(),
    Dropout(0.5),
    Dense(256),
    LR(),
    Dropout(0.5),
    Dense(512),
    LR(),
    Dropout(0.5),
    Dense(784),
    Activation("sigmoid"),
    Reshape((28, 28))
])

The decoder is essentially the same as the encoder, but in reverse. The final activation layer, however, is sigmoid. The sigmoid activation function outputs values in the range [0, 1], which fits perfectly with our scaled image data.

Sigmoid Function
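
For reference, sigmoid is defined as σ(x) = 1 / (1 + e^(-x)). A minimal NumPy version:

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))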

Creating the Full Model

To create the full model, the Keras Functional API must be used. The Functional API allows us to string together multiple models.

img = Input(shape = (28, 28))

This will create a placeholder tensor which we can feed into each network to get the output of the whole model.

latent_vector = encoder(img)
output = decoder(latent_vector)

The best part about the Keras Functional API is how readable it is. The Keras Functional API allows you to call models directly onto tensors and get the output from that tensor. By calling the encoder model onto the img tensor, I get the latent_vector. The same can be done with the decoder model onto the latent_vector which gives us the output.

model = Model(inputs = img, outputs = output)
model.compile("nadam", loss = "binary_crossentropy")

To create the model itself, you use the Model class and define what the inputs and outputs of the model are.

To train a model, you must compile it. To compile a model, you have to choose an optimizer and a loss function. For the optimizer, I chose Nadam, which is Nesterov Accelerated Gradient applied to Adaptive Moment Estimation. It is a modified Adam optimizer. For the loss, I chose binary cross-entropy. Binary Cross-Entropy is very commonly used with Autoencoders. Usually, however, binary cross-entropy is used with Binary Classifiers. Additionally, binary cross-entropy can only be used between output values in the range [0, 1].
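
To make the loss concrete, here is a hand-rolled sketch of binary cross-entropy averaged over pixels (an illustration of the formula, not the Keras internals):

import numpy as np

def binary_crossentropy(y_true, y_pred, eps = 1e-7):
    # Both arrays must hold values in [0, 1]; eps guards against log(0).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))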

Training the Model

EPOCHS = 60

The value EPOCHS is a hyperparameter set to 60. Generally, the more epochs the better, at least until the model's performance plateaus.

# Only do plotting if you are using IPython, Jupyter, or Colab

Repeatedly plotting is really only recommended if you are using IPython, Jupyter, or Colab, where matplotlib plots render inline instead of piling up as individual figure windows.

for epoch in range(EPOCHS):
    fig, axs = plt.subplots(4, 4)
    rand = x_test[np.random.randint(0, 10000, 16)].reshape((4, 4, 1, 28, 28))

    display.clear_output() # If you imported display from IPython

    for i in range(4):
        for j in range(4):
            axs[i, j].imshow(model.predict(rand[i, j])[0], cmap = "gray")
            axs[i, j].axis("off")

    plt.subplots_adjust(wspace = 0, hspace = 0)
    plt.show()
    print("-----------", "EPOCH", epoch, "-----------")
    model.fit(x_train, x_train)

First, we create a figure with 4 rows and 4 columns of subplots and choose 16 random test images to check how well the network performs.

Next, we clear the screen (this only works in IPython, Jupyter, and Colab) and plot the model's predictions on the random test images.

Finally, we train the model. To train the model we simply call model.fit on the training image data. Remember how the autoencoder’s goal is to take the input data, compress it, decompress it, and then output a copy of the input data? Well, that means that the input and the target output are both the training image data.
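
As a variation (not the loop used in this article), you could instead train for all epochs in a single call and use the test set to monitor reconstruction loss:

# Alternative: one fit call for all epochs; the test images serve
# only to monitor reconstruction loss, not to update weights.
model.fit(x_train, x_train,
          epochs = EPOCHS,
          validation_data = (x_test, x_test))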

As you can see, these generated images are pretty good. The biggest problem with them, however, is the blurriness. Many of these problems can be fixed with other types of Generative Networks or even other types of Autoencoders.

Full Code

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Dense, Input, Flatten, \
    Reshape, LeakyReLU as LR, \
    Activation, Dropout
from tensorflow.keras.models import Model, Sequential
from matplotlib import pyplot as plt
from IPython import display # If using IPython, Colab or Jupyter
import numpy as np

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train/255.0
x_test = x_test/255.0

# Plot image data from x_train
plt.imshow(x_train[0], cmap = "gray")
plt.show()

LATENT_SIZE = 32

encoder = Sequential([
    Flatten(input_shape = (28, 28)),
    Dense(512),
    LR(),
    Dropout(0.5),
    Dense(256),
    LR(),
    Dropout(0.5),
    Dense(128),
    LR(),
    Dropout(0.5),
    Dense(64),
    LR(),
    Dropout(0.5),
    Dense(LATENT_SIZE),
    LR()
])

decoder = Sequential([
    Dense(64, input_shape = (LATENT_SIZE,)),
    LR(),
    Dropout(0.5),
    Dense(128),
    LR(),
    Dropout(0.5),
    Dense(256),
    LR(),
    Dropout(0.5),
    Dense(512),
    LR(),
    Dropout(0.5),
    Dense(784),
    Activation("sigmoid"),
    Reshape((28, 28))
])

img = Input(shape = (28, 28))
latent_vector = encoder(img)
output = decoder(latent_vector)

model = Model(inputs = img, outputs = output)
model.compile("nadam", loss = "binary_crossentropy")

EPOCHS = 60

# Only do plotting if you are using IPython, Jupyter, or Colab
for epoch in range(EPOCHS):
    fig, axs = plt.subplots(4, 4)
    rand = x_test[np.random.randint(0, 10000, 16)].reshape((4, 4, 1, 28, 28))

    display.clear_output() # If you imported display from IPython

    for i in range(4):
        for j in range(4):
            axs[i, j].imshow(model.predict(rand[i, j])[0], cmap = "gray")
            axs[i, j].axis("off")

    plt.subplots_adjust(wspace = 0, hspace = 0)
    plt.show()
    print("-----------", "EPOCH", epoch, "-----------")
    model.fit(x_train, x_train)

A Google Colab notebook for this code can be found here.

After training for 60 epochs, I got this image:

Generated Image after 60 epochs

As you can see, the results are pretty good. The autoencoder successfully compresses the images into latent vectors and decodes them back with pretty good quality. This autoencoder is the "vanilla" variety, but other types, like Variational Autoencoders, produce even better-quality images. Results can also be improved further by increasing the number of epochs.

Uses for Autoencoders

Autoencoders, put simply, learn how to compress and decompress data efficiently without supervision. This means Autoencoders can be used for dimensionality reduction. The Decoder section of an Autoencoder can also be used to generate images from a noise vector.
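
For example, a short sketch of dimensionality reduction with the trained encoder from above:

# Compress the test images from 784 pixels down to 32 values each,
# e.g. as features for a downstream classifier or for visualization.
codes = encoder.predict(x_test)  # shape: (10000, 32)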

Practical applications of an Autoencoder network include:

  • Denoising (see the sketch after this list)
  • Image Reconstruction
  • Image Generation
  • Data Compression & Decompression
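
As a sketch of the denoising use case (my variation on the model above, not code from the article), a common approach is to corrupt the inputs with noise while keeping the clean images as targets:

# Denoising variant: noisy inputs, clean targets.
noise = np.random.normal(0, 0.3, size = x_train.shape)
x_train_noisy = np.clip(x_train + noise, 0.0, 1.0)
model.fit(x_train_noisy, x_train, epochs = 10)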
