Deep Autoencoders using Tensorflow

Tathagat Dasgupta
Towards Data Science
4 min read · Jul 31, 2018


In this tutorial, we will be exploring an unsupervised neural network architecture called the autoencoder.

Autoencoders are deep neural networks trained to reproduce their input at the output layer; that is, the number of neurons in the output layer is exactly the same as the number of neurons in the input layer. Consider the image below.

This image represents the structure of a typical deep autoencoder. The goal of an autoencoder is to produce a representation at the output layer that is as close (similar) to the input as possible. In practice, though, autoencoders are used to compute a compressed version of the input data with the lowest possible loss of information. If you have developed machine learning projects, you have probably heard of Principal Component Analysis (PCA): the idea of PCA is to find the most relevant parameters for training a model when the dataset has a huge number of parameters.

Autoencoders work in a similar way. The encoder part of the architecture compresses the input data, ensuring that important information is not lost while the overall size of the data is reduced significantly. This concept is called Dimensionality Reduction.
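As a rough illustration of the idea (using PCA on made-up data rather than an autoencoder), projecting five correlated parameters down to three can preserve nearly all of the information:

```python
import numpy as np

# Toy dataset: 100 samples with 5 correlated parameters (made-up data)
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))      # 3 underlying factors
mix = rng.normal(size=(3, 5))         # mixed into 5 observed parameters
X = base @ mix

# PCA via SVD: project the 5 parameters down to 3 components
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_compressed = X_centered @ Vt[:3].T  # shape (100, 3)

# Reconstruct from the compressed version and measure the information loss
X_reconstructed = X_compressed @ Vt[:3]
loss = np.mean((X_centered - X_reconstructed) ** 2)
print(X_compressed.shape, loss)
```

Because the toy data really lives on three underlying factors, the reconstruction error is essentially zero; an autoencoder learns a similar compression, but nonlinearly.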

The downside of this concept is that the compressed data is a black box, i.e. we cannot interpret the structure of the data in its compressed form. For example, suppose we have a dataset with five parameters and we train an autoencoder on it. The encoder does not simply omit some of the parameters for a better representation; it fuses the parameters together into a compressed version with fewer of them (bringing the number of parameters down from five to, say, three).

So, an autoencoder has two halves: an encoder and a decoder.

The encoder compresses the input data, and the decoder does the reverse, producing an uncompressed version of the data that reconstructs the input as accurately as possible.

We will use TensorFlow to create an autoencoder neural net and test it on the MNIST dataset. So, let's get started!

First, we import the relevant libraries and read in the MNIST dataset. If the dataset is already present on your local machine, well and good; otherwise, it will be downloaded automatically when you run the following command.
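The original post loaded MNIST with the old `tensorflow.examples.tutorials.mnist` helper, which has since been removed; a rough equivalent using `tf.keras.datasets` (an assumption, not the author's exact code) looks like this:

```python
import numpy as np
import tensorflow as tf

# Load MNIST; Keras downloads it automatically (to ~/.keras) if not cached
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images into 784-pixel vectors and scale to [0, 1]
x_train = x_train.reshape(-1, 784).astype(np.float32) / 255.0
x_test = x_test.reshape(-1, 784).astype(np.float32) / 255.0
print(x_train.shape, x_test.shape)
```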

Next, we create some constants for our convenience and declare our activation function beforehand. The images in the MNIST dataset are 28x28 pixels in size, i.e. 784 pixels each, and we will compress them to 196 pixels. You can always go deeper and reduce the size even further, but compressing too much may cause the autoencoder to lose information.
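A sketch of these constants, assuming an intermediate layer width of 392 for the stacked architecture (the post only specifies the 784 → 196 compression, and the learning rate here is likewise a guess):

```python
import tensorflow as tf

# Layer sizes for a stacked autoencoder: 784 -> 392 -> 196 -> 392 -> 784
num_input = 784       # 28x28 pixels, flattened
num_hid1 = 392        # assumed intermediate width
num_hid2 = 196        # the compressed representation
num_hid3 = num_hid1   # decoder mirrors the encoder
num_output = num_input

lr = 0.01             # learning rate (hypothetical value)
actf = tf.nn.relu     # activation function, declared beforehand
```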

Now we create variables for the weights and biases of each layer, and then build the layers themselves using the previously declared activation function.

The tf.variance_scaling_initializer() is not the most commonly seen choice, but we use it here because it scales the initial weights to the size (fan-in) of each layer. The placeholder for the input batch is declared with shape (None, 784), so its batch dimension adjusts itself to whatever batch size we feed in, which stops us from running into dimension errors. Each hidden layer is created by simply feeding the previous layer, together with the relevant weights and biases, into the activation function (ReLU).
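A sketch of this step in TensorFlow 1.x style (written against tf.compat.v1 so it also runs under TensorFlow 2; variable names are assumptions):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()  # TF1-style graph mode, as in the original post

num_input, num_hid1, num_hid2 = 784, 392, 196
num_hid3, num_output = num_hid1, num_input
actf = tf.nn.relu

# Placeholder: None lets the batch dimension adapt to any batch size
X = tf.placeholder(tf.float32, shape=[None, num_input])

initializer = tf.variance_scaling_initializer()

# Weights, initialized according to each layer's fan-in
w1 = tf.Variable(initializer([num_input, num_hid1]), dtype=tf.float32)
w2 = tf.Variable(initializer([num_hid1, num_hid2]), dtype=tf.float32)
w3 = tf.Variable(initializer([num_hid2, num_hid3]), dtype=tf.float32)
w4 = tf.Variable(initializer([num_hid3, num_output]), dtype=tf.float32)

# Biases, initialized to zero
b1 = tf.Variable(tf.zeros(num_hid1))
b2 = tf.Variable(tf.zeros(num_hid2))
b3 = tf.Variable(tf.zeros(num_hid3))
b4 = tf.Variable(tf.zeros(num_output))

# Each layer feeds the previous one through the activation function
hid_layer1 = actf(tf.matmul(X, w1) + b1)
hid_layer2 = actf(tf.matmul(hid_layer1, w2) + b2)  # compressed representation
hid_layer3 = actf(tf.matmul(hid_layer2, w3) + b3)
output_layer = actf(tf.matmul(hid_layer3, w4) + b4)
```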

We will use the MSE loss function for this neural net and minimize it with an Adam optimizer. You can always play around with these for interesting results.
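A sketch of the loss and optimizer, attached here to a minimal one-hidden-layer reconstruction for brevity (the learning rate of 0.01 is an assumption):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

num_input, num_hid = 784, 196
X = tf.placeholder(tf.float32, shape=[None, num_input])

# Minimal encoder/decoder pair just to attach the loss to
init = tf.variance_scaling_initializer()
w_enc = tf.Variable(init([num_input, num_hid]), dtype=tf.float32)
w_dec = tf.Variable(init([num_hid, num_input]), dtype=tf.float32)
output_layer = tf.nn.relu(tf.matmul(tf.nn.relu(tf.matmul(X, w_enc)), w_dec))

# MSE between the reconstruction and the input, minimized with Adam
loss = tf.reduce_mean(tf.square(output_layer - X))
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train = optimizer.minimize(loss)
```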

Now we define the number of epochs and the batch size and run the session. We use a utility function from the mnist object to get each new batch: mnist.train.next_batch(). We also print the training loss after every epoch to monitor training.
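A sketch of the training loop. To keep the example self-contained, random data stands in for MNIST batches and the layer sizes are shrunk; the commented line shows where mnist.train.next_batch() would be used instead:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Tiny stand-in autoencoder so the sketch runs fast
num_input, num_hid = 64, 16
X = tf.placeholder(tf.float32, shape=[None, num_input])
init = tf.variance_scaling_initializer()
w_enc = tf.Variable(init([num_input, num_hid]), dtype=tf.float32)
b_enc = tf.Variable(tf.zeros(num_hid))
w_dec = tf.Variable(init([num_hid, num_input]), dtype=tf.float32)
b_dec = tf.Variable(tf.zeros(num_input))
hidden = tf.nn.relu(tf.matmul(X, w_enc) + b_enc)
output = tf.matmul(hidden, w_dec) + b_dec
loss = tf.reduce_mean(tf.square(output - X))
train = tf.train.AdamOptimizer(0.01).minimize(loss)

num_epochs = 5
batch_size = 32
rng = np.random.default_rng(0)
data = rng.random((320, num_input)).astype(np.float32)  # stand-in for MNIST
num_batches = len(data) // batch_size

losses = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        for i in range(num_batches):
            # with MNIST: x_batch, _ = mnist.train.next_batch(batch_size)
            x_batch = data[i * batch_size:(i + 1) * batch_size]
            sess.run(train, feed_dict={X: x_batch})
        # print the training loss after every epoch to monitor training
        epoch_loss = sess.run(loss, feed_dict={X: data})
        losses.append(epoch_loss)
        print("epoch", epoch, "loss", epoch_loss)
```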

Finally, we script a small plotting function to plot the original images alongside the reconstructions and see how well our model works.
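One possible version of such a plotting function (the function name and layout are assumptions; the demo feeds random pixels in place of real model output):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

def plot_reconstructions(originals, reconstructions, n=5):
    """Plot n original digits on the top row, reconstructions below."""
    fig, axes = plt.subplots(2, n, figsize=(2 * n, 4))
    for i in range(n):
        axes[0, i].imshow(originals[i].reshape(28, 28), cmap="gray")
        axes[1, i].imshow(reconstructions[i].reshape(28, 28), cmap="gray")
        axes[0, i].axis("off")
        axes[1, i].axis("off")
    axes[0, 0].set_title("original")
    axes[1, 0].set_title("reconstruction")
    return fig

# Demo with random pixels standing in for real images and model output
fake = np.random.rand(5, 784)
fig = plot_reconstructions(fake, fake)
fig.savefig("reconstructions.png")
```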

Final output

Here we can see that the reconstructions are not perfect but are pretty close to the original images. Notice that the reconstruction of the 2 looks like a 3; this is due to information loss during compression.

We can improve the autoencoder by tuning its hyperparameters, and we can speed up training considerably by running it on a GPU accelerator.

All right, so this was a deep (or stacked) autoencoder model built from scratch in TensorFlow. For the full code, click on the banner below.

Till next time!!


Associate Consultant — Data Science at Infosys | Ex-Lead Data Scientist at Senquire Analytics | UC Irvine graduate