GANs N’ Roses

Naresh Nagabushan
Towards Data Science
10 min read · Jun 29, 2017


“This article assumes the reader is familiar with neural networks and TensorFlow. If not, we’d request you to go through this great article on Deep Learning by Michael Nielsen and familiarize yourself with TensorFlow.”

Imagine a day when we have a neural network that can watch movies and generate its own movies, or listen to songs and compose new ones. This network would learn from what it sees and hears without you explicitly telling it anything. This way of letting a neural network learn is known as unsupervised learning.

GANs (Generative Adversarial Networks), which are in fact trained in an unsupervised way, have gained a lot of traction in the last three years and are now considered one of the hottest topics in the field of AI. Here is what Yann LeCun, Director of AI Research at Facebook, thinks about them:

Generative Adversarial network is the most interesting idea in the last ten years in Machine Learning.

GANs are neural networks that dream: after looking at a set of real images, they can generate new ones of their own. Well, what can this be used for? And why is this important?

Generated bedroom images. Source: https://arxiv.org/abs/1511.06434v2

Until recently, neural networks (Convolutional Neural Networks in particular) were only good at classification tasks, such as telling a cat from a dog or a plane from a car. But now they can be used to generate pictures of cats or dogs (even if they look weird), which shows that they are able to learn the features of an object on their own.

This extraordinary ability of GANs can be used in a lot of amazing applications such as:

  • Generating an image given a textual description. Check out this link:
Text to Image. Source: https://arxiv.org/pdf/1605.05396v2.pdf
  • Image to Image translation:

This might be the coolest application of a GAN yet. Image to Image translation can be used to generate realistic-looking images from sketches, convert a picture taken during the daytime into a night-time image, or even convert a grayscale image to a color image.

Image to Image Translation. Source: https://phillipi.github.io/pix2pix/

Check out this link for more details:

Having gotten an idea of what GANs are capable of, let’s hop aboard the hype train and implement a simple GAN to generate images of roses. Okay, wait, but why roses?

We don’t wanna bore you with a story, but let’s just say this article was inspired after listening to a song by Guns N’ Roses (get the title now??)

Let’s look at what GANs really are:

Before we begin building our GAN, let’s understand how it works. A Generative Adversarial Network contains two neural networks: a Discriminator and a Generator. The Discriminator is a Convolutional Neural Network (don’t know what a CNN is? Check out this excellent post) that learns to differentiate between real and fake images. The real images are taken from a database, and the fake ones come from the Generator.

Discriminator

The Generator works like a CNN running backwards: it takes a vector of random numbers as input and generates an image at the output.

Generator

We’ll get into the working and implementation of both the Generator and the Discriminator later, but for now let’s look at a famous example to explain GANs (Explanation borrowed heavily from Abusing Generative Adversarial Networks to Make 8-bit Pixel Art).

Let’s think of the Generator as a counterfeiter and the Discriminator as a police officer who has to differentiate between fake and real currency. Initially, let’s assume that both the counterfeiter and the police officer are equally bad at their jobs. So the counterfeiter first produces some random, noisy-looking images, because it does not know anything about what currency looks like.

Noisy Image by the Counterfeiter

Now the police officer is trained to differentiate between these fake images produced by the counterfeiter and the real currency.

Train the Police Officer

The counterfeiter now comes to know that its images are being classified as fake and that the police officer is looking for some distinct characteristics (such as the color and patterns) in the currency. The counterfeiter now learns these characteristics and generates currency (images in this case) which have them.

Training the Counterfeiter

Now, the police officer is again shown the real currency from the dataset and the new improved (hopefully) images from the counterfeiter and asked to classify them. The officer therefore would have learnt some more characteristics of the real image (like the face on the currency).

Train the Police with the new fake images

The counterfeiter again learns these features and produces better looking fake images.

Train the counterfeiter again

This constant battle between the counterfeiter and the police officer continues until the counterfeiter produces images that look exactly like the real ones and the police officer cannot classify them.

Real or Fake?

Implementation of GANs N’ Roses on Tensorflow:

Let’s build a simple DCGAN (Deep Convolutional Generative Adversarial Network) using TensorFlow and nothing else (except for Pillow).

But what’s a DCGAN?

DCGAN is a modified version of the vanilla GAN that addresses some of its difficulties, such as making the fake images look visually pleasing and improving stability during training, so that the Generator can’t exploit a flaw in the Discriminator by repeatedly outputting an image that fits the distribution the Discriminator is looking for but is nowhere close to a real image.

This is the Discriminator architecture we’re trying to build:

Discriminator Architecture

It can be seen that it takes in an image as input and outputs a single logit (pushed towards 1 for real images and 0 for fake ones).

Next, we have the Generator architecture, which consists of conv_transpose layers that take in a set of random numbers as input and generate an image at the output.

Generator Architecture

The changes proposed by DCGANs taken directly from this paper are:

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  • Use batchnorm in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers.

Let’s start by collecting images of roses. One easy way to do this is to image-search for roses on Google and download all the images in the search results using a Chrome plugin such as ImageSpark.

We’ve collected 67 images (more would have been better) which are available here. Extract these images in the following directory:

<Project folder>/Dataset/Roses.

The code and the dataset can be obtained by cloning this repo on Github.

Now that we have our images, the next step is to preprocess them by resizing them to 64 × 64 and scaling the pixel values to between -1 and 1.
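The actual preprocessing code lives in the repo; as a rough sketch (file paths, helper names and the resampling filter are our assumptions), it could look something like this:

```python
import os
import numpy as np
from PIL import Image

IMAGE_SIZE = 64  # assumed constant; the repo keeps such values in mission_control.py

def load_dataset(path="./Dataset/Roses"):
    """Load every rose image, resize it to 64x64 and scale pixels to [-1, 1]."""
    images = []
    for file_name in os.listdir(path):
        img = Image.open(os.path.join(path, file_name)).convert("RGB")
        img = img.resize((IMAGE_SIZE, IMAGE_SIZE), Image.BILINEAR)
        img = np.asarray(img, dtype=np.float32)
        img = img / 127.5 - 1.0   # [0, 255] -> [-1, 1], matching the tanh output of the Generator
        images.append(img)
    return np.array(images)       # shape: [N, 64, 64, 3]
```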

We’ll begin by writing out helper functions that can later be used to build convolution, convolution transpose and dense fully connected layers, as well as the LeakyReLU activation (as it isn’t available in TensorFlow yet).

Function to implement convolutional layer
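The gist itself isn’t reproduced here, but a minimal sketch of such a convolution helper, written in TensorFlow 1.x style (argument names and initializer values are our assumptions, not necessarily those of the repo), could look like this:

```python
import tensorflow as tf  # TensorFlow 1.x, as used throughout this post

def conv2d(x, output_dim, kernel=5, stride=2, name="conv2d"):
    """Strided convolution layer with shared variables (no pooling, per the DCGAN guidelines)."""
    with tf.variable_scope(name):
        w = tf.get_variable("w", [kernel, kernel, int(x.get_shape()[-1]), output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=0.02))
        b = tf.get_variable("b", [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.nn.conv2d(x, w, strides=[1, stride, stride, 1], padding="SAME")
        return tf.nn.bias_add(conv, b)
```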

We use get_variable() instead of the usual Variable() to create a variable in TensorFlow, so that the weights and biases can later be shared among different function calls. Check out this post to know more about shared variables.

Function to implement convolution transpose
Function to implement dense fully connected layer
Leaky ReLU
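Under the same assumptions, sketches of the remaining helpers might look like the following (the batch_norm helper is our own simplified stand-in for whatever batch-normalization routine the repo uses):

```python
def conv2d_transpose(x, output_shape, kernel=5, stride=2, name="conv2d_transpose"):
    """Fractionally-strided (transposed) convolution used by the Generator to upsample."""
    with tf.variable_scope(name):
        # Note: the filter shape is [height, width, output_channels, input_channels]
        w = tf.get_variable("w", [kernel, kernel, output_shape[-1], int(x.get_shape()[-1])],
                            initializer=tf.truncated_normal_initializer(stddev=0.02))
        b = tf.get_variable("b", [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        deconv = tf.nn.conv2d_transpose(x, w, output_shape=output_shape,
                                        strides=[1, stride, stride, 1], padding="SAME")
        return tf.nn.bias_add(deconv, b)

def dense(x, output_dim, name="dense"):
    """Fully connected layer, used at the Generator input and the Discriminator output."""
    with tf.variable_scope(name):
        w = tf.get_variable("w", [int(x.get_shape()[-1]), output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=0.02))
        b = tf.get_variable("b", [output_dim], initializer=tf.constant_initializer(0.0))
        return tf.matmul(x, w) + b

def lrelu(x, alpha=0.2):
    """Leaky ReLU, written by hand since older TensorFlow versions lacked tf.nn.leaky_relu."""
    return tf.maximum(alpha * x, x)

def batch_norm(x, name="batch_norm"):
    """Simple batch normalization using batch statistics (no moving averages, for brevity)."""
    with tf.variable_scope(name):
        params_shape = [int(x.get_shape()[-1])]
        gamma = tf.get_variable("gamma", params_shape, initializer=tf.ones_initializer())
        beta = tf.get_variable("beta", params_shape, initializer=tf.zeros_initializer())
        mean, variance = tf.nn.moments(x, axes=list(range(len(x.get_shape()) - 1)))
        return tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-5)
```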

The next step is to build the Generator and the Discriminator. Let’s first begin with our protagonist, the Generator. The Generator architecture we’ll need to construct is shown below:

Again, the Generator Architecture we’re trying to implement

The generator() function builds a Generator (duh!) with the architecture shown in the figure above. The DCGAN requirements, such as removing fully-connected hidden layers, using only ReLU at the Generator and using batch normalization, have been satisfied.
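A hypothetical generator() along these lines, built from the helpers sketched earlier (the feature-map sizes 512 → 256 → 128 → 64 → 3 are read off the figure and may differ from the actual repo), might be:

```python
def generator(z, batch_size=64, reuse=False):
    """Generator: a dense projection of z followed by a stack of transposed convolutions."""
    with tf.variable_scope("generator", reuse=reuse):
        # Project the noise vector and reshape it into a small 4x4 "image" with 512 channels
        g = dense(z, 4 * 4 * 512, name="g_dense")
        g = tf.reshape(g, [batch_size, 4, 4, 512])
        g = tf.nn.relu(batch_norm(g, name="g_bn0"))

        # Upsample 4x4 -> 8x8 -> 16x16 -> 32x32, halving the channel count each time
        g = tf.nn.relu(batch_norm(conv2d_transpose(g, [batch_size, 8, 8, 256], name="g_deconv1"), name="g_bn1"))
        g = tf.nn.relu(batch_norm(conv2d_transpose(g, [batch_size, 16, 16, 128], name="g_deconv2"), name="g_bn2"))
        g = tf.nn.relu(batch_norm(conv2d_transpose(g, [batch_size, 32, 32, 64], name="g_deconv3"), name="g_bn3"))

        # Final layer outputs a 64x64x3 image; tanh keeps pixel values in [-1, 1]
        g = conv2d_transpose(g, [batch_size, 64, 64, 3], name="g_deconv4")
        return tf.nn.tanh(g)
```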

Similarly, the Discriminator can easily be constructed as follows:

The architecture required:

The Discriminator architecture

Again we’ve avoided dense fully-connected layers, used Leaky ReLU and batch normalization at the Discriminator.
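A matching discriminator() sketch, again with assumed channel counts and a single dense layer mapping the final feature map to one logit:

```python
def discriminator(image, reuse=False):
    """Discriminator: strided convolutions with Leaky ReLU, ending in a single real/fake logit."""
    with tf.variable_scope("discriminator", reuse=reuse):
        d = lrelu(conv2d(image, 64, name="d_conv1"))                                 # 64x64x3 -> 32x32x64
        d = lrelu(batch_norm(conv2d(d, 128, name="d_conv2"), name="d_bn2"))          # -> 16x16x128
        d = lrelu(batch_norm(conv2d(d, 256, name="d_conv3"), name="d_bn3"))          # -> 8x8x256
        d = lrelu(batch_norm(conv2d(d, 512, name="d_conv4"), name="d_bn4"))          # -> 4x4x512
        d = tf.reshape(d, [-1, 4 * 4 * 512])
        logits = dense(d, 1, name="d_dense")   # one logit per image
        return tf.nn.sigmoid(logits), logits
```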

Now for the fun part, training these networks:

The loss functions for the Discriminator and the Generator are shown below:

Discriminator loss: -[log D(x) + log(1 - D(G(z)))] (note the leading negative sign, since the Discriminator actually wants to maximize the bracketed term)
Generator loss: -log D(G(z))

Where x represents the real images and z is the noise vector fed to the Generator.

We’ll pass the random inputs to the Generator; the shape of zin will be [BATCH_SIZE, Z_DIM], and the Generator should now give BATCH_SIZE fake images at its output. The shape of the Generator output will be [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, 3]. This is the term G(z) in the loss function.
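In code, this wiring might look roughly as follows (BATCH_SIZE, Z_DIM and IMAGE_SIZE stand in for values the repo keeps in mission_control.py; the numbers here are just placeholders):

```python
BATCH_SIZE, Z_DIM, IMAGE_SIZE = 64, 100, 64   # assumed hyperparameters

# Placeholders for the noise vectors and the real rose images
zin = tf.placeholder(tf.float32, [BATCH_SIZE, Z_DIM], name="z")
real_images = tf.placeholder(tf.float32, [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, 3], name="real_images")

# G(z): fake images produced by the Generator
G = generator(zin, batch_size=BATCH_SIZE)
```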

D(x) is the Discriminator that takes in the real images or the fake ones and is trained to differentiate between them. In order to train the Discriminator on the real images we’ll pass the real image batch to D(x) and set the target to 1. Similarly to train it on fake images (which come from the Generator) we’ll connect the Generator output to the Discriminator input using D(G(z)).

The loss for the Discriminator is implemented using TensorFlow’s built-in functions:
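Continuing the sketch above, the Discriminator loss could be written as:

```python
# D(x): the Discriminator on real images; D(G(z)): the Discriminator on fake images.
# Both calls share weights, hence reuse=True on the second call.
Dx, Dx_logits = discriminator(real_images)
Dg, Dg_logits = discriminator(G, reuse=True)

# Real images should be classified as 1, fake images as 0
d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=Dx_logits, labels=tf.ones_like(Dx_logits)))
d_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=Dg_logits, labels=tf.zeros_like(Dg_logits)))
d_loss = d_loss_real + d_loss_fake
```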

We’ll next need to train the Generator such that D(G(z)) outputs a one, i.e., we’ll fix the weights of the Discriminator and backprop only through the Generator weights so that the Discriminator always outputs a one.

The loss function for the Generator will therefore be:
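In TensorFlow terms, a sketch of this loss is:

```python
# The Generator is rewarded when the Discriminator labels its images as real (1)
g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=Dg_logits, labels=tf.ones_like(Dg_logits)))
```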

We’ll next collect all the weights of the Discriminator and the Generator (this is later needed to train either only the Generator or the Discriminator):
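One simple way to do this is to filter tf.trainable_variables() by the variable scopes used in the sketches above:

```python
# Split the trainable variables by scope so each optimizer only updates its own network
t_vars = tf.trainable_variables()
d_vars = [var for var in t_vars if var.name.startswith("discriminator")]
g_vars = [var for var in t_vars if var.name.startswith("generator")]
```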

We’ve used TensorFlow’s AdamOptimizer to learn the weights. The next step is to pass the variables that need to be modified to the Discriminator and Generator optimizers respectively.
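For example (the learning rate and beta1 values follow the DCGAN paper; the values actually used in the repo may differ):

```python
# Each optimizer only touches its own network's variables via var_list
d_optimizer = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5).minimize(d_loss, var_list=d_vars)
g_optimizer = tf.train.AdamOptimizer(learning_rate=0.0002, beta1=0.5).minimize(g_loss, var_list=g_vars)
```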

The last step would be to run the session and pass the required image batches to the optimizers. We’ll train the model for 30000 iterations and periodically display the Discriminator and the Generator losses.
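A bare-bones sketch of that training loop, where next_batch is a hypothetical helper returning a batch of preprocessed rose images:

```python
import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(30000):
        # Sample a batch of real roses and a batch of noise vectors
        batch = next_batch(BATCH_SIZE)                                  # hypothetical helper, shape [BATCH_SIZE, 64, 64, 3]
        noise = np.random.uniform(-1.0, 1.0, size=[BATCH_SIZE, Z_DIM])

        # Alternate: one Discriminator update, then one Generator update
        _, d_l = sess.run([d_optimizer, d_loss], feed_dict={real_images: batch, zin: noise})
        _, g_l = sess.run([g_optimizer, g_loss], feed_dict={zin: noise})

        if step % 100 == 0:
            print("Step {}: d_loss = {:.4f}, g_loss = {:.4f}".format(step, d_l, g_l))
```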

In order to make the tuning of hyperparameters easier and to save the results at each run we’ve implemented the form_results function and created a file called mission_control.py.

All the hyperparameters for the network can be modified in the mission_control.py file; running the main.py file will then automatically create folders for each run and save the TensorBoard files and the generated images.

We can have a look at the Discriminator and Generator losses at each iteration of training by opening up TensorBoard and pointing it to the Tensorboard directory created under each run folder (check out the GitHub link for more details).

Variation of Generator loss during training
Variation of Discriminator loss during training

From these graphs it can be seen that the Discriminator and Generator losses keep rising and falling throughout training, indicating that the two networks are constantly trying to outperform each other.

The code also saves the generated images for each run and some of these images are shown below:

At the 0th iteration:

100th iteration:

1000th iteration:

By the 30,000th iteration the generated images have started to overfit the training data:

The generated images over the course of training are shown below:

These images are promising, but after about 1000 iterations it can be seen that the Generator is just reproducing images from the training dataset. Using a larger dataset and training for fewer iterations would reduce this overfitting.

GANs are easy to implement but hard to train without the right hyperparameters and network architecture. We’ve written this article with the main intent of helping people get started with Generative Networks.

What others are doing with GANs:

Thank you for reading this rather long Medium post. If you found it helpful, please consider sharing it. Feel free to contact us, Naresh Nagabushan or monan, with any doubts pertaining to this article.
