
Understanding ACGANs with code[PyTorch]

Working with PyTorch Library and Python to build and know more about an ACGAN model

Hmrishav Bandyopadhyay
Towards Data Science
5 min read · Apr 21, 2020



ACGAN stands for Auxiliary Classifier Generative Adversarial Network. The network was developed by a group of researchers from Google Brain and was presented at the 34th International Conference on Machine Learning, held in Sydney, Australia. This article is a brief description of the research work presented in the paper, along with an implementation of the same using PyTorch.

Why ACGAN?

ACGAN is a specialized GAN that can work wonders in image synthesis. Give it a class label and it will generate an image of that class, all from pure noise! Another important feature of ACGAN is that it generates images of considerably higher resolution than previous approaches did. True, any low-resolution image can be enlarged with bilinear interpolation, but the result is only a blurry version of the original that is no more discriminable than the original itself. ACGAN was also the first to bring forth the idea of checking image discriminability using a pre-trained Inception network.

[Note — The PyTorch model built here as an implementation of the ACGAN paper contains just the generator and the discriminator. Take it upon yourself to check the model against the pre-trained Inception Network. Let me know in the comments about the results :)]

Architecture

The ACGAN consists of a generator and a discriminator, as any GAN does. However, in the ACGAN, every generated sample has a corresponding class label c, drawn from the set of available classes C, in addition to the noise z. This class label lets the model synthesize an image conditioned on the label passed. The generator G uses both the class label c and the noise z to produce the image, so the generated image can be represented as:

X_fake = G(c, z) (fake because it has to be labelled as fake by the discriminator)

The generator architecture is simple enough. It consists of a series of deconvolutional layers, also known as transposed convolutional layers. Confusing? Let me break it down for you:

A transposed convolutional layer can be thought of as an ordinary convolution applied to an expanded version of the input: zeros are inserted between (and around) the input elements before the kernel slides over them. As a result, even with stride 1 and no padding, the height and width of the output are greater than those of the input. With a stride of 2, a transposed convolutional layer performs up-sampling, mirroring the down-sampling performed by a convolutional layer of stride 2.
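You can verify this mirror relationship directly in PyTorch. In this small sketch, the same kernel size, stride, and padding produce halving in one direction and doubling in the other (the channel counts here are arbitrary, chosen just for illustration):

```python
import torch
import torch.nn as nn

# A stride-2 convolution halves the spatial size; a stride-2 transposed
# convolution with the same kernel/padding doubles it.
x = torch.randn(1, 16, 8, 8)  # a batch of 8x8 feature maps

down = nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1)
up = nn.ConvTranspose2d(16, 32, kernel_size=4, stride=2, padding=1)

print(down(x).shape)  # torch.Size([1, 32, 4, 4])   -- down-sampled
print(up(x).shape)    # torch.Size([1, 32, 16, 16]) -- up-sampled
```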

The transposed convolutional layers of the generator are each followed by a ReLU non-linearity.

The discriminator contains a stack of 2D convolutional modules with leaky-ReLU non-linearities, followed by linear layers with a sigmoid on one output and a softmax on the other: the sigmoid detects the source of the image (real or generated) and the softmax detects its class. The overall model can be drawn as:

ACGAN model

Now that we have defined the generator and the discriminator by their architectures, let us turn to the loss function.

Loss Function

The loss function of the ACGAN is divided into two parts —

1. The log-likelihood of the correct source (source-loss):

L_S = E[log P(S = real | X_real)] + E[log P(S = fake | X_fake)]

2. The log-likelihood of the correct class (class-loss):

L_C = E[log P(C = c | X_real)] + E[log P(C = c | X_fake)]

As is evident from the above loss functions, the generator and the discriminator 'fight' over this objective. Both the generator and the discriminator try to maximize the class-loss L_C. The source-loss L_S, however, is a min-max problem: the generator tries to minimize it and fool the discriminator, while the discriminator tries to maximize it and prevent the generator from gaining the upper hand.
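One common way to realise these two terms in code (a sketch, not necessarily the article's exact implementation) is binary cross-entropy for the source term and negative log-likelihood for the class term; the tensors below are random stand-ins for the discriminator's two outputs:

```python
import torch
import torch.nn as nn

source_criterion = nn.BCELoss()  # L_S, applied to the sigmoid output
class_criterion = nn.NLLLoss()   # L_C, applied to the log-softmax output

batch, n_classes = 4, 10
validity = torch.rand(batch, 1)                                  # P(S = real | X)
log_probs = torch.log_softmax(torch.randn(batch, n_classes), 1)  # log P(C | X)
labels = torch.randint(0, n_classes, (batch,))

real = torch.ones(batch, 1)   # source target for real images
fake = torch.zeros(batch, 1)  # source target for generated images

# Discriminator: maximise both log-likelihoods, i.e. minimise the
# summed criteria (real batch shown; the fake batch uses `fake`)
d_loss = source_criterion(validity, real) + class_criterion(log_probs, labels)
d_loss_fake = source_criterion(validity, fake) + class_criterion(log_probs, labels)

# Generator: wants its fakes called real AND classified as the class it
# was conditioned on -- hence the `real` target on its own samples
g_loss = source_criterion(validity, real) + class_criterion(log_probs, labels)
```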

The model [PyTorch]

Having completed the analysis of the ACGAN paper, we will now build the model with the CIFAR-10 dataset. You could download the dataset yourself, but I will incorporate the download into the training itself, to get rid of the hassle! The dataset contains 60,000 images of dimensions 32x32, in 10 classes of 6,000 images each.

The Generator, written as a module —

The Generator module for ACGAN

Note that in the generator the transposed convolutions have carefully chosen parameters so that the output tensor has the same dimensions as the tensors coming from the training set. This is necessary because both go to the discriminator to be evaluated.
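For readers following along without the embedded gist, here is a minimal sketch of such a generator: the exact channel counts are illustrative choices, not the article's, but the structure (label embedding conditioning the noise, stride-2 transposed convolutions with ReLU, tanh output at 32x32) matches the description above:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of an ACGAN generator for 32x32 CIFAR-10 images.
    Layer widths are illustrative, not the article's exact values."""
    def __init__(self, n_classes=10, latent_dim=100):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, latent_dim)
        self.net = nn.Sequential(
            # project the conditioned noise vector up to a 4x4 feature map
            nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),   # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),    # 8x8 -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),      # 16x16 -> 32x32
            nn.Tanh(),                               # images in [-1, 1]
        )

    def forward(self, noise, labels):
        # condition the noise on the class label via the embedding
        x = noise * self.label_emb(labels)
        return self.net(x.view(x.size(0), -1, 1, 1))

z = torch.randn(5, 100)
c = torch.randint(0, 10, (5,))
img = Generator()(z, c)
print(img.shape)  # torch.Size([5, 3, 32, 32])
```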

The Discriminator, also written as a module —

Discriminator model
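A matching sketch of the two-headed discriminator (again, layer widths are my own illustrative choices): a shared convolutional trunk with leaky-ReLU, then one sigmoid head for the source and one log-softmax head for the class, as described in the architecture section:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of an ACGAN discriminator with two output heads.
    Layer widths are illustrative, not the article's exact values."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1),    # 32x32 -> 16x16
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1),  # 16x16 -> 8x8
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1), # 8x8 -> 4x4
            nn.LeakyReLU(0.2, inplace=True),
        )
        # source head: probability the image is real
        self.source = nn.Sequential(nn.Linear(256 * 4 * 4, 1), nn.Sigmoid())
        # class head: log-probabilities over the 10 classes
        self.classifier = nn.Sequential(nn.Linear(256 * 4 * 4, n_classes),
                                        nn.LogSoftmax(dim=1))

    def forward(self, img):
        h = self.features(img).flatten(1)
        return self.source(h), self.classifier(h)

validity, cls = Discriminator()(torch.randn(5, 3, 32, 32))
print(validity.shape, cls.shape)  # torch.Size([5, 1]) torch.Size([5, 10])
```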

Now, let's start training!

For training, I have set the number of epochs to 100. The learning rate has been set to 0.0002 and the batch size is set to 100.

The number of epochs should ideally be higher for proper image synthesis; I have set it to 100 as an example.

Training model
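The training loop alternates between the two players described in the loss section. The skeleton below uses the article's hyper-parameters (lr = 0.0002, batch size 100) but tiny stand-in networks and a random batch in place of the CIFAR-10 loader, so that the sketch runs on its own; substitute the real generator, discriminator, and DataLoader:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_classes, latent_dim, batch = 10, 100, 100

class G(nn.Module):  # stand-in generator
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(n_classes, latent_dim)
        self.fc = nn.Linear(latent_dim, 3 * 32 * 32)
    def forward(self, z, c):
        return torch.tanh(self.fc(z * self.emb(c))).view(-1, 3, 32, 32)

class D(nn.Module):  # stand-in two-headed discriminator
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(3 * 32 * 32, 64)
        self.src, self.cls = nn.Linear(64, 1), nn.Linear(64, n_classes)
    def forward(self, x):
        h = torch.relu(self.body(x.flatten(1)))
        return torch.sigmoid(self.src(h)), torch.log_softmax(self.cls(h), 1)

gen, disc = G(), D()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, nll = nn.BCELoss(), nn.NLLLoss()

real_imgs = torch.rand(batch, 3, 32, 32) * 2 - 1  # stand-in CIFAR-10 batch
real_labels = torch.randint(0, n_classes, (batch,))
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

for step in range(2):  # a real run loops over the DataLoader for 100 epochs
    z = torch.randn(batch, latent_dim)
    fake_labels = torch.randint(0, n_classes, (batch,))
    fake_imgs = gen(z, fake_labels)

    # --- discriminator: real as real + correct class, fake as fake ---
    opt_d.zero_grad()
    v_real, c_real = disc(real_imgs)
    v_fake, c_fake = disc(fake_imgs.detach())
    d_loss = (bce(v_real, ones) + nll(c_real, real_labels)
              + bce(v_fake, zeros) + nll(c_fake, fake_labels))
    d_loss.backward()
    opt_d.step()

    # --- generator: fakes should look real and match their label ---
    opt_g.zero_grad()
    v_fake, c_fake = disc(fake_imgs)
    g_loss = bce(v_fake, ones) + nll(c_fake, fake_labels)
    g_loss.backward()
    opt_g.step()

print(float(d_loss), float(g_loss))
```

Note the `detach()` when scoring fakes on the discriminator's turn: the generator's weights must not receive gradients from the discriminator's update.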

Results!

Let’s check out the results from this little experiment of ours —

Image on first epoch (noise) —

First epoch image

Image on the last epoch —

100th epoch image

Quite an improvement, huh? The beauty of GANs is that you can watch the model train through the images it produces: you can see the structures taking shape across epochs as the model slowly learns the distribution! So what are you waiting for? Write your own code for the model and solve the problem your own way. Improve on the solution if you can, and we will see if we can name it after you ;)

Let me know in the comments if you get stuck! Here to help :)

Check out my blog for faster updates and don’t forget to subscribe for more quality content :D

https://www.theconvolvedblog.vision/
