Deep Learning Fundamentals

Introduction
In my previous article I talked about generative methods in general and what makes generator networks so powerful. In this article, I want to go more in depth into the inner workings of Generative Adversarial Networks (GANs). To make sure you understand them fully, I will walk through the pseudo-code and loss functions from the original GANs paper, then show you the results of my own implementation. Finally, I’ll explain how GANs can be improved through the recommendations of the DCGANs paper, a pivotal paper in this field of research.
How GANs Learn
Generative methods are a powerful tool that can be used to solve a number of problems. Their goal is to generate new data samples that are likely to belong to the training dataset. They can do this in two ways: by learning an approximate distribution of the data space and then sampling from it, or by learning to generate samples that are likely to belong to this data space directly (skipping the step of approximating the data distribution).

Above you can see a diagram of the architecture of GANs. GANs consist of two networks, a generator and a discriminator, that are essentially competing against each other; the two networks have adversarial goals.
The generator attempts to maximize the probability of fooling the discriminator into thinking its generated images are real. The discriminator’s goal is to correctly classify the real data as real, and the generated data as fake. These objectives are expressed in the loss functions of the networks, which will be optimized during training.
In GANs, the two networks optimize the same value function in opposite directions: the discriminator tries to maximize it, while the generator tries to minimize it. Concretely, the generator attempts to maximize the number of samples that the discriminator misclassifies as real (its false positives), and the discriminator attempts to maximize its classification accuracy.
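In symbols, this is the minimax game from the original paper [1]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

Here D(x) is the discriminator’s estimate of the probability that x is real, and G(z) is the image the generator produces from the noise vector z.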
![Pseudo-code by Ian J. Goodfellow et al., Generative Adversarial Networks [1]](https://towardsdatascience.com/wp-content/uploads/2021/11/1irndtX3ZVpTX5Mq2zMQKQQ.png)
In the pseudo-code above, for each epoch and for every mini-batch, the gradients of the discriminator and of the generator are computed in turn. The discriminator’s objective is the average of log D(x) over real samples plus the average of log(1 − D(G(z))) over generated samples, i.e. how confidently it classifies real data as real and fake data as fake; we want to maximize this. The generator’s objective is the log(1 − D(G(z))) term, i.e. how often the discriminator correctly flags the generated images as fake; we want to minimize this.
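To make the pseudo-code concrete, below is a minimal PyTorch sketch of one training step. The fully connected network sizes, optimizer settings, and the non-saturating generator loss (maximizing log D(G(z)) instead of minimizing log(1 − D(G(z))), a trick also suggested in the original paper) are illustrative choices of mine, not the exact setup behind the results shown later.

```python
# Minimal sketch of one GAN training step, following the pseudo-code above (PyTorch).
import torch
import torch.nn as nn

latent_dim = 100
image_dim = 28 * 28  # flattened MNIST images

# Small fully connected networks, purely illustrative
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, image_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: push D(x) towards 1 and D(G(z)) towards 0
    z = torch.randn(batch_size, latent_dim)
    fake_images = G(z).detach()  # do not backpropagate into the generator here
    loss_D = bce(D(real_images), real_labels) + bce(D(fake_images), fake_labels)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: fool the discriminator (non-saturating loss)
    z = torch.randn(batch_size, latent_dim)
    loss_G = bce(D(G(z)), real_labels)  # label the fakes as "real"
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```

Calling `train_step` on every flattened, normalized mini-batch of MNIST images, epoch after epoch, reproduces the loop in the pseudo-code.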
The goals are naturally opposite, and therefore so are the gradients used to train the two networks. This can become a problem, which I will discuss later.
Once training is complete, the generator is all we care about. It takes in a random noise vector and outputs the image that it considers most likely to belong to the training data space. Remember that even though this effectively learns a mapping between the random variable (z) and the image data space, there is no guarantee that this mapping will be smooth. GANs do not learn the distribution of the data; they learn how to generate samples similar to those belonging to the training data.
Application
Let’s look at an application of a simple GAN. The training data consists of handwritten digits from the MNIST dataset. Say I needed more handwritten digits to train other machine learning or statistical models; a GAN can be used to attempt to generate that extra data.

Take a look at this training data. As you can see, some digits are hard to read even for humans. Machines really struggle with unstructured data (typical examples: images and text). A computer only sees pixel values, and it is difficult to teach it which arrangements of pixels make up a handwritten digit.
Generating Random Samples

As you can see, the generated data looks like handwritten digits. The model has learnt some of the patterns in the images of handwritten digits, and with this it has been able to generate new data.
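Generating these samples only requires a forward pass through the trained generator. A short sketch, reusing the illustrative `G` and `latent_dim` from the training snippet above:

```python
import torch

# Draw 16 new digits from the trained generator (names follow the sketch above)
with torch.no_grad():
    z = torch.randn(16, latent_dim)   # 16 random noise vectors
    samples = G(z).view(-1, 28, 28)   # 16 generated 28x28 images
# `samples` can then be plotted in a 4x4 grid, e.g. with matplotlib
```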
We can take a look at a GIF of the generated samples as the model learns (the GIF might not show in the Medium app, so I recommend using a browser to see it).

The generator model starts off with random weights and the generated images look like random noise. As the loss functions are optimized, the generator gets better and better at fooling the discriminator, eventually producing images that look a lot more like handwritten digits than random noise.
Sampling between Images
One interesting thing that can be done is to take two generated images and sample the space between them. The two generated images correspond to two random vectors that were fed to the generator. I can discretize the line between these two vectors, sample points along it, and feed those points to the generator.
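A sketch of this latent-space interpolation, again assuming the illustrative `G` and `latent_dim` from earlier:

```python
import torch

steps = 10
z_start = torch.randn(latent_dim)  # latent vector behind the first image
z_end = torch.randn(latent_dim)    # latent vector behind the second image

with torch.no_grad():
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Point on the straight line between the two latent vectors
        z = (1 - alpha) * z_start + alpha * z_end
        image = G(z.unsqueeze(0)).view(28, 28)  # decode to a 28x28 image
        # plot or save `image` here to build the sequence shown below
```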

Here I am looking at the space between a generated "4" and a generated "8". The starting number is the "4" in the top left corner, which gradually transforms into the "8" in the bottom right corner.
Even though the generator has no guarantee of producing a smooth mapping between the input space and the data space (we are generating samples, not approximating density functions), you can see the transition between the two images is still quite smooth. It is also interesting to see that before the "4" turns into an "8", it first passes through a "9", meaning that in the data space the number "9" lives between "4" and "8".

This is another example, this time a "6" turning into a "3". You can briefly see the generated samples look like a "5".
The Drawbacks of GANs

A major disadvantage of GANs is that, as mentioned earlier, the discriminator and generator have opposite objectives and therefore gradients of opposite sign. When optimizing a GAN, training does not converge to a minimum of a single loss function; instead, the optimization ends up searching for a saddle point of the shared value function, which makes training notoriously unstable.
Another common problem with GANs is that when training these models, it is easy for the discriminator to overpower the generator. The discriminator simply gets too good too quickly, and the generator is unable to learn how to generate images that fool it. Intuitively this makes sense: a classification task will always be easier than the generator’s task of learning how to generate new samples.
DCGANs and Vanishing Gradients
Deep Convolutional GANs (DCGANs) are one proposed way to tackle these issues. The first major recommendation is to use LeakyReLU as the activation function in the discriminator. This helps combat the vanishing gradient problem: when a standard ReLU outputs zero, no gradient flows through that unit, so its weights can get "stuck" and effectively stop learning for the rest of training. LeakyReLU makes this less likely by keeping a small, non-zero slope for negative inputs, so the weights keep receiving a gradient even when the activation is small.

Other tips in the DCGANs paper have become common practice when training neural networks, such as using batch normalization layers after convolutional layers, and avoiding dense layers in favour of convolutional layers.
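Putting these recommendations together, here is a sketch of what a DCGAN-style discriminator for 28x28 MNIST images could look like; the layer sizes are illustrative and not the exact architecture from the paper (which targets larger images).

```python
import torch.nn as nn

# DCGAN-style discriminator: strided convolutions instead of pooling or dense
# layers, batch normalization after the hidden convolutions, and LeakyReLU.
discriminator = nn.Sequential(
    # 1 x 28 x 28 -> 64 x 14 x 14 (no batch norm on the input layer, as in the paper)
    nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    # 64 x 14 x 14 -> 128 x 7 x 7
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),
    # 128 x 7 x 7 -> 1 x 1 x 1: a single real/fake probability
    nn.Conv2d(128, 1, kernel_size=7),
    nn.Sigmoid(),
)
```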
Conclusion
In this article, I go through the theory of how GANs work and how they learn. I then show the results of a simple implementation of GANs in Python. Finally, I highlight some of the drawbacks of GANs and how these can be tackled.
Support me
Hopefully this helped you; if you enjoyed it, you can follow me!
You can also become a Medium member using my referral link to get access to all my articles and more: https://diegounzuetaruedas.medium.com/membership
Other articles you might enjoy
Differentiable Generator Networks: an Introduction
Fourier Transforms: An Intuitive Visualisation
References
[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y., 2014. Generative Adversarial Networks. arXiv preprint arXiv:1406.2661. https://arxiv.org/abs/1406.2661