I had the opportunity to do a 3-month research internship on GANs. I read a lot of scientific papers as well as blog posts. In this post, I try to convey the basics of what I learned and feel are worth sharing.
Table of contents
1) Introduction
2) How do GANs work?
 2.1) The principle: generator vs discriminator
 2.2) Mathematically: the two-player minimax game
3) Why are GANs so interesting?
4) Conclusion and references
![Figure 1: Realistic yet fictional portraits of celebrities generated from originals using GANs. Source: Nvidia [4].](https://towardsdatascience.com/wp-content/uploads/2020/10/19RZd1Gk5kMgM0GymWP8YGg.png)
1) Introduction
Over the past decade, the explosion in the amount of available data (Big Data), the optimization of algorithms and the constant growth of computing power have enabled artificial intelligence (AI) to perform more and more human tasks. In 2017, Andrew Ng predicted that AI would have as profound an impact on society as electricity did.
If we claim that the purpose of AI is to simulate human intelligence, the main difficulty is creativity. In the field of AI, we talk about generative models, and one of the most popular nowadays is the GAN (for "generative adversarial network"). In a 2016 seminar, Yann LeCun called GANs "the coolest idea in deep learning in the last 20 years".
GANs [1] introduce the concept of adversarial learning, as they rely on a rivalry between two neural networks. These techniques have enabled researchers to create realistic-looking but entirely computer-generated photos of people’s faces. They have also allowed the creation of controversial "deepfake" videos. In fact, GANs can be used to imitate any data distribution (images, text, sound, etc.).
An example of GANs’ results from 2018 is given in Figure 1: these images are fake yet very realistic. Generating these fictional celebrity portraits from CelebA-HQ, a database of 30,000 real portraits, took 19 days. The generated images have a size of 1024×1024 pixels.
2) How do GANs work?
Generative adversarial networks (GANs) are a generative model with implicit density estimation; they belong to unsupervised learning and use two neural networks. Hence the terms "generative" and "networks" in "generative adversarial networks".
2.1) The principle: generator vs discriminator
![Figure 2: Roles of the generator and the discriminator. Source: Stanford CS231n [2].](https://towardsdatascience.com/wp-content/uploads/2020/10/1uzJMtPVXbmmgbr_-wSYfg.png)
The principle is a two-player game between a neural network called the generator and a neural network called the discriminator. The generator tries to fool the discriminator by generating real-looking images, while the discriminator tries to distinguish between real and fake images. Hence the term "adversarial" in "generative adversarial networks". See Figure 2.

At the bottom left of Figure 2, we can see that our generator samples from a simple distribution: random noise. The generator can be interpreted as an artist and the discriminator as an art critic. See Figure 3.

During training, the generator progressively becomes better at creating images that look real, while the discriminator becomes better at telling them apart. The process reaches equilibrium when the discriminator can no longer distinguish real from fake images. See Figure 4. Thus, if the discriminator is well trained and the generator manages to generate real-looking images that fool the discriminator, then we have a good generative model: we are generating images that look like the training set.
After this training phase, we only need the generator to sample new (fake) realistic data; we no longer need the discriminator. Note that the random noise input guarantees that the generator does not always produce the same image (a single convincing image might fool the discriminator, but it would not make a useful generative model).
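To make the alternating training loop concrete, here is a toy sketch in plain NumPy. Every name in it is invented for illustration: the "generator" is just an affine map a·z + b trained to imitate 1-D Gaussian data, the "discriminator" is a logistic regression D(x) = sigmoid(w·x + c), and the gradients are written out by hand. Real GANs use deep networks and automatic differentiation, but the alternating update structure is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Toy, illustrative setup: real data ~ N(3, 1); generator G(z) = a*z + b;
# discriminator D(x) = sigmoid(w*x + c). All parameter names are made up.
a, b = 1.0, 0.0        # generator parameters (theta_g)
w, c = 0.0, 0.0        # discriminator parameters (theta_d)
lr, n = 0.05, 128      # learning rate, batch size

for step in range(2000):
    x = rng.normal(3.0, 1.0, n)    # batch of real data
    z = rng.normal(0.0, 1.0, n)    # batch of random noise
    gz = a * z + b                 # batch of fake data G(z)

    # Discriminator step: gradient ascent on log D(x) + log(1 - D(G(z)))
    dx, dgz = sigmoid(w * x + c), sigmoid(w * gz + c)
    w += lr * (np.mean((1 - dx) * x) - np.mean(dgz * gz))
    c += lr * (np.mean(1 - dx) - np.mean(dgz))

    # Generator step: gradient ascent on log D(G(z)) -- the "non-saturating"
    # heuristic suggested in [1], instead of descending log(1 - D(G(z)))
    dgz = sigmoid(w * (a * z + b) + c)
    a += lr * np.mean((1 - dgz) * w * z)
    b += lr * np.mean((1 - dgz) * w)
```

In this toy run, the generator's offset b should drift towards the data mean, i.e. the generated distribution moves towards the real one, exactly the dynamic pictured in Figure 4.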
Note that at the beginning of training in Figure 4, the generator produces only random noise that does not look like the training data.
2.2) Mathematically: the two-player minimax game
The generator G and the discriminator D are jointly trained in a two-player minimax game formulation. The minimax objective function is:

min_{θg} max_{θd} [ E_{x∼p_data} log D_{θd}(x) + E_{z∼p(z)} log(1 − D_{θd}(G_{θg}(z))) ]    (1)

where θ_g denotes the parameters of G and θ_d the parameters of D.
In the following, we simply refer to D_{θd} as D and G_{θg} as G.
By definition, D outputs the probability that its input is a real image, a value in the interval [0, 1]:
• D(x) equals 1 (or is close to 1) if D considers x to be real data,
• D(x) equals 0 (or is close to 0) if D considers x to be fake data (e.g. generated data).
We can prove that, at equilibrium, D outputs 1/2 everywhere, because D can no longer distinguish fake generated data from real data.
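A quick sanity check of this equilibrium value: substituting D = 1/2 into both terms of Equation (1), each expectation reduces to log(1/2), giving the optimal objective value −log 4 reported in [1].

```python
import math

# If D outputs 1/2 on every input, both expectations in Equation (1)
# reduce to log(1/2), so the value of the game at the optimum is:
v_star = math.log(0.5) + math.log(0.5)   # = -log 4 ≈ -1.386
```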
Because x ∼ p_data, x is real data. By definition of G, G(z) is fake generated data. For example, x could be a real-life image of a cat and G(z) a generated image of a cat. Thus, D(x) is the output of the discriminator for a real input x, and D(G(z)) is its output for a fake generated input G(z).
Following [1], the two-player minimax game in Equation (1) is written so that θ_g and θ_d evolve to make the points from subsection 2.1) true:
• The discriminator D tries to distinguish between real data x and fake data G(z). More precisely, D plays with θ_d (θ_g being fixed) to maximize the objective function, so that D(x) is close to 1 (x being real data) and D(G(z)) is close to 0 (generated data is detected as fake).
• The generator G tries to fool the discriminator D into thinking that its fake generated data is real. More precisely, G plays with θ_g (θ_d being fixed) to minimize the objective function, so that D(G(z)) is close to 1 (generated data is classified as real by the discriminator).
Although we are doing unsupervised learning (the data is not labeled), we assign the label 0 (fake) to the data generated by G (regardless of what the discriminator returns) and the label 1 (real) to the real training data. We can thus define a loss function.
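Concretely, both players' objectives are then usually implemented as binary cross-entropy against these 0/1 labels. A minimal sketch, where the discriminator outputs for one batch are made up for illustration:

```python
import numpy as np

def bce(p, label):
    """Mean binary cross-entropy of predicted probabilities p against a 0/1 label."""
    p = np.clip(p, 1e-7, 1 - 1e-7)       # avoid log(0)
    return float(np.mean(-(label * np.log(p) + (1 - label) * np.log(1 - p))))

# Hypothetical discriminator outputs on one batch:
d_real = np.array([0.9, 0.8, 0.95])      # D(x): the discriminator wants these near 1
d_fake = np.array([0.1, 0.2, 0.05])      # D(G(z)): D wants these near 0, G wants 1

d_loss = bce(d_real, 1) + bce(d_fake, 0) # discriminator: real -> 1, fake -> 0
g_loss = bce(d_fake, 1)                  # generator: fool D into outputting 1
```

Here the discriminator is doing well (low d_loss), so the generator's loss is high; during training the two losses pull against each other.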
Paper [1] proves that the minimax game has a global (and unique) optimum at p_g = p_data, where p_g is the generative distribution and p_data the real data distribution. In practice, however, getting p_g to converge towards p_data is not easy.
3) Why are GANs so interesting?
Generative models have several very useful applications: colorization, super-resolution, generation of artworks, etc. In general, the advantage of a simulated model over the real one is that computation can be much faster.
Many interesting examples are given in Goodfellow’s tutorial [3] and Stanford’s lecture [2]. In particular, the examples Goodfellow gives in his talk "Generative Adversarial Networks (NIPS 2016 tutorial)", from 4:15 to 12:33, are impressive.
![Figure 5: CycleGAN: real images transposed into realistic fictional images using GANs. Source: Berkeley AI Research (BAIR) laboratory [5].](https://towardsdatascience.com/wp-content/uploads/2020/10/1Ap1aL0-wmznNK7tdiH3teA.png)
One example is given in Figure 5. These real images are transposed into realistic fictional images – or vice versa – with CycleGAN, developed by researchers at UC Berkeley. The underlying concept, image-to-image translation, is a class of vision and graphics problems whose goal is to learn the mapping between an input image and an output image; CycleGAN [5] is notable for learning this mapping without a training set of aligned image pairs. There is also a CycleGAN tutorial for TensorFlow.
![Figure 6: Several types of image transformations using GANs. Source: Berkeley AI Research (BAIR) Laboratory [6].](https://towardsdatascience.com/wp-content/uploads/2020/10/1qNEaI8j-rupiedmROmtoKw.png)
A second example is shown in Figure 6. For instance, the aerial-photo-to-map translation can be very useful for Google Maps and similar applications, and the edges-to-photo translation can help designers.
4) Conclusion
GANs’ applications have grown rapidly, in particular for images. I believe GANs can be very interesting for companies. For example, GANs can generate realistic new medical images, and image-to-image translation can help designers draw and be more creative. Moreover, GANs can be used for data augmentation when we have, say, only a hundred images and wish to have more.
GANs have also been developed for binary outputs (sick or not) and discrete outputs (rounded blood pressure, rounded weight…) [7]. The benefits of this research on tabular data are numerous, in particular for privacy. For example, instead of sending confidential data from Excel sheets, hospitals can send realistic fake data (which preserves the correlations between features) to their partners.
Thanks for reading, I hope you found this post useful. Don’t hesitate to give feedback in the comments. Feel free to reach out to me on Twitter.
References
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio. "Generative Adversarial Nets." NIPS (2014).
[2] Fei-Fei Li, Justin Johnson and Serena Yeung. "CS231n: Convolutional Neural Networks for Visual Recognition. Lecture 13: Generative Models." Stanford University (Spring 2017).
[3] Ian Goodfellow. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv abs/1701.00160 (2017).
[4] Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." arXiv abs/1710.10196 (2018).
[5] Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros. "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV (2017): 2242–2251.
[6] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou and Alexei A. Efros. "Image-to-Image Translation with Conditional Adversarial Networks." CVPR (2017): 5967–5976.
[7] Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart and Jimeng Sun. "Generating Multi-label Discrete Patient Records using Generative Adversarial Networks." MLHC (2017).