Neural networks aren’t limited to just learning from data; they can also learn to create it. One of the classic machine learning papers is Generative Adversarial Networks (2014) by Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, and others. GANs take the form of two opposing neural networks: one learns to generate fake samples while the other tries to separate the real samples from the fakes.

Sophisticated GAN models like VQGAN generate everything from fake landscapes and faces to Minecraft worlds.
These are my notes on the classic paper that first introduced GANs.
Setup
GANs are fundamentally a two-part binary classification setup composed of two neural network models. The first model, called the Generator, takes an input of pure noise and produces an output with the same shape as the data it’s learning to fake. The other model, the Discriminator, takes both real and generated inputs and outputs a prediction of whether the data is real or fake.
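As a concrete sketch of that setup, here is one minimal way the two models might look in TensorFlow for flattened 28×28 images; the layer sizes and the 100-dimensional noise input are arbitrary choices for illustration, not details from the paper.

```python
import tensorflow as tf

latent_dim = 100  # size of the noise vector fed to the generator (arbitrary choice)

# Generator: a noise vector in, a sample with the same shape as the real data out.
# Here the real data is assumed to be flattened 28x28 images scaled to [0, 1].
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
])

# Discriminator: a real or generated sample in, the probability that it is real out.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```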
Loss
The key to understanding the game between generator and discriminator is the link between the value function and binary cross-entropy (BCE) loss.
This equation from the paper explains the objective for both. Ignoring the min/max portion for a moment, the rest of the equation describes the value function, V(D,G). The value function is the expected value of the log of the discriminator’s output on real data, log D(x), plus the expected value of the log of one minus the discriminator’s output on generated data, log(1 − D(G(z))), where the generator’s input, z, is drawn from noise.
Now, looking at the min/max portion again, the objective of the generator is to minimize the value function while the objective of the discriminator is to maximize it.
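Written out, the objective is

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$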

But what does the value function have to do with BCE? Below is the equation for BCE loss, where y is the true label and ŷ is the model prediction. So if the label for real data is 1 and the label for fake data is 0, the value function becomes the sum of the loss for fake data (y=0 and ŷ=D(G(z))) and the loss for real data (y=1 and ŷ=D(x)). Half of the equation cancels for each loss.
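In standard form,

$$\mathcal{L}_{\text{BCE}}(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big]$$

Substituting the two cases makes the cancellation explicit: for real data (y = 1, ŷ = D(x)) the loss reduces to −log D(x), and for fake data (y = 0, ŷ = D(G(z))) it reduces to −log(1 − D(G(z))), which are the two terms of the value function up to the sign.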

The minus sign disappears because of the min/max convention: minimizing BCE is equivalent to maximizing the un-negated value function. The value function plotted below shows how. The generator’s objective is to get the discriminator to misclassify fake data as real and real data as fake, corresponding to low (min) values on the blue and gold curves below. The discriminator’s objective is to correctly classify each sample, which corresponds to an overall value function of zero (max) on both curves.
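To get a feel for curves like those (my version won’t match the original plot’s colors), here is a quick matplotlib sketch that plots each term of the value function against the discriminator’s output:

```python
import numpy as np
import matplotlib.pyplot as plt

# Discriminator output between 0 (fake) and 1 (real); the endpoints are
# excluded so the logarithms stay finite.
d = np.linspace(0.001, 0.999, 500)

plt.plot(d, np.log(d), label="log D(x)  (real-data term)")
plt.plot(d, np.log(1 - d), label="log(1 - D(G(z)))  (fake-data term)")
plt.xlabel("discriminator output")
plt.ylabel("contribution to V(D, G)")
plt.legend()
plt.show()
```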

Training
The two models are trained together. The code below shows an example in TensorFlow. First, the discriminator is trained by drawing a batch of noise and a batch of real data. The discriminator loss is computed as described earlier, and its weights are updated. Next, the generator is trained by drawing another batch of random noise and passing it through the generator. The generated data is classified by the discriminator, and the generator is scored by how well it fooled the discriminator. The generator’s weights are updated from the gradient of that score, and the process repeats.
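Here is a rough sketch of one training step in TensorFlow 2 (not the exact code from my notebook), assuming `generator`, `discriminator`, and `latent_dim` are defined as in the sketch above; the generator loss uses the common non-saturating form, which scores fakes against the "real" label.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
d_optimizer = tf.keras.optimizers.Adam(1e-4)
g_optimizer = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images):
    batch_size = tf.shape(real_images)[0]

    # 1) Train the discriminator on a batch of real data and a batch of fakes,
    #    with label 1 for real samples and label 0 for generated samples.
    noise = tf.random.normal([batch_size, latent_dim])
    with tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_pred = discriminator(real_images, training=True)
        fake_pred = discriminator(fake_images, training=True)
        d_loss = (bce(tf.ones_like(real_pred), real_pred)
                  + bce(tf.zeros_like(fake_pred), fake_pred))
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    d_optimizer.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # 2) Train the generator: draw fresh noise and score the generator by how
    #    well its fakes are classified as real.
    noise = tf.random.normal([batch_size, latent_dim])
    with tf.GradientTape() as g_tape:
        fake_pred = discriminator(generator(noise, training=True), training=True)
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    g_optimizer.apply_gradients(zip(g_grads, generator.trainable_variables))

    return d_loss, g_loss
```

The paper’s original generator objective minimizes log(1 − D(G(z))) directly, but the authors note that maximizing log D(G(z)) instead gives stronger gradients early in training, which is what the generator loss above does.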
As both models train, the distribution of generated data (green line) shifts to look like the distribution of real data (black dotted line) until the discriminator (blue dotted line) can no longer tell the difference and predicts a 50% probability for both real and fake input.

Here are a few samples of generator output at various stages of training. I used the MNIST handwritten digit dataset to work through building the GAN. Over time, the discriminator becomes less and less confident in its predictions of which samples are real and which are fake as the generator begins to produce images that look like the training set.

GANs are tricky to train. For example, in this test case using a single example from MNIST, the generator learned to replicate the example digit 5 after a few hundred steps. However, as training continued, the generator output quickly diverged. The generator can also diverge back to noise if the learning rate is too high.

Managing the balance between the two networks can also be difficult. If the discriminator becomes too good, the generator gets little useful signal to learn from, so it may be necessary to adjust parameters in the training setup, such as the learning rates or how often each network is updated.
State-of-the-art GANs are incredibly powerful at generating realistic content and can be used with both good and bad intentions. Hopefully, these notes were helpful for understanding the basics of how they work.
You can find the notebook I used to learn about GANs here.