Introduction
I recently concluded a Computer Vision study and discovered a really interesting property of my deepfake images: the architecture of the GANs created varying levels of "quality" in the output of their deepfakes. Quality in this definition is like the difference between taking a picture with a low-resolution cell phone camera and a more sophisticated setup used by a professional photographer.
Generative Adversarial Networks (GAN)
You may be wondering, what is a Generative Adversarial Network (GAN)? Simply put, a GAN has two sub-networks, called the discriminator and the generator, that interact with each other. These continuous interactions lead to an output of synthetic data. Not only is the data that is created artificial, but more importantly, it realistically represents the distribution and features of the real data it is trying to mimic. Check out the deepfake images I created below. To the human eye it’s very difficult to tell the difference between fake and real! GANs can help with balancing datasets as well as providing unique samples that increase the diversity of your dataset.
Less Technical Deep Dive
As I stated, a GAN is a two-part network that, once trained, outputs synthetic data. Remember, a "good" GAN produces data that we cannot tell is real or fake!
*Note: While I am stating a GAN will produce images, it can also produce tabular data!
Discriminator
The job of a discriminator in a GAN is to determine if an image is real or fake. We give the discriminator real images while the fake images come from the generator. Over time the discriminator learns how to better detect the difference between more realistic-looking fake images and real images.
Generator
The job of the generator is to create fake images, known as deepfakes. Generally speaking, after each pass in training the generator is told how realistic its images were, based on the labels the discriminator assigns its samples (if a lot of the samples come back labeled "real," the generator will keep making images like those, since they tricked the discriminator). As the generator becomes better at tricking the discriminator, more realistic-looking data and images are created. The generator builds the samples it gives to the discriminator from the random noise it receives as input.
More Technical Deep Dive
As stated, a GAN is made up of two models: a discriminator and a generator. Training them together is a min-max game whose output is realistic synthetic data.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
The goal of the generator is to minimize the function above, while the goal of the discriminator is to maximize it (i.e., the discriminator pushes its output toward 1 for real images and 0 for fakes, while the generator tries to pull the output on its fakes toward 1). This function represents a Nash Equilibrium problem where the discriminator and generator will hopefully reach equilibrium together. In model training, it is important that neither the discriminator nor the generator is stronger than the other, so they can become better at detecting and generating fake images together.
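To make the min-max game concrete, here is a minimal sketch of the alternating training loop. I'm using PyTorch purely for illustration; the tiny fully connected models and the random placeholder data are my assumptions, not the actual setup from this study.

```python
import torch
import torch.nn as nn

# Toy stand-ins so the loop below runs end to end; the models in this
# study were actually CNNs (see "Types of GANs I Used" later on).
latent_dim, img_dim = 100, 28 * 28
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, img_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))

for step in range(1000):
    real = torch.rand(64, img_dim) * 2 - 1  # placeholder for a batch of real images

    # Discriminator ascends: push D(x) toward 1 and D(G(z)) toward 0.
    fake = generator(torch.randn(64, latent_dim)).detach()
    loss_d = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator descends: push D(G(z)) toward 1 so its fakes pass as real.
    fake = generator(torch.randn(64, latent_dim))
    loss_g = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Note how the two updates alternate every step: if either player runs away with the game, training collapses, which is exactly the balance issue described above.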
Discriminator
$$\max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
The discriminator hopes to maximize the log probability of assigning the correct label: log D(x) for real images plus log(1 − D(G(z))) for fake ones. This makes sense mathematically: the objective penalizes the discriminator any time it mislabels a fake image as real (or a real image as fake). When training the discriminator, it is recommended to give it separate batches of real and fake samples, rather than individual mixed batches, for added stability.
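Here is a minimal sketch of that batching advice, again assuming PyTorch and a discriminator with a sigmoid output (the `discriminator`, `generator`, and `real_batch` objects are hypothetical placeholders): each update sees one homogeneous batch of real images and one of fakes, never a shuffled mix.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_step(discriminator, generator, real_batch, opt_d, latent_dim=100):
    """One discriminator update using separate real and fake batches."""
    n = real_batch.size(0)
    # Homogeneous batch of real images, all labeled 1.
    loss_real = bce(discriminator(real_batch), torch.ones(n, 1))
    # Homogeneous batch of fakes, all labeled 0; detach() keeps this
    # update from flowing back into the generator's weights.
    fakes = generator(torch.randn(n, latent_dim)).detach()
    loss_fake = bce(discriminator(fakes), torch.zeros(n, 1))
    loss = loss_real + loss_fake
    opt_d.zero_grad(); loss.backward(); opt_d.step()
    return loss.item()
```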
Generator
$$\max_G \; \mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$$
Now, earlier I stated that the generator is trying to minimize the overall function, but in practice this is flipped into a maximization problem: instead of minimizing the log probability that its images are labeled fake, the generator maximizes log D(G(z)), the log probability that the discriminator labels its fake images as real. This "non-saturating" form gives the generator much stronger gradients early in training, when the discriminator can easily reject its samples.
You might be wondering, how does the GAN create fake data? The generator is fed random noise drawn from a latent vector space. Over time, the generator learns to map different features onto regions of that vector space and pool those features together to create the realistic data we get from its output. The generator is literally building data from basically nothing!
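One way to see that learned mapping (a sketch, assuming a trained `generator` like the toy one above) is to interpolate between two noise vectors: if the latent space really encodes features, the decoded images morph smoothly from one sample to the other.

```python
import torch

# Assumes a trained `generator` such as the sketch earlier.
z0, z1 = torch.randn(1, 100), torch.randn(1, 100)
steps = torch.linspace(0.0, 1.0, 8).view(-1, 1)
with torch.no_grad():
    z = (1 - steps) * z0 + steps * z1  # a straight walk through latent space
    images = generator(z)              # each row decodes to one image
```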
Types of GANs I Used
Since I used GANs to create deepfake images, the discriminator and generator networks were Convolutional Neural Networks (CNNs). Read more about CNNs below!
Convolutional Neural Networks: From An Everyday Understanding to a More Technical Deep Dive
The discriminator is your typical CNN, while the generator is a transposed-convolution CNN that upsamples noise into deepfake images over time. The variations of GANs I used were a Deep Convolutional GAN (DCGAN) (find out more about this variant [here](https://arxiv.org/abs/1511.06434)), a conditioned DCGAN (cDCGAN), a Wasserstein GAN (WGAN) (find out more about this variant [here](https://arxiv.org/abs/1701.07875)), and a conditioned WGAN (cWGAN).
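For reference, here is a minimal sketch of a DCGAN-style generator in PyTorch; the 64×64 RGB output size and channel widths follow the common DCGAN recipe and are illustrative, not the exact architecture from this study.

```python
import torch.nn as nn

def make_dcgan_generator(latent_dim=100, ngf=64):
    """Transposed convolutions upsample a latent vector into a 64x64 RGB image."""
    return nn.Sequential(
        # Input: (N, latent_dim, 1, 1) noise "image".
        nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
        nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),     # 4x4 -> 8x8
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),     # 8x8 -> 16x16
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),         # 16x16 -> 32x32
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),               # 32x32 -> 64x64
        nn.Tanh(),  # pixel values in [-1, 1]
    )
```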
I won’t get into the nitty-gritty of each variation, but the DCGAN is the least stable of all of the GAN variants I used. GAN models are generally unstable in nature and can be difficult to train. By conditioning a GAN (in this case, I conditioned on the image labels), we can acquire higher-quality output. The Wasserstein GAN is more stable than the DCGAN thanks to various changes to its architecture, including adopting the Earth Mover Distance as the model’s training loss, a metric that measures the similarity between two probability distributions.
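To show how the Earth Mover formulation changes training, here is a sketch of one WGAN critic update with the weight clipping from the original paper (the names `critic` and `clip` and the placeholder objects are illustrative; note the critic outputs a raw score, with no sigmoid):

```python
import torch

def critic_step(critic, generator, real_batch, opt_c, latent_dim=100, clip=0.01):
    """One WGAN critic update: widen the gap between scores on real and fake images."""
    n = real_batch.size(0)
    fakes = generator(torch.randn(n, latent_dim)).detach()
    # Maximize E[C(x)] - E[C(G(z))], i.e. minimize its negative. This gap
    # approximates the Earth Mover distance between the two distributions.
    loss = -(critic(real_batch).mean() - critic(fakes).mean())
    opt_c.zero_grad(); loss.backward(); opt_c.step()
    # Weight clipping enforces the Lipschitz constraint the theory requires.
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)
    return loss.item()
```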
Results
One of the objects I tried to recreate was a car:

Interestingly enough, the quality of the original car images was just okay, and the GAN was actually able to clear away some of the noticeable blurriness.

Notice from left to right how the quality of the images improves. In the DCGAN output, you can see signs of RGB noise with lots of blurring. The noise and blurring slowly dissipate as we progress from left to right, and the cWGAN variant produced the highest-quality images. All GANs were trained for the same number of epochs. This shows how GAN stability can have a direct impact on how "pure" and realistic your deepfake images will look.
Another interesting finding is how the WGAN variants produced higher-quality, clearer images than the originals. This could be due in part to the added stability in the architecture allowing the GAN to capture quality pixel features and produce more realistic deepfakes. Additionally, the WGANs produced much more realistic-looking shadows.
Conclusion
If you ever plan to use a GAN in your research, I highly recommend using the WGAN variant and conditioning on a feature of your dataset (it’s really easy to condition the GAN on the images’ labels using an embedding vector, as sketched below). I hope this study showed you how stability in any model really does matter and can produce much stronger outputs. My findings showed how different GAN variants produce different-quality images, to the point where it looks as if the deepfakes were taken with different types of cameras. These findings can come in handy when you want to balance your dataset or increase its diversity to produce a more generalizable and higher-performing model.
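As a parting sketch, here is how simple that embedding trick can be. This is a minimal fully connected illustration in PyTorch; the class count, layer sizes, and names are hypothetical, not the conditioned models from this study.

```python
import torch
import torch.nn as nn

class ConditionedGenerator(nn.Module):
    """Label conditioning: embed the class label and feed it in with the noise."""
    def __init__(self, num_classes=10, latent_dim=100, embed_dim=32, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, labels):
        # Concatenating the learned label embedding with the noise lets the
        # generator produce class-specific samples on demand.
        return self.net(torch.cat([z, self.embed(labels)], dim=1))

# e.g. sixteen samples of a hypothetical class id 3:
# gen = ConditionedGenerator()
# imgs = gen(torch.randn(16, 100), torch.full((16,), 3, dtype=torch.long))
```

Thanks for reading!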