
GANs: A Different Perspective

An intuitive and hassle-free introduction to Generative Adversarial Networks

Generative Adversarial Networks (GANs) are one of the hottest topics in modern AI. In this article, we will look at GANs from a different perspective: rather than viewing them as generators of beautiful images, we will treat them as probability distribution transformers. We will explore the core idea of GANs without getting tangled up in implementation details and complicated mathematics. We start by analyzing the type of problem we have at hand, and then we see how the requirements of the solution shape the GAN idea.

Welcome to the Amusement Park!

Assume that we own an amusement park. In this park, we have a machine that takes 10 dollars and randomly returns an item worth between 1 and 100 dollars. The machine is very popular among visitors since they win very cool and expensive stuff from time to time. Additionally, the machine is quite profitable for us. In other words, the machine’s giveaway selection logic hits exactly the sweet spot that secures both our satisfaction and our customers’.

Image by Author

Consequently, we want to add more of these machines to gain even more profit. However, there is one problem: the machine is super expensive. Hence, we are interested in building our own machine. To do so, we need to figure out the machine’s item selection logic. Obviously, the key argument for selecting an item is its value. If an item is expensive, it should be less likely to be selected, to guarantee our profit. However, if we decrease the probability of selecting expensive items too aggressively, visitors will be dissatisfied. Therefore, our goal is to learn the probability distribution of the items’ values as precisely as possible. To start with, we have a list of the previous machine’s giveaways with their corresponding prices. First, we look at the distribution of giveaways. If the distribution is similar to a well-known probability distribution, the problem is solved: we use that probability distribution as the heart of our new machine’s item selection logic, sampling from it to determine which item to return.
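To make this concrete, here is a minimal sketch of the "fit a well-known distribution and sample from it" idea. The historical prices are simulated and the log-normal choice is purely illustrative, not something prescribed by the article:

```python
# A minimal sketch: fit a known distribution to past giveaway prices and
# sample from it to pick the next item. The data and the log-normal choice
# are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical record of past giveaway values (in dollars), clipped to [1, 100].
past_prices = np.clip(rng.lognormal(mean=2.5, sigma=0.8, size=10_000), 1, 100)

# Fit a well-known distribution to the observed values.
shape, loc, scale = stats.lognorm.fit(past_prices)
fitted = stats.lognorm(shape, loc=loc, scale=scale)

def pick_next_item_value():
    """Sample a target value from the fitted distribution, clipped to the
    machine's valid price range."""
    return float(np.clip(fitted.rvs(random_state=rng), 1, 100))

print([round(pick_next_item_value(), 2) for _ in range(5)])
```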

Image by Author

A Complex Machine, A Complex Problem

However, if we encounter a complex distribution of giveaways, we need to devise a method to learn the probability distribution of a generative process given only samples from that distribution.

Image by Author

In other words, we need a model that looks at our data and figures out the machine’s logic. The main takeaway is that "learning the probability distribution of the data is the core task of data generation."
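For low-dimensional data such as our price list, a classical (non-GAN) way to do this is to estimate the density directly from the samples, for example with a kernel density estimate, and then resample from it. The sketch below uses simulated prices; the mixture data and all numbers are illustrative:

```python
# A quick illustration, assuming one-dimensional price data: estimate the
# unknown density from samples alone with a Gaussian KDE and resample from it.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Hypothetical "complex" giveaway values: a mixture of cheap and expensive items.
samples = np.concatenate([
    rng.normal(5, 2, size=8_000),    # many cheap items
    rng.normal(70, 10, size=2_000),  # occasional expensive items
])
samples = np.clip(samples, 1, 100)

kde = gaussian_kde(samples)                    # density learned from samples only
new_prices = np.clip(kde.resample(5)[0], 1, 100)
print(np.round(new_prices, 2))
```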

Generation with Transformation

Let’s portray our goals on an abstract level. To begin with, we have a set of data that we will call real data from this point on. Our goal is to forge artificial data that are similar to the real data. Such artificial data are normally referred to as fake data. Therefore, we need a model that looks at real data and generates realistic fake data. A pretty clear target. Now, we need to move from this abstract target toward a more detailed description of our task and, ideally, connect it to something more familiar. To do so, we need to change our perspective on the problem. First, we need to get familiar with the transform function. Assume that we have a set of samples from a probability distribution. By applying a transform function, we can transform these samples from their original distribution to a desired target distribution. Theoretically, we can transform from any source distribution to any target distribution. However, calculating such transform functions is not always analytically possible.
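A small, concrete example of a transform function is inverse transform sampling: pushing uniform samples through the inverse CDF of a target distribution. This particular example (uniform to exponential) is an illustration of the general idea, not something taken from the article:

```python
# A sketch of a transform function: turn Uniform(0, 1) samples into samples
# from an Exponential(rate=0.5) distribution via the inverse CDF.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=100_000)   # source: Uniform(0, 1)

lam = 0.5
x = -np.log(1.0 - u) / lam                # target: Exponential(rate=lam)

# Sanity check: the transformed samples should have mean ~ 1/lam = 2.
print(x.mean())
```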

Now, let’s go back to our problem. We can reframe our generation problem as a transformation task. We start with a known distribution; normally, we select a Gaussian distribution with mean 0 and standard deviation 1. We call this distribution the "latent space". Now, we need to define a transform function that maps samples from our latent space to the data space. In other words, our transform function takes a sample from the latent space and outputs a sample in the data space, aka a data point. And voilà! We generate data! There is only one problem: it is not possible to define this function analytically. However, don’t we use neural networks to approximate complex functions that are impossible to define analytically? Yes, we do, and that’s exactly what we are going to do here. We use a neural network to approximate our transform function. We call this neural network the "Generator" since, in the end, it generates data. Quite sensible.
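A minimal generator sketch in PyTorch might look like this. The layer sizes, latent dimension, and data dimension are illustrative assumptions, not values from the article:

```python
# A small MLP generator: maps latent Gaussian noise to data space.
import torch
import torch.nn as nn

LATENT_DIM = 64   # size of the latent space (assumed)
DATA_DIM = 784    # e.g., a flattened 28x28 image (assumed)

generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, DATA_DIM),
    nn.Tanh(),    # squash outputs to [-1, 1], matching normalized data
)

z = torch.randn(16, LATENT_DIM)   # 16 samples from the latent Gaussian
fake_data = generator(z)          # 16 generated data points
print(fake_data.shape)            # torch.Size([16, 784])
```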

Image by Author

Now that we are using a neural network, we need to define a loss function to train it. The loss function is the key to proper training and realistic data generation. Therefore, we need to define it precisely, in alignment with our goals.

Discriminator: Critically Helpful

In general, a loss function evaluates how well our neural network is performing with regard to our goal and provides feedback (in the form of gradients) so the model can improve itself. Here, we need a loss function that measures how well our generated data follow the real data distribution. In other words, we want a loss function that can tell us how realistic our fake data are. Still, we don’t have any information about the real data distribution; this was our main problem from the beginning. However, we can achieve the same goal by discriminating between real data and fake data.

Assume our loss function can differentiate between real data and fake data.

Image by Author

Then, we can feed our fake data to this function. For those fake samples that are indistinguishable from real data, we don’t need to do anything. For the other fake samples, the loss function provides feedback to update and improve our generator.

Image by Author

In more concrete terms, we can use a classifier as the loss function, one that classifies data as real or fake. If a generated data point is classified as real, it is similar to the real data and we do not need to take any further action. For those fake samples that are identified as generated data, we ask the loss function how we should update our generator to make these samples look more realistic. The loss function provides the answer in the form of gradients for updating the weights of our neural network.

It seems we have found the final piece of our solution! However, we have to take care of yet another problem. While our suggested loss function satisfies our requirements, it is not straightforward to implement in practice. It is a complex function whose characteristics we can describe, but which we cannot write down directly. Looks like a dead end. But what holds us back from approximating this loss function with a neural network? Nothing! So, let’s do it. We can use a classifier neural network as our loss function. We call this network the "Discriminator" because it discriminates between real and fake data. Very sensible naming.
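A matching discriminator sketch, again with illustrative (assumed) layer sizes, is simply a binary classifier that outputs the probability that its input is real:

```python
# A small MLP discriminator: binary classifier for "real vs. fake".
import torch
import torch.nn as nn

DATA_DIM = 784    # must match the generator's output dimension (assumed)

discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
    nn.Sigmoid(),   # probability that the input is real
)

x = torch.randn(16, DATA_DIM)      # a batch of (here random) data points
p_real = discriminator(x)          # shape [16, 1], values in (0, 1)
print(p_real.shape)
```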

Image by Author

Best of all, we are quite familiar with using neural networks for classification. We know how to train them, what their loss function is, and how their inputs and outputs should look. However, training two neural networks simultaneously is not conventional. So the final question is: how should we train these networks together?

Let the Train Begin!

Image by Author

If we had a perfect classifier before we started training our generator, training would be very straightforward. Unfortunately, at the beginning of the training process, our discriminator is as clueless as our generator. Even worse, we cannot fully train the discriminator before starting to train the generator, since we need fake data to train the discriminator. As you can see, these two networks depend on each other for training: the generator needs feedback from the discriminator to improve, and the discriminator needs to keep up with the improvements of the generator. Therefore, we train them alternately. For one batch, we train the discriminator to classify real and fake samples. Then, for one batch, we train the generator to generate samples that the discriminator identifies as real, as sketched in the code below. This method is called "adversarial training". When we use adversarial training for the data generation task, we get a Generative Adversarial Network, or GAN for short.
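Here is a condensed sketch of that alternating loop. It assumes the `generator` and `discriminator` modules from the earlier sketches, as well as a `real_loader` that yields batches of real data; all of these are placeholders, not part of the article:

```python
# Alternating (adversarial) training: one discriminator step, one generator step.
import torch
import torch.nn as nn

criterion = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for real_batch in real_loader:                  # real_batch: [batch, DATA_DIM]
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1) Discriminator step: classify real as real and fake as fake.
    fake_batch = generator(torch.randn(batch_size, LATENT_DIM)).detach()
    loss_d = (criterion(discriminator(real_batch), real_labels)
              + criterion(discriminator(fake_batch), fake_labels))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator step: try to make the discriminator call fakes "real".
    fake_batch = generator(torch.randn(batch_size, LATENT_DIM))
    loss_g = criterion(discriminator(fake_batch), real_labels)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```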

However, when we look at the training procedure, we do not see the "adversary" per se. To find out where the term "adversarial training" comes from, we should look closely at the objectives of the two networks. The goal of the discriminator is to classify real and fake data as accurately as possible; therefore, in the discriminator training phase, the discriminator attempts to identify the fake samples correctly. On the other hand, we train the generator to generate realistic fake data. To pass the authenticity test, the generator must convince the discriminator that its generated data are real. In other words, the generator tries to fool the discriminator, while the discriminator tries not to be fooled by the generator. These contradicting objectives set the training process in motion.

During training, both networks improve with regard to their goals. Finally, at some point, the generator becomes so good that the discriminator can no longer differentiate between fake and real data, and that’s the point where training is done.

Image by Author

Pitfalls. A Lot of Them!

GANs are a beautiful yet complex solution to a very difficult problem. With GANs, we have a fast, efficient, and precise answer to a long-standing problem, and they pave the way for many thrilling applications. However, before jumping into applications, we should be aware of GANs’ common problems.

First of all, the generator is a neural network and, by definition, a black box. While a trained generator embeds information about the real data distribution in its weights, we do not have explicit access to it. When we are dealing with low-dimensional data, we can retrieve this information via sampling, but for higher dimensionalities, we cannot do much.

Furthermore, unlike other neural networks, the GAN loss function provides little information about training progress. During training, we need to inspect the generator’s samples manually to judge how training is going.

Finally, as previously mentioned, training takes place through the fight between the generator and the discriminator. If they stop fighting each other, the training process stops, and unfortunately, they often stop fighting after some time. Many factors can contribute to this problem. For example, if one of the networks improves much faster than the other, it can overpower the other network and training will stall. Thus, the network architectures should be balanced. But what does balance mean? There is no straightforward answer to this question; normally, one finds it through trial and error. Therefore, the GAN training process is quite unstable. Many solutions have been suggested for these stability problems, but they mostly solve one issue and add another, or require specific conditions to be satisfied. In short, improving GAN training is still an open problem, and the research community around it is very active.

Conclusion: The Tip of the Iceberg

Let’s get back to the expensive machine from the beginning. That machine symbolizes all data generation processes that are expensive in terms of time and resources. Assume we have a mid-size dataset of people’s faces and, for an application, we need a bigger dataset. We could pick up our camera, take pictures of people, and add them to the dataset. However, this is a time-consuming process. If, instead, we train a GAN on the available images, we can generate hundreds of images in a matter of seconds. Hence, data augmentation is one of the most prominent applications of GANs.

Data scarcity is not the only motivation. Let’s get back to the dataset of people’s faces. If we want to use those photos, we will most probably run into privacy concerns. But what if we use fake images of people who do not actually exist? Perfectly fine! No one will be concerned. Thus, GANs provide a neat solution for privacy concerns around data. The GAN research community is very active right now, and every day a new application or improvement is proposed. Still, there is a lot that remains to be discovered. This is just the tip of the iceberg.

