
[Deep learning] Introduction to Generative Adversarial Networks (GANs)


Intuition and objective function of GANs

Generative adversarial networks (GANs), introduced in 2014 [1], are a state-of-the-art class of deep neural networks with many applications. Unlike traditional machine learning methods, GANs are trained in an unsupervised fashion (no target labels are required): a GAN is a generative model that generates new content from the given data.

It is interesting to first look at the MNIST (handwritten digit database) images generated by GANs, shown on the right-hand side of the figure in the original paper:

Generated images of MNIST by GANs, from the original paper.

Intuition

The intuition behind GANs is commonly described as a fake-currency detection game between a counterfeiter and the police [1]. Following Goodfellow's GAN tutorial [2], a GAN consists of two players: the generator (the counterfeiter) and the discriminator (the police).

The counterfeiter looks at real banknotes and tries to produce fake money that deceives the police (the discriminator), whose job is to distinguish whether given money is real or fake. At first, the money produced by the counterfeiter is so coarse that it is easily spotted. Learning from these failures, the counterfeiter then tries harder to produce more sophisticated fakes. At the same time, the police become more experienced at telling real money from fake.

As this process repeats many times, each party learns from its opponent (which is why the approach is called "adversarial"), and both become mature and sophisticated. Eventually, the fake money produced by the counterfeiter looks realistic to everyone except the police.

The analogy of the game between counterfeiter and police. Image by the author.

The objective function for GANs

Recall that an objective function is a function to be optimized during training (usually maximized; when we minimize it instead, it is usually called a loss function). There are several methods for finding the optimum of an objective function, such as maximum likelihood estimation (MLE) and various forms of gradient descent.

A GAN consists of two deep neural networks: the generator network (denoted G) and the discriminator network (denoted D). The objective of G is to turn noise z, sampled from a prior distribution p_z, into fake data G(z) that fools D into believing it came from the real data distribution p_data. The discriminator network D, on the other hand, outputs the probability that a given sample is real; its objective is to assign a high probability to real samples and a low probability to fake ones.

During training, G and D update their parameters θg and θd based on the min-max objective function V(G, D) below:

min_θg max_θd V(G, D) = E_{x~p_data}[ log D_θd(x) ] + E_{z~p_z}[ log(1 − D_θd(G_θg(z))) ]

Equation 1. The min-max objective function of GANs.

The discriminator should give a high value to a real image, i.e., a large D_θd(x), which is the same as maximizing its logarithm, log D_θd(x). It should also give a low value to a fake image G_θg(z), which is the same as maximizing log(1 − D_θd(G_θg(z))). In short, both can be achieved by maximizing the objective function V(G, D) with respect to θd.

For the generator, the goal is to fool the discriminator, which is achieved when D_θd(G_θg(z)) is close to 1, i.e., by minimizing log(1 − D_θd(G_θg(z))). There is one difference for the generator: the first term of the min-max function V(G, D) does not depend on the generator at all. (In the analogy of the game between counterfeiter and police, whether the police can recognize real money has nothing to do with the counterfeiter.) In short, this amounts to minimizing the objective function V(G, D) with respect to θg.
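To make these two objectives concrete, here is a minimal sketch of the corresponding losses. It is written in PyTorch purely as an assumption (the paper does not prescribe a framework), D is assumed to output a probability in (0, 1), and maximization is implemented by minimizing the negated value:

```python
import torch

def discriminator_loss(D, G, x_real, z):
    # D maximizes V(G, D) over θd; equivalently, it minimizes -V(G, D).
    d_real = D(x_real)         # D_θd(x) for real samples x ~ p_data
    d_fake = D(G(z).detach())  # D_θd(G_θg(z)); detach() so only θd gets gradients here
    return -(torch.log(d_real).mean() + torch.log(1 - d_fake).mean())

def generator_loss(D, G, z):
    # G minimizes log(1 - D_θd(G_θg(z))), the second term of Equation 1.
    return torch.log(1 - D(G(z))).mean()
```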


Training GANs

In practice, the generator part of the original objective function is modified for training:

max_θg E_{z~p_z}[ log D_θd(G_θg(z)) ]

Equation 2. The modified (non-saturating) generator objective.

The reason is that at the beginning of training, the generated fake images are so obviously fake that D_θd(G_θg(z)) is close to zero; the generator term log(1 − D_θd(G_θg(z))) is then almost zero and its gradients nearly vanish.
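Under the same assumptions as the sketch above (PyTorch, with D outputting a probability), the modified generator objective of Equation 2 can be written as a loss by negating it:

```python
import torch

def generator_loss_non_saturating(D, G, z):
    # G maximizes log D_θd(G_θg(z)) (Equation 2); equivalently, minimizes its negation.
    return -torch.log(D(G(z))).mean()
```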

For the new objective function, the gradient is larger than that of the original one when D_θd(G_θg(z)) is small:

∂/∂D [ log D_θd(G_θg(z)) ] = 1 / D_θd(G_θg(z))

Equation 3. Gradient of the modified objective with respect to the discriminator output.

while for the original objective, the gradient is:

∂/∂D [ log(1 − D_θd(G_θg(z))) ] = −1 / (1 − D_θd(G_θg(z)))

Equation 4. Gradient of the original objective with respect to the discriminator output.

The denominator in Equation 3 is much smaller than the denominator in Equation 4 when D_θd(G_θg(z)) is small, so the new objective yields much larger gradients early in training.
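A quick numerical check of this claim in plain Python; the value 0.01 is an arbitrary illustrative stand-in for an early-training discriminator output:

```python
# Early in training, the discriminator easily rejects fakes, so D_θd(G_θg(z)) is small.
d_fake = 0.01

grad_new = 1 / d_fake             # Equation 3: 1 / D = 100.0
grad_original = 1 / (1 - d_fake)  # Equation 4 (magnitude): 1 / (1 - D) ≈ 1.01

print(grad_new, grad_original)    # the modified objective gives a ~100x larger gradient
```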

The original paper also derives the optimal discriminator D and the full training procedure; these are skipped here to avoid the mathematical details and direct copying, but the informal idea is outlined below, followed by a sketch of one training iteration:

  • Training takes place on minibatches of noise samples and real examples. The purpose of minibatches is to accelerate training, since updating a deep neural network with the entire dataset at once is extremely expensive.
  • The stochastic gradient descent step on the generator's original objective is usually replaced by stochastic gradient ascent on Equation 2.
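Putting the pieces together, here is a minimal sketch of one training iteration, again in PyTorch; the toy architectures, optimizer, learning rate, and noise dimension are illustrative assumptions, not choices from the paper:

```python
import torch
from torch import nn

# Toy fully connected networks for flattened 28x28 images (hypothetical sizes).
G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)

def train_step(x_real):
    batch = x_real.size(0)

    # 1) Discriminator step: ascend V(G, D) in θd by minimizing -V(G, D).
    z = torch.randn(batch, 100)  # minibatch of noise samples z ~ p_z
    d_loss = -(torch.log(D(x_real)).mean()
               + torch.log(1 - D(G(z).detach())).mean())
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Generator step: ascend log D_θd(G_θg(z)) (Equation 2) by minimizing its negation.
    z = torch.randn(batch, 100)
    g_loss = -torch.log(D(G(z))).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage: one update on a random minibatch standing in for real images scaled to [-1, 1].
d_l, g_l = train_step(torch.rand(64, 784) * 2 - 1)
```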

References:

  1. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
  2. I. Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
