PGGAN Creates Realistic Faces

New Deep Learning Trend for 2018

Rahul Bhalley
Towards Data Science
3 min read · Oct 31, 2017


Generative Adversarial Nets were invented by Ian Goodfellow back in 2014 at Université de Montréal. They got off to a great start, showing a clear increase in the quality of generated images and surpassing previous benchmarks set by Restricted Boltzmann Machines, Variational Autoencoders, and others. This framework is still one of the most successful ways to generate high-quality images. Moreover, it yields networks that are tractable (in contrast to RBMs, which rely on approximate inference) and are fairly easy to train using only error back-propagation.

Over the years, many variants of GANs have been invented, but the first major success was DCGAN, which generated higher-quality images and also introduced several techniques to stabilize training. GANs are famously unstable to train: if one network becomes much stronger than the other, learning stalls. Another problem is mode collapse, where the generator fails to produce diverse images. Beyond these training issues, GANs soon branched out in different directions, such as generating high-resolution images, image inpainting, generating music, and more.

Progressive GANs

Recently, as of this writing, research from NVIDIA revealed a new technique for training GANs, which they call Progressive Growing of GANs. They took a completely different and unexpected approach to training, and the technique generates realistic, novel images that are not easily distinguishable from real ones.

Just take a look 👀 at the video below to realize the potential of this new training method!

Celebrity Face Generation (Novel Faces)

Growing the Networks

The training approach begins by having the generator G produce 4x4-resolution images, which are fed into the discriminator D along with real images scaled down to the same resolution. Note that the real images are 1024x1024 samples from the CelebA dataset. Once learning saturates on these coarse spatial features, higher-resolution layers are slowly faded into both G and D.
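To make the first step concrete, here is a minimal numpy sketch of how real 1024x1024 images can be scaled down to the current training resolution by average pooling. This is an illustrative helper, not the authors' actual data pipeline:

```python
import numpy as np

def downsample_to(images, target_res):
    """Average-pool a batch of square images (N, C, H, W) down to
    target_res x target_res. H and W must be multiples of target_res."""
    n, c, h, w = images.shape
    factor = h // target_res
    # Split each image into factor x factor blocks, then average each block.
    blocks = images.reshape(n, c, target_res, factor, target_res, factor)
    return blocks.mean(axis=(3, 5))

# A batch of two fake 1024x1024 RGB "real" images, scaled down for the
# 4x4 training stage.
batch = np.random.rand(2, 3, 1024, 1024).astype(np.float32)
small = downsample_to(batch, 4)
print(small.shape)  # (2, 3, 4, 4)
```

As training progresses to higher-resolution stages, the same helper would be called with `target_res=8`, `16`, and so on, so D always compares real and generated images at the same scale.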

To fade these higher-resolution convolutional layers into both networks, the researchers treat each new layer like a residual block and gradually increase its weight. Each new layer doubles the resolution, in this case from 4x4 to 8x8. G now generates 8x8 images (instead of 4x4, as before), which are fed into D, and the real images are likewise scaled to 8x8. The previous 4x4 convolutional layers of both G and D still remain trainable.
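The fade-in itself is a weighted blend: the output of the old low-resolution path (upsampled to the new size) is mixed with the output of the new layer, with a coefficient alpha that ramps from 0 to 1 over the course of training. A minimal numpy sketch of that blend, with hypothetical helper names:

```python
import numpy as np

def upsample_2x(x):
    """Nearest-neighbour upsample of a (N, C, H, W) batch by a factor of 2."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def fade_in(old_out, new_out, alpha):
    """Blend the old low-resolution path (upsampled 2x) with the new
    high-resolution path. alpha grows linearly from 0 to 1 during training,
    so the new layer is introduced gradually."""
    return (1.0 - alpha) * upsample_2x(old_out) + alpha * new_out

old = np.ones((1, 3, 4, 4), dtype=np.float32)    # output of the trained 4x4 stage
new = np.zeros((1, 3, 8, 8), dtype=np.float32)   # output of the freshly added 8x8 layer
print(fade_in(old, new, 0.25).mean())  # 0.75: still dominated by the old path
```

At alpha = 0 the network behaves exactly as before the new layer was added; at alpha = 1 the new layer has fully taken over, which is what protects the well-trained lower-resolution layers from a sudden shock.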

Fading the new layers in through this residual-style connection lets higher-resolution layers join the network without disrupting the already well-trained lower-resolution layers.

The layers are faded into G and D symmetrically, and the resolution is doubled step by step until it reaches 1024x1024. This way the networks first learn the coarse spatial features of the images and then the fine, local features in the higher-resolution layers as they grow, hence the name Progressive Growing of GANs.
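The growth schedule described above, doubling from 4x4 up to 1024x1024, works out to nine resolution stages. A tiny sketch:

```python
# Resolution schedule: start at 4x4 and double until reaching 1024x1024.
start, final = 4, 1024
schedule = []
res = start
while res <= final:
    schedule.append(res)
    res *= 2
print(schedule)  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Each transition between consecutive entries in this list is one fade-in phase, so there are eight fade-ins in total before the networks operate at full resolution.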

Some images do come out wrong: occasionally hair blends into the forehead, or the two eyes don't match each other. But overall the network generates remarkably good images, and in HD at that!

What Next?

For a deeper understanding, take a look at the research paper: Progressive Growing of GANs for Improved Quality, Stability, and Variation. Also check out my inference (generator) network implementation of Progressive-Growing-of-GANs on GitHub. The code is written in PyTorch and lets you generate an image from the latent space.

Follow me on GitHub, Instagram, Twitter.

