Applying GANs to Super Resolution

Connor Shorten
Towards Data Science
3 min read · Dec 15, 2018


SRGAN Results from Ledig et al. [3]

Generative adversarial networks (GANs) have found many applications in Deep Learning. One interesting problem that can be better solved using GANs is super-resolution. Super-resolution is the task of upscaling images from low-resolution sizes such as 90 x 90 into high-resolution sizes such as 360 x 360. In this example, going from 90 x 90 to 360 x 360 corresponds to an upscaling factor of 4x.

One solution to super-resolution is to train a deep convolutional network on pairs of patches, where the input is a low-resolution patch and the labeled output is the corresponding high-resolution patch. This is different from many supervised learning problems where the output is a single 0/1 label or a vector of class predictions; in this case, the output is an image. These networks learn a mapping from the low-resolution patch, through a series of convolutional, fully-connected, or transposed convolutional layers, into the high-resolution patch. For example, such a network could take a 30 x 30 low-resolution patch, convolve over it a couple of times so that the feature map is something like 22 x 22 x 64, flatten it into a vector, apply a couple of fully-connected layers, reshape it, and finally upsample it through transposed convolutional layers into a 120 x 120 high-resolution patch (a 4x upscaling factor).
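
As a concrete illustration, here is a minimal PyTorch sketch of such a patch-to-patch network. It is kept fully convolutional for simplicity (no flatten and fully-connected step), and the layer sizes and patch dimensions are illustrative assumptions, not an architecture from the papers below.

```python
import torch
import torch.nn as nn

class SimpleSRNet(nn.Module):
    """Maps a low-resolution patch to a 4x larger high-resolution patch."""
    def __init__(self):
        super().__init__()
        # Feature extraction on the low-resolution input
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Two stride-2 transposed convolutions give the 4x upscaling factor
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(64, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.upsample(self.features(x))

lr_patch = torch.randn(1, 3, 90, 90)   # low-resolution input patch
sr_patch = SimpleSRNet()(lr_patch)     # -> torch.Size([1, 3, 360, 360])
```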

One problem with this setup is that it is difficult to design an effective loss function. The most common choice is the MSE (Mean Squared Error) between the network output patch and the ground-truth high-resolution patch, but pixel-wise MSE tends to average over the many plausible high-resolution textures, producing overly smooth, blurry results. One solution is the perceptual loss function developed by Johnson et al. [1]. This loss is computed by taking the difference between feature map activations in high layers of a VGG network [2] for the network output patch and for the ground-truth high-resolution patch. Distances in this feature space track perceived image quality more closely than raw pixel differences, which is why it is referred to as a perceptual loss.
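
A rough sketch of what such a perceptual loss can look like in PyTorch, using torchvision's pretrained VGG-19. The truncation layer and the MSE over feature maps are assumptions loosely modelled on the VGG losses in [1] and [3], not an exact reimplementation.

```python
import torch.nn as nn
from torchvision.models import vgg19

class VGGPerceptualLoss(nn.Module):
    """MSE between deep VGG-19 feature maps of two image patches."""
    def __init__(self, truncate_at=36):  # 36 ~ relu5_4 (assumed cut-off)
        super().__init__()
        # Frozen, pretrained VGG-19 truncated at a deep layer;
        # inputs are assumed to already be ImageNet-normalized.
        self.features = vgg19(weights="IMAGENET1K_V1").features[:truncate_at].eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.mse = nn.MSELoss()

    def forward(self, sr_patch, hr_patch):
        return self.mse(self.features(sr_patch), self.features(hr_patch))
```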

GANs further improve this loss function, as demonstrated by Ledig et al. [3]. In addition to the perceptual content loss, an adversarial loss is added to push generated images toward the natural image manifold. In the GAN framework, a discriminator network judges whether a patch created by the generator looks like it came from the ground-truth set of high-resolution patches. The generator's error is then calculated from this adversarial loss together with the perceptual loss (the difference in VGG feature map activations between the output patch and the ground-truth patch).
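
Putting the two together, a sketch of the generator's combined objective could look like the following. The discriminator and perceptual_loss objects are hypothetical placeholders (for instance, the sketches above), and the 10^-3 weighting on the adversarial term roughly follows the configuration reported in [3].

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def generator_loss(discriminator, perceptual_loss, sr_patch, hr_patch, adv_weight=1e-3):
    # Content loss: distance to the ground truth in VGG feature space
    content = perceptual_loss(sr_patch, hr_patch)
    # Adversarial loss: reward patches the discriminator labels as real
    # (discriminator is a hypothetical network returning raw logits)
    logits = discriminator(sr_patch)
    adversarial = bce(logits, torch.ones_like(logits))
    return content + adv_weight * adversarial
```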

It is very interesting to see how multiple loss functions can be combined to optimize a neural network; in this case, each loss provides a unique perspective on the problem. Thank you for reading, and check out the papers below if you are interested in this topic!

References:

[1] Perceptual Losses for Real-Time Style Transfer and Super-Resolution. Justin Johnson, Alexandre Alahi, Li Fei-Fei.

[2] Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan, Andrew Zisserman.

[3] Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi.
