Single Image Super-Resolution Challenge

Introducing several data augmentation methods for improving the resolution of a single image

Bozhong Liu
6 min read · May 31, 2021

1 Challenge Description

Figure 1. An example of single image super-resolution [Image by author].

The goal of this mini challenge is to increase the resolution of a single image by a factor of four. The data for this task comes from the DIV2K dataset [1]. For this challenge, we prepared a mini-dataset consisting of 500 training and 80 validation pairs of images, where the HR images have 2K resolution and the LR images are downsampled by a factor of four.

For each LR image, the algorithm increases its resolution. The quality of the output is evaluated by the PSNR between the output and the HR image. The idea is to allow an algorithm to reveal details that are imperceptible in the LR image.

2 Data Pre-processing

The DIV2K dataset [1] consists of 500 training and 80 validation pairs of images, where the HR images have 2K resolution and the LR images are downsampled by a factor of four. Although DIV2K contains high-resolution images, the training patches are usually small, so reading a whole image while using only a tiny part of it is wasteful. To accelerate IO during training, the 2K-resolution images can be cropped into 480x480 sub-images with the extract_subimages.py script from the BasicSR codebase [2].
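A minimal sketch of this cropping step is shown below. The actual extract_subimages.py in BasicSR is multi-process and also handles the LR folders; the step size, paths, and naming scheme here are assumptions for illustration.

```python
import os
import cv2

def extract_subimages(input_dir, output_dir, crop_size=480, step=240):
    """Crop every image in input_dir into overlapping crop_size x crop_size
    sub-images and save them to output_dir (simplified, single-process)."""
    os.makedirs(output_dir, exist_ok=True)
    for name in sorted(os.listdir(input_dir)):
        img = cv2.imread(os.path.join(input_dir, name), cv2.IMREAD_UNCHANGED)
        h, w = img.shape[:2]
        idx = 0
        for y in range(0, h - crop_size + 1, step):
            for x in range(0, w - crop_size + 1, step):
                patch = img[y:y + crop_size, x:x + crop_size]
                idx += 1
                base, ext = os.path.splitext(name)
                cv2.imwrite(os.path.join(output_dir, f"{base}_s{idx:03d}{ext}"), patch)

# e.g. extract_subimages('DIV2K_train_HR', 'DIV2K_train_HR_sub')
```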

3 Data Augmentation

Diverse augmentation methods can be used jointly to partially block or confuse the training signal so that the model acquires more generalization power. Performance can be further boosted by applying several curated data augmentation methods together during the training phase.

3.1 Cutout

Cutout [3] erases (zeroes out) randomly sampled pixels with probability α. The cutout-ed pixels are discarded when calculating the loss by masking the removed positions. With the default setting, which drops 25% of the pixels in a rectangular region, Cutout can degrade the original performance. However, it gives a positive effect when applied with a 0.1% ratio and when random pixels are erased instead of a rectangular region.
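A minimal sketch of the random-pixel variant described above, assuming an LR tensor of shape (C, H, W) and a hypothetical erase ratio of 0.1% (0.001); the returned mask can be upsampled by the scale factor and used to exclude the erased positions from the loss:

```python
import torch

def cutout_pixels(lr, erase_ratio=0.001):
    """Zero out a random subset of LR pixels; return the augmented image and
    the keep-mask so the erased positions can be excluded from the loss."""
    _, h, w = lr.shape
    keep = (torch.rand(h, w) >= erase_ratio).to(lr.dtype)  # 1 = keep, 0 = erased
    lr_aug = lr * keep                                      # broadcast over channels
    return lr_aug, keep
```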

3.2 CutMix and Mixup

CutMix [4] replaces a randomly selected square-shaped region with a sub-patch from another image. Mixup [5] blends two randomly selected images. CutMix can also be applied on top of a Mixup-ed image (CutMixup).
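A rough sketch of both operations on single image tensors of shape (C, H, W); the patch ratio and the Beta parameter are assumptions, and the paired HR images would be treated in the same way:

```python
import torch

def mixup(x1, x2, alpha=1.2):
    """Blend two images with a weight drawn from Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x1 + (1.0 - lam) * x2

def cutmix(x1, x2, patch_ratio=0.25):
    """Replace a random square region of x1 with the same region from x2."""
    out = x1.clone()
    _, h, w = x1.shape
    ph, pw = int(h * patch_ratio), int(w * patch_ratio)
    y = torch.randint(0, h - ph + 1, (1,)).item()
    x = torch.randint(0, w - pw + 1, (1,)).item()
    out[:, y:y + ph, x:x + pw] = x2[:, y:y + ph, x:x + pw]
    return out

# CutMixup: apply CutMix on top of a Mixup-ed image
# mixed = mixup(img_a, img_b); aug = cutmix(mixed, img_c)
```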

3.3 CutBlur

CutBlur [6] cuts a low-resolution patch and pastes it into the corresponding high-resolution image region, and vice versa. By having partially LR and partially HR pixel distributions with a random ratio in a single image, CutBlur enjoys a regularization effect by encouraging the model to learn both “how” and “where” to super-resolve the image. In doing so, the model also learns “how much” to super-resolve, instead of blindly applying super-resolution to every given pixel.
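A simplified sketch of the operation itself (not of how it is wired into the training pipeline), assuming the LR image is first upsampled to the HR size so the two can be mixed pixel-for-pixel; the region ratio, the 50/50 direction choice, and the nearest-neighbour upsampling are assumptions:

```python
import torch
import torch.nn.functional as F

def cutblur(lr, hr, scale=4, patch_ratio=0.5):
    """Swap LR and HR content inside a random rectangular region.
    lr: (C, h, w) low-resolution image, hr: (C, H, W) with H = h * scale."""
    # nearest upsampling as a stand-in for the bicubic upsampling used in practice
    lr_up = F.interpolate(lr.unsqueeze(0), scale_factor=scale, mode='nearest').squeeze(0)
    _, H, W = hr.shape
    ph, pw = int(H * patch_ratio), int(W * patch_ratio)
    y = torch.randint(0, H - ph + 1, (1,)).item()
    x = torch.randint(0, W - pw + 1, (1,)).item()
    if torch.rand(1).item() < 0.5:
        out = lr_up.clone()
        out[:, y:y + ph, x:x + pw] = hr[:, y:y + ph, x:x + pw]     # paste HR into LR
    else:
        out = hr.clone()
        out[:, y:y + ph, x:x + pw] = lr_up[:, y:y + ph, x:x + pw]  # paste LR into HR
    return out
```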

3.4 Flip and Rotate

Random horizontal and vertical flips and rotations by 90, 180, and 270 degrees increase the amount of training data.
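Geometric augmentation must be applied identically to each LR/HR pair so the two stay aligned. BasicSR ships a similar augment helper; the standalone version below is an approximation assuming channel-last NumPy arrays:

```python
import random
import numpy as np

def paired_flip_rotate(lr, hr):
    """Randomly flip and rotate by 0/90/180/270 degrees, applying the same
    transform to the LR and HR arrays of shape (H, W, C)."""
    if random.random() < 0.5:            # horizontal flip
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    if random.random() < 0.5:            # vertical flip
        lr, hr = lr[::-1], hr[::-1]
    k = random.randint(0, 3)             # number of 90-degree rotations
    lr, hr = np.rot90(lr, k), np.rot90(hr, k)
    return np.ascontiguousarray(lr), np.ascontiguousarray(hr)
```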

3.5 RGB permutation

RGB permutation randomly permutes the RGB channels, which does not incur any structural change in the image.
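For example (assuming channel-last NumPy arrays; the same permutation must be applied to both images of a pair):

```python
import numpy as np

def rgb_permute(lr, hr):
    """Apply one random permutation of the RGB channels to both images."""
    perm = np.random.permutation(3)
    return lr[..., perm], hr[..., perm]
```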

4 Model Architecture

Figure 2. EDSR architecture [Image by author].
Figure 3. Comparison of residual blocks in original ResNet, SRResNet, and MSRResNet [Image by author].

The backbone of the network is MSRResNet, which is modified from EDSR [7], as shown in Figure 2. As Figure 3 shows, compared with ResNet [8] and SRResNet [9], the batch normalization layers are removed from the network. Since batch normalization layers normalize the features, they remove range flexibility from the network, so it is better to discard them. After removing them, the network can stack more layers or extract more features per layer under the same computational budget, with better performance. The network is optimized with an L1 loss function. During training, a lower-scale upsampling model is trained first, and its parameters are then used to initialize the higher-scale (×4) upsampling model, which reduces the training time of the higher-scale model and yields better results.
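A sketch of this warm-start, assuming a previously trained ×2 checkpoint and BasicSR's MSRResNet class (the import path and checkpoint path are assumptions); layers whose shapes differ between the ×2 and ×4 models, such as the upsampling convolutions, are simply skipped via strict=False:

```python
import torch
# assumed import path for BasicSR's MSRResNet architecture
from basicsr.archs.srresnet_arch import MSRResNet

# build the x4 model (num_block=20, see Section 6)
model_x4 = MSRResNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=20, upscale=4)

# hypothetical path to a previously trained x2 checkpoint
ckpt = torch.load('experiments/msrresnet_x2/models/net_g_latest.pth', map_location='cpu')
state = ckpt.get('params', ckpt)  # BasicSR checkpoints usually wrap weights in 'params'

# strict=False ignores keys whose shapes differ (e.g. the upsampling layers)
model_x4.load_state_dict(state, strict=False)
```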

The network architecture consists of several residual blocks. The default number of residual blocks is 16, and it is increased to 20 to create a deeper model within the parameter constraint. In each residual block, a 3x3 convolution layer is followed by an activation layer and a second 3x3 convolution, without the batch normalization used in the original ResNet and SRResNet blocks. Leaky ReLU is applied after the first convolution layer and after the last four convolution layers; the PReLU used in SRResNet instead introduces a learnable parameter so that the negative coefficients can be learned adaptively.
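A minimal PyTorch sketch of such a BN-free residual block in the spirit of MSRResNet (the channel count is an assumption and this is not the exact BasicSR implementation):

```python
import torch
import torch.nn as nn

class ResidualBlockNoBN(nn.Module):
    """conv -> ReLU -> conv with an identity skip and no batch normalization."""
    def __init__(self, num_feat=64):
        super().__init__()
        self.conv1 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.conv2 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

# the trunk of the network stacks 20 of these blocks
trunk = nn.Sequential(*[ResidualBlockNoBN(64) for _ in range(20)])
```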

5 Loss function

The L1 loss function is retained, since it has already been shown in [7] that L1 loss provides better convergence than L2.
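In PyTorch terms this is simply the mean absolute error between the super-resolved output and the ground truth (the tensor shapes below are placeholders):

```python
import torch
import torch.nn as nn

l1_loss = nn.L1Loss()

# sr: network output, hr: ground truth, both (N, 3, H, W) in [0, 1]
sr = torch.rand(4, 3, 128, 128)
hr = torch.rand(4, 3, 128, 128)
loss = l1_loss(sr, hr)  # mean absolute error, used instead of L2/MSE
```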

6 Changed Parameters and Hyperparameters

(1) num_block is increased from 16 to 20

The number of residual blocks is increased to 20 to obtain a deeper model. As a result, the number of model parameters increases from 1,517,571 to 1,812,995.

(2) total_iter is reduced to one tenth of the original, from 1,000,000 to 100,000

The total number of training iterations is reduced for faster training.

(3) The learning rate periods are reduced to one tenth of the original, from [250000, 250000, 250000, 250000] to [25000, 25000, 25000, 25000]

Cosine annealing is a learning rate schedule that starts with a large learning rate, decreases it relatively rapidly to a minimum value, and then increases it rapidly again at each restart. Since total_iter is set to one tenth of the original, the cosine annealing periods have to be scaled to one tenth as well.
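The schedule can be approximated with PyTorch's built-in warm-restart scheduler, as in the sketch below; BasicSR actually uses its own CosineAnnealingRestartLR, and the base learning rate, eta_min, and placeholder parameters here are assumptions:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder model parameters
optimizer = torch.optim.Adam(params, lr=2e-4)   # assumed base learning rate

# four cosine periods of 25,000 iterations each (total_iter = 100,000)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=25000, T_mult=1, eta_min=1e-7)

for it in range(100_000):
    # forward pass, loss, backward pass, and optimizer.step() would go here
    scheduler.step()  # stepped once per training iteration
```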

7 Result

The PSNR of the model on the validation dataset and the number of model parameters are shown in Table 1 below.

Table 1. PSNR and Model Parameters [Table by author].

8 Conclusion

To sum up, the MSRResNet model is reproduced, and its performance on the PSNR evaluation metric is improved by applying joint data augmentation and a deeper model.

References

[1] E. Agustsson and R. Timofte. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. CVPR Workshops, 2017.

[2] BasicSR: https://github.com/xinntao/BasicSR

[3] T. DeVries and G. W. Taylor. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv preprint arXiv:1708.04552, 2017.

[4] S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. arXiv preprint arXiv:1905.04899, 2019.

[5] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. mixup: Beyond Empirical Risk Minimization. arXiv preprint arXiv:1710.09412, 2017.

[6] J. Yoo, N. Ahn, and K.-A. Sohn. Rethinking Data Augmentation for Image Super-Resolution: A Comprehensive Analysis and a New Strategy. CVPR, 2020, pp. 8375–8384.

[7] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee. Enhanced Deep Residual Networks for Single Image Super-Resolution. CVPR Workshops, 2017, pp. 136–144.

[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CVPR, 2016.

[9] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv preprint arXiv:1609.04802, 2016.


Bozhong Liu

Senior AI Engineer | Master of Science in Artificial Intelligence at Nanyang Technological University | LinkedIn: linkedin.com/in/bozliu