Review: ARCNN — Artifacts Reduction CNN (Codec Filtering)

Sik-Ho Tsang
Towards Data Science
5 min read · Sep 30, 2018


In this story, Artifacts Reduction CNN (ARCNN) is reviewed. ARCNN is used to reduce the following image artifacts:

  • Blocking Artifacts: A JPEG image is compressed in 8×8 non-overlapping blocks. Blocking artifacts are the discontinuities along the boundaries of these 8×8 blocks.
  • Ringing Artifacts along sharp edges: To compress an image efficiently, high-frequency components are quantized, removing some high-frequency signal from the image. However, when quantization is too strong, wave-like ringing artifacts appear near sharp edges.
  • Blurring: Loss of high-frequency components also introduces blurring. These artifacts degrade downstream routines such as super-resolution and edge detection.
Original JPEG (left), JPEG after ARCNN (right)

ARCNN was published at 2015 ICCV, and a modified Fast ARCNN was published on arXiv in 2016. Since ARCNN is built on SRCNN, which has a shallow CNN architecture, and since ARCNN involves the transfer-learning concept, it is a good starting point for learning about CNNs. (Sik-Ho Tsang @ Medium)

What Are Covered

  1. Quick Review of SRCNN
  2. ARCNN
  3. ARCNN — Easy-To-Hard Transfer
  4. Fast ARCNN

1. Quick Review of SRCNN

SRCNN (9–1–5)
Feed Forward Functions (Left) Loss Function (Right)

The above figure shows the SRCNN architecture. The image goes through 9×9, then 1×1, then 5×5 convolutions to produce the super-resolved output.

Note that the 1×1 conv is used in Network In Network (NIN). In NIN, the 1×1 conv is suggested to introduce more non-linearity and improve accuracy. It is also suggested in GoogLeNet [4] for reducing the number of connections.

The loss function is simply the mean squared error (MSE) between the network output and the ground-truth image.
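The MSE loss also drives the PSNR figures quoted throughout this story. A minimal sketch of the two quantities (pure Python; function and variable names are my own):

```python
import math

def mse(pred, target):
    """Mean squared error between two equally-sized pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / err)
```

A lower training MSE translates directly into a higher PSNR, which is why the papers report restoration quality in dB.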

SRCNN has only 3 conv layers. It is one of the papers to start with for learning about deep learning.

(If interested, please visit my review on SRCNN.)

2. ARCNN

ARCNN (9–7–1–5)

Compared with SRCNN, ARCNN has one more layer with a 7×7 filter.
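The layer sizes can be tallied with a quick count. The channel widths 1 → 64 → 32 → 16 → 1 are assumed here (following SRCNN's convention), which reproduces the parameter totals quoted in Section 4; biases are ignored:

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution layer, biases ignored."""
    return k * k * c_in * c_out

# ARCNN (9-7-1-5): (filter size, in channels, out channels) per layer,
# with assumed channel widths 1 -> 64 -> 32 -> 16 -> 1
arcnn_layers = [(9, 1, 64), (7, 64, 32), (1, 32, 16), (5, 16, 1)]
arcnn_total = sum(conv_params(*layer) for layer in arcnn_layers)
# The 7x7 second layer dominates: 7*7*64*32 = 100,352 of 106,448 weights.
```

This dominance of the wide 7×7 layer is exactly what Fast ARCNN's layer decomposition targets in Section 4.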

JPEG Images compressed with Quality Factor of 10

With original JPEG: Average PSNR is 27.77 dB.

Using SRCNN (9–1–5): 28.91 dB, which means the image quality is improved.

Using Deeper SRCNN (9–1–1–5): 28.92 dB, one more layer with a 1×1 filter doesn’t help much.

Using ARCNN (9–7–1–5): 28.98 dB is obtained.

Average PSNR along the number of backprops
ARCNN has a better visual quality

3. ARCNN — Easy-to-Hard Transfer

3.1 Transfer from Shallower to Deeper

Transfer from Shallower to Deeper
  • First train ARCNN (9–7–1–5), then keep its first two layers.
  • Then learn the 3rd to 5th layers of ARCNN (9–7–3–1–5).

Since the first two layers have already been learnt, this initialization is much better than random initialization, as shown below:

Average PSNR along the number of backprops (He [9] is one kind of random initialization)
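The easy-to-hard transfer amounts to initializing the deeper network's first layers from the trained shallower one and training only the rest. A schematic sketch with weights kept as plain dicts (the layer names and placeholder values are illustrative, not the paper's code):

```python
def transfer_first_layers(trained, target, n_layers=2):
    """Initialize `target` by copying the first n_layers weights from
    `trained`; the remaining layers keep their random initialization."""
    init = dict(target)
    for name in sorted(trained)[:n_layers]:
        init[name] = trained[name]
    return init

# Shallow ARCNN (9-7-1-5), already trained (weights shown as placeholders)
shallow = {"conv1": "trained_9x9", "conv2": "trained_7x7",
           "conv3": "trained_1x1", "conv4": "trained_5x5"}
# Deeper ARCNN (9-7-3-1-5), freshly (randomly) initialized
deeper = {"conv1": "random", "conv2": "random", "conv3": "random",
          "conv4": "random", "conv5": "random"}

deeper_init = transfer_first_layers(shallow, deeper)
```

The same pattern covers the other two transfer settings below: only which network supplies the first layer(s), and which is fine-tuned, changes.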

3.2 Transfer from higher to lower quality

Transfer from higher to lower quality
  • Similarly, use higher quality samples to train first.
  • Then transfer the first layer or first 2 layers.
Average PSNR along the number of backprops

3.3 Transfer from standard to real case

On Twitter, a 3264×2448 image would be rescaled and compressed to a 600×450 image. Thus:

  • Train the network using standard images, then transfer the first layer.
  • Then fine-tune using 40 Twitter photos (335,209 samples).
Average PSNR along the number of backprops
Twitter Image Visual Quality

4. Fast ARCNN

4.1 Layer Decomposition

Fast ARCNN: one more layer with a 1×1 filter is added (notation: number of filters (filter size))

The total number of parameters can be reduced by adding a 1×1 convolution between two spatial convolutions.

N: Total number of parameters of a model, counted below for ARCNN and Fast ARCNN.
  • ARCNN has 100,352 parameters in its 2nd layer and 106,448 parameters in total.
  • Fast ARCNN has only 51,200 parameters in its 2nd and 3rd layers, and only 57,296 parameters in total!

Using a 1×1 convolution to reduce model size was in fact already proposed in GoogLeNet.
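The saving is easy to verify with a quick count: replacing one wide 7×7 layer with a 1×1 "shrinking" layer followed by a 7×7 layer on fewer channels cuts the weights sharply. The channel sizes below are mine, chosen for illustration; Fast ARCNN's actual configuration is what yields the 51,200 figure quoted above:

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution layer, biases ignored."""
    return k * k * c_in * c_out

# Direct: one 7x7 convolution mapping 64 -> 32 feature maps (ARCNN's 2nd layer)
direct = conv_params(7, 64, 32)                               # 100,352 weights

# Decomposed: 1x1 "shrinking" layer 64 -> 16, then 7x7 mapping 16 -> 32
decomposed = conv_params(1, 64, 16) + conv_params(7, 16, 32)  # 1,024 + 25,088
```

The 1×1 layer is cheap because its cost has no k² factor, so shrinking the channel count before the expensive spatial convolution is where the reduction comes from.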

4.2 Larger Stride at First Layer and Larger Filter at Last Layer

  • Increase the stride size in the first convolutional layer from 1 to 2.
  • Increase the filter size in the last convolutional layer from 5 to 9.

The number of parameters (N) is still only 56,496.
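Why the larger stride helps speed: a conv layer's cost scales with the number of output positions, so stride 2 in the first layer quarters the spatial work of that layer, and every subsequent layer then also runs on quarter-resolution feature maps. A rough cost model (multiply-accumulates, assuming "same" padding and a hypothetical 256×256 input):

```python
def conv_cost(h, w, k, c_in, c_out, stride=1):
    """Approximate multiply-accumulates for one conv layer ('same' padding)."""
    out_h, out_w = h // stride, w // stride
    return out_h * out_w * k * k * c_in * c_out

# First layer (9x9, 1 -> 64 channels) at stride 1 vs stride 2
cost_s1 = conv_cost(256, 256, 9, 1, 64, stride=1)
cost_s2 = conv_cost(256, 256, 9, 1, 64, stride=2)
# Stride 2 quarters the number of output positions, hence the compute.
```

The larger 9×9 filter in the last layer then compensates for the coarser feature maps, which is why the quality drop below is so small.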

Results
  • ARCNN: 29.13 dB
  • Fast ARCNN (s=2): 29.07 dB, only a slight drop.

Comparing speed:

  • 0.5 sec per image for ARCNN
  • 0.067 sec per image for Fast ARCNN, a 7.5× speed-up!
