Review: ARCNN — Artifacts Reduction CNN (Codec Filtering)

Sik-Ho Tsang
Towards Data Science
5 min read · Sep 30, 2018


In this story, Artifacts Reduction CNN (ARCNN) is reviewed. ARCNN is used to reduce the following image artifacts:

  • Blocking Artifacts: A JPEG image is compressed in 8×8 non-overlapping blocks. Blocking artifacts are the discontinuities along the boundaries of these 8×8 blocks.
  • Ringing Artifacts along sharp edges: To compress an image efficiently, high-frequency components are quantized, removing some high-frequency signal from the image. However, when quantization is too strong, wave-like ringing artifacts appear near sharp edges.
  • Blurring: Loss of high-frequency components also introduces blurring. These artifacts degrade downstream routines such as super-resolution and edge detection.
Original JPEG (left), JPEG after ARCNN (right)

ARCNN was published at 2015 ICCV, and a modified Fast ARCNN was published on arXiv in 2016. Since ARCNN is built on SRCNN, which has a shallow CNN architecture, and since ARCNN involves the transfer-learning concept, it is a good starting point for learning about CNNs. (Sik-Ho Tsang @ Medium)

What Are Covered

  1. Quick Review of SRCNN
  2. ARCNN
  3. ARCNN — Easy-To-Hard Transfer
  4. Fast ARCNN

1. Quick Review of SRCNN

SRCNN (9–1–5)
Feed Forward Functions (Left) Loss Function (Right)

The above figure shows the SRCNN architecture. The image goes through 9×9, then 1×1, then 5×5 convolutions to produce the super-resolved output.

Note that the 1×1 conv is used in Network In Network (NIN). In NIN, the 1×1 conv is suggested to introduce more non-linearity and improve accuracy. It is also suggested in GoogLeNet [4] for reducing the number of connections.

The loss function is simply the mean squared error (MSE) between the network output and the ground-truth image.
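The MSE loss also drives the PSNR figures quoted throughout this story. A minimal sketch of the two quantities (pure Python; function and variable names are my own):

```python
import math

def mse(pred, target):
    """Mean squared error between two equally-sized pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / err)
```

A lower training MSE translates directly into a higher PSNR, which is why the papers report restoration quality in dB.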

SRCNN has only 3 conv layers. It is one of the papers to start with for learning about deep learning.

(If interested, please visit my review on SRCNN.)

2. ARCNN

ARCNN (9–7–1–5)

Compared with SRCNN, ARCNN has one more layer with a 7×7 filter.
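The layer sizes can be tallied with a quick count. The channel widths 1 → 64 → 32 → 16 → 1 are assumed here (following SRCNN's convention), which reproduces the parameter totals quoted in Section 4; biases are ignored:

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution layer, biases ignored."""
    return k * k * c_in * c_out

# ARCNN (9-7-1-5): (filter size, in channels, out channels) per layer,
# with assumed channel widths 1 -> 64 -> 32 -> 16 -> 1
arcnn_layers = [(9, 1, 64), (7, 64, 32), (1, 32, 16), (5, 16, 1)]
arcnn_total = sum(conv_params(*layer) for layer in arcnn_layers)
# The 7x7 second layer dominates: 7*7*64*32 = 100,352 of 106,448 weights.
```

This dominance of the wide 7×7 layer is exactly what Fast ARCNN's layer decomposition targets in Section 4.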

JPEG Images compressed with Quality Factor of 10

With original JPEG: Average PSNR is 27.77 dB.

Using SRCNN (9–1–5): 28.91 dB, which means the image quality is improved.

Using Deeper SRCNN (9–1–1–5): 28.92 dB, one more layer with a 1×1 filter doesn’t help much.

Using ARCNN (9–7–1–5): 28.98 dB is obtained.

Average PSNR along the number of backprops
ARCNN has a better visual quality

3. ARCNN — Easy-to-Hard Transfer

3.1 Transfer from Shallower to Deeper

Transfer from Shallower to Deeper
  • First train ARCNN (9–7–1–5), then keep its first two layers.
  • Then learn the 3rd to 5th layers of ARCNN (9–7–3–1–5).

Since the first two layers have already been learnt, this initialization is much better than random initialization, as shown below:

Average PSNR along the number of backprops (He [9] is one kind of random initialization)
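The easy-to-hard transfer amounts to initializing the deeper network's first layers from the trained shallower one and training only the rest. A schematic sketch with weights kept as plain dicts (the layer names and placeholder values are illustrative, not the paper's code):

```python
def transfer_first_layers(trained, target, n_layers=2):
    """Initialize `target` by copying the first n_layers weights from
    `trained`; the remaining layers keep their random initialization."""
    init = dict(target)
    for name in sorted(trained)[:n_layers]:
        init[name] = trained[name]
    return init

# Shallow ARCNN (9-7-1-5), already trained (weights shown as placeholders)
shallow = {"conv1": "trained_9x9", "conv2": "trained_7x7",
           "conv3": "trained_1x1", "conv4": "trained_5x5"}
# Deeper ARCNN (9-7-3-1-5), freshly (randomly) initialized
deeper = {"conv1": "random", "conv2": "random", "conv3": "random",
          "conv4": "random", "conv5": "random"}

deeper_init = transfer_first_layers(shallow, deeper)
```

The same pattern covers the other two transfer settings below: only which network supplies the first layer(s), and which is fine-tuned, changes.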

3.2 Transfer from higher to lower quality

Transfer from higher to lower quality
  • Similarly, use higher quality samples to train first.
  • Then transfer the first layer or first 2 layers.
Average PSNR along the number of backprops

3.3 Transfer from standard to real case

On Twitter, a 3264×2448 image would be rescaled and compressed to a 600×450 image. Thus:

  • Train the network using standard images, then transfer the first layer.
  • Then fine-tune using 40 Twitter photos (335,209 samples).
Average PSNR along the number of backprops
Twitter Image Visual Quality

4. Fast ARCNN

4.1 Layer Decomposition

Fast ARCNN: one more layer with a 1×1 filter is added (notation: number of filters (filter size))

The total number of parameters can be reduced by adding a 1×1 convolution between two spatial convolutions.

N: Total number of parameters of a model, counted below for ARCNN and Fast ARCNN.
  • ARCNN has 100,352 parameters in its 2nd layer and 106,448 parameters in total.
  • Fast ARCNN has only 51,200 parameters in its 2nd and 3rd layers, and only 57,296 parameters in total!

Using a 1×1 convolution to reduce model size was in fact already proposed in GoogLeNet.
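The saving is easy to verify with a quick count: replacing one wide 7×7 layer with a 1×1 "shrinking" layer followed by a 7×7 layer on fewer channels cuts the weights sharply. The channel sizes below are mine, chosen for illustration; Fast ARCNN's actual configuration is what yields the 51,200 figure quoted above:

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution layer, biases ignored."""
    return k * k * c_in * c_out

# Direct: one 7x7 convolution mapping 64 -> 32 feature maps (ARCNN's 2nd layer)
direct = conv_params(7, 64, 32)                               # 100,352 weights

# Decomposed: 1x1 "shrinking" layer 64 -> 16, then 7x7 mapping 16 -> 32
decomposed = conv_params(1, 64, 16) + conv_params(7, 16, 32)  # 1,024 + 25,088
```

The 1×1 layer is cheap because its cost has no k² factor, so shrinking the channel count before the expensive spatial convolution is where the reduction comes from.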

4.2 Larger Stride at First Layer and Larger Filter at Last Layer

  • Increase the stride size in the first convolutional layer from 1 to 2.
  • Increase the filter size in the last convolutional layer from 5 to 9.

The number of parameters (N) is still only 56,496.
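Why the larger stride helps speed: a conv layer's cost scales with the number of output positions, so stride 2 in the first layer quarters the spatial work of that layer, and every subsequent layer then also runs on quarter-resolution feature maps. A rough cost model (multiply-accumulates, assuming "same" padding and a hypothetical 256×256 input):

```python
def conv_cost(h, w, k, c_in, c_out, stride=1):
    """Approximate multiply-accumulates for one conv layer ('same' padding)."""
    out_h, out_w = h // stride, w // stride
    return out_h * out_w * k * k * c_in * c_out

# First layer (9x9, 1 -> 64 channels) at stride 1 vs stride 2
cost_s1 = conv_cost(256, 256, 9, 1, 64, stride=1)
cost_s2 = conv_cost(256, 256, 9, 1, 64, stride=2)
# Stride 2 quarters the number of output positions, hence the compute.
```

The larger 9×9 filter in the last layer then compensates for the coarser feature maps, which is why the quality drop below is so small.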

Results
  • ARCNN: 29.13 dB
  • Fast ARCNN (s=2): 29.07 dB, only a slight drop.

Comparing speed:

  • 0.5 sec per image for ARCNN
  • 0.067 sec per image for Fast ARCNN, a 7.5× speed-up!
