A Neural Algorithm of Artistic Style: Summary and Implementation

Style Transfer using PyTorch

Ching (Chingis)
Towards Data Science
4 min read · Mar 15, 2021


Neural-Style, or Neural-Transfer, allows you to reproduce a given image in a new artistic style. Here I introduce the Neural-Style algorithm proposed by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. The algorithm receives three images: a style image, a content image, and an input image, which can be either a white noise image or a copy of the content image. It then iteratively changes the input image so that it shows the content of the content image rendered in the style of the style image. Before I start, I would like to thank Alexis Jacq for his article.

Principle

Although the name might suggest that a style is literally transferred from one image onto another, the idea is actually to generate a new image that minimizes its distance to both the content and style images. It is therefore even possible to start from a white noise image and still reach the desired outcome, which is what the paper proposes. On the technical side, the overall idea is to pass the images through a pre-trained network and minimize these distances via backpropagation, updating the input image rather than the network weights.

Details

Convolutional Neural Network (CNN) taken from this paper.

As mentioned above, the algorithm expects three images, and the goal is to generate an image that minimizes the distance to both the content and style images. To achieve this, the authors use a pre-trained VGG-19 network and compute distances between the feature maps of the input image and those of the target images (the content and style images). Both the content and style losses are therefore computed at specific layers of the CNN.

Content Loss

Content loss, taken from the paper: L_content(p, x, l) = 1/2 Σ_{i,j} (F^l_ij − P^l_ij)^2

The authors use a squared-error loss to measure the distance between the feature representations of the content image and the input image at a given layer l, denoted P and F respectively.
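A minimal sketch of how this can be expressed as a PyTorch module, following the structure of the tutorial this article credits (the class name is my own choice). The layer is transparent: it records its loss and passes the input through unchanged, so it can be inserted anywhere in the VGG feature stack. Note that F.mse_loss averages rather than sums; the constant factor is absorbed into the loss weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Records the squared-error loss between its input and a fixed
    target feature map, then passes the input through unchanged."""

    def __init__(self, target):
        super().__init__()
        # Detach the target: it is a constant, not something to optimize.
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        # Transparent layer: return the input unchanged.
        return input
```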

Style Loss

To compute the style loss, the authors use Gram matrices. Given feature maps of shape [N, C, H, W], we flatten them into N × C feature vectors, resulting in a matrix of shape [N × C, H × W]. Multiplying this matrix by its own transpose yields the Gram matrix, which captures the correlations between feature maps. The style loss is then the squared-error loss between the Gram matrices of the style image and the input image.
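A sketch of the Gram matrix computation and the corresponding style loss module, mirroring the content loss above. Normalizing the Gram matrix by the number of elements is a common implementation choice (so the loss magnitude does not depend on image size), not something the article spells out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(input):
    # input: feature maps of shape [N, C, H, W].
    n, c, h, w = input.size()
    # Flatten each of the N*C feature maps into a vector of length H*W.
    features = input.view(n * c, h * w)
    # Feature correlations: multiply the matrix by its own transpose.
    g = torch.mm(features, features.t())
    # Normalize so the magnitude does not depend on image size.
    return g.div(n * c * h * w)

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super().__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        self.loss = F.mse_loss(gram_matrix(input), self.target)
        return input
```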

Total Loss

Total loss, taken from the paper: L_total(p, a, x) = α L_content(p, x) + β L_style(a, x)

The total loss is given as the weighted sum of the two losses, where α and β weight the content and style terms respectively. The authors mention that the ratio α/β was either 1×10^−3 or 1×10^−4.
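In code, with the loss modules above collected into lists (the list names here are my own; they are filled in when the model is assembled below), the total loss is just a weighted sum:

```python
# alpha/beta controls the content/style trade-off; e.g. alpha = 1 and
# beta = 1e3 gives the paper's ratio alpha/beta = 1e-3.
alpha, beta = 1.0, 1e3

content_score = sum(cl.loss for cl in content_losses)
style_score = sum(sl.loss for sl in style_losses)
total_loss = alpha * content_score + beta * style_score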

Model and Normalization

We will be importing the pre-trained VGG-19 network, hence we also need to normalize our images with the same per-channel mean and standard deviation the network was trained with, i.e. subtracting the mean and dividing by the standard deviation.
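A sketch of a normalization module using the standard ImageNet statistics that torchvision's pre-trained VGG-19 expects:

```python
import torch
import torch.nn as nn

# Per-channel statistics of ImageNet, which VGG-19 was trained on.
cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406])
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225])

class Normalization(nn.Module):
    def __init__(self, mean, std):
        super().__init__()
        # Reshape to [C, 1, 1] so the values broadcast over [N, C, H, W].
        self.mean = mean.view(-1, 1, 1)
        self.std = std.view(-1, 1, 1)

    def forward(self, img):
        return (img - self.mean) / self.std
```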

Finally, we will be computing our losses at specific depths of the VGG-19. Additionally, the authors mention that replacing the max pooling layers with average pooling results in smoother images. Therefore, we need a new Sequential module that contains the normalization, the VGG-19 layers, and the loss modules, as sketched below.
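One way to assemble such a module, broadly following the tutorial the article credits and reusing the classes sketched above. The layer choices (conv_4 for content, conv_1 through conv_5 for style) are a common convention, not taken from the article, and content_img and style_img are assumed to be preloaded image tensors.

```python
import torch.nn as nn
import torchvision.models as models

cnn = models.vgg19(pretrained=True).features.eval()

content_layers = ['conv_4']                                        # assumed
style_layers = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']  # assumed

model = nn.Sequential(Normalization(cnn_normalization_mean,
                                    cnn_normalization_std))
content_losses, style_losses = [], []

i = 0
for layer in cnn.children():
    if isinstance(layer, nn.Conv2d):
        i += 1
        name = f'conv_{i}'
    elif isinstance(layer, nn.ReLU):
        name = f'relu_{i}'
        # In-place ReLU would overwrite activations the loss modules need.
        layer = nn.ReLU(inplace=False)
    elif isinstance(layer, nn.MaxPool2d):
        name = f'pool_{i}'
        # Average pooling instead of max pooling for smoother images.
        layer = nn.AvgPool2d(layer.kernel_size, layer.stride, layer.padding)
    model.add_module(name, layer)

    if name in content_layers:
        # The content image's feature maps at this depth become the target.
        content_loss = ContentLoss(model(content_img).detach())
        model.add_module(f'content_loss_{i}', content_loss)
        content_losses.append(content_loss)
    if name in style_layers:
        style_loss = StyleLoss(model(style_img).detach())
        model.add_module(f'style_loss_{i}', style_loss)
        style_losses.append(style_loss)

# Trim everything after the last loss module; deeper layers are unused.
for j in range(len(model) - 1, -1, -1):
    if isinstance(model[j], (ContentLoss, StyleLoss)):
        break
model = model[:j + 1]
```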

Training details

Unlike ordinary training, where we optimize a network's weights, here we want to optimize the input image itself. Therefore, we use the L-BFGS algorithm to run our gradient descent and pass our image to the optimizer as the tensor to optimize. Here, the ratio α/β is 1×10^−6.
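A minimal training-loop sketch, reusing the model and loss lists assembled above. Starting from a copy of the content image, the step count, and the closure pattern are standard L-BFGS usage in PyTorch rather than details from the article.

```python
import torch
import torch.optim as optim

input_img = content_img.clone().requires_grad_(True)
optimizer = optim.LBFGS([input_img])

# Matching the ratio alpha/beta = 1e-6 mentioned above.
alpha, beta = 1.0, 1e6

run = [0]
while run[0] <= 300:
    def closure():
        # Keep pixel values in a displayable range between steps.
        with torch.no_grad():
            input_img.clamp_(0, 1)
        optimizer.zero_grad()
        model(input_img)  # forward pass populates .loss on each loss module
        content_score = sum(cl.loss for cl in content_losses)
        style_score = sum(sl.loss for sl in style_losses)
        loss = alpha * content_score + beta * style_score
        loss.backward()
        run[0] += 1
        return loss

    optimizer.step(closure)

with torch.no_grad():
    input_img.clamp_(0, 1)
```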

Results

The figure above shows the content and style images as well as the generated image. Additionally, you can see how the input image changed over the course of gradient descent.

Astana, Kazakhstan.

Finally, I would like to show a couple more images I liked :).

Almaty, Kazakhstan.
Moscow, Russia.

Some Last Words

I presented a simple yet interesting algorithm based on CNNs. However, there are now other approaches that accomplish this task differently, especially in real time. If you are interested, I encourage you to study this area further. The full code can be found on my GitHub.

Paper

A Neural Algorithm of Artistic Style


I am a passionate student. I enjoy studying and sharing my knowledge. Follow me/Connect with me and join my journey.