Transforming Real Photos Into Master Artworks with GANs
--
Creativity: Uniquely Human
Art — the ability to create something original, to use one’s unbounded creativity and imagination — is something we humans like to believe is unique to us. After all, no other animal or computer has thus far come close to matching the artistic skill of humans when it comes to realistic paintings.
Even with the recent advances in AI, computers still struggled with being creative. They were good at computing, classifying, and performing programmed tasks, but they could never really match the level of creativity humans had. Human creativity was assuredly unique and one of a kind… That is, until generative adversarial networks were conceived. In 2014, the original paper on GANs proposed a new framework for estimating generative models — models which could create — using two models.
An Eternal Game of Cat and Mouse
A GAN consists of two models, a generative model G and a discriminative model D, as well as a real data set. You can think of G as a counterfeiter, trying to make money that looks more and more like real money, and D as a cop, trying to tell whether the money is real or counterfeit. The real data set acts as a reference for the cop to compare against the counterfeiter’s output.
In the beginning, the counterfeiter is going to suck, because he has no idea how to actually make the money look real, but every time the cop catches his fake money, he learns from his mistakes and gets better. Keep in mind, the discriminator is also getting incrementally better at differentiating real and fake currency — he’s becoming a better cop. This cycle continues over and over until the generator is so good at creating data that looks like the training data that even the discriminator can’t tell the difference.
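This cat-and-mouse loop can be sketched numerically. Below is a toy one-dimensional illustration (my own, not from the GAN paper): the “counterfeiter” learns a single offset `theta` so that `theta + noise` resembles real data clustered around 4, while the “cop” is a logistic classifier `sigmoid(w*x + c)`. All names, learning rates, and step counts here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

mu_real = 4.0      # real data lives around 4
theta = 0.0        # the counterfeiter's single parameter, starts far off
w, c = 0.1, 0.0    # the cop: D(x) = sigmoid(w*x + c)
lr_d, lr_g = 0.1, 0.02
batch = 64

for step in range(3000):
    # cop's turn: a few gradient-ascent steps on log D(real) + log(1 - D(fake))
    for _ in range(3):
        real = mu_real + rng.normal(size=batch)
        fake = theta + rng.normal(size=batch)
        s_r, s_f = sigmoid(w * real + c), sigmoid(w * fake + c)
        w += lr_d * np.mean((1 - s_r) * real - s_f * fake)
        c += lr_d * np.mean((1 - s_r) - s_f)

    # counterfeiter's turn: one ascent step on log D(fake),
    # nudging theta toward whatever the cop currently calls "real"
    fake = theta + rng.normal(size=batch)
    s_f = sigmoid(w * fake + c)
    theta += lr_g * np.mean((1 - s_f) * w)

# after training, theta should sit near the real mean of 4
print(theta)
```

Each round, the cop sharpens his decision boundary, and the counterfeiter shifts his output toward the side the cop labels real — exactly the escalating game described above, just in one dimension.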
Image-to-Image Translation with Cycle GANs
The classic GAN architecture works well for creating new, similar-looking data, but it doesn’t work so well when trying to alter an existing image. Moreover, traditional approaches to image-to-image translation required datasets with paired examples.
These paired examples are data points that are directly related — they show the original image and the desired modification to it. For instance, the training dataset would need to contain the same landscape during winter and summer. However, such datasets are difficult to prepare — sometimes even impossible, as is the case with art. There simply is no photographic equivalent of the Mona Lisa or other great artworks.
To combat this, we use a Cycle GAN. This is a special type of generative adversarial network that has an extension to the GAN architecture: cycle consistency. This is the notion that an image output by the first generator could be used as the input to a second generator and the output of the second generator should match the original image — undoing what the first generator did to the original image.
Think about it like this: you’re using Google Translate to translate something from English to Spanish. You then open a new tab, paste the Spanish back into Google Translate, and translate it back to English. At the end of all this, you would expect to get the original sentence back. This is the principle behind a Cycle GAN: cycle consistency acts as an additional loss measuring the difference between the second generator’s output and the original image, without the need for paired examples.
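The round-trip idea can be made concrete with a tiny numerical sketch. The mappings `G` and `F` below are hypothetical toy functions standing in for the two generators, not real networks; the loss itself is the pixel-wise L1 distance used in the Cycle GAN formulation.

```python
import numpy as np

def G(x):            # toy stand-in "generator", domain A -> domain B
    return 2.0 * x + 1.0

def F(y):            # toy stand-in "generator", domain B -> domain A (exact inverse of G)
    return (y - 1.0) / 2.0

def F_sloppy(y):     # an imperfect inverse, off by a constant
    return (y - 1.0) / 2.0 + 0.5

def cycle_loss(x, forward, backward):
    # L1 distance between the original and its round-trip reconstruction
    return float(np.mean(np.abs(backward(forward(x)) - x)))

x = np.array([0.0, 1.0, 2.0, 3.0])
print(cycle_loss(x, G, F))         # 0.0 -> perfect round trip
print(cycle_loss(x, G, F_sloppy))  # 0.5 -> reconstruction error to penalize
```

When the second mapping perfectly undoes the first, the cycle loss is zero; any failure to “translate back” shows up as a positive loss the networks are trained to shrink.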
The Cycle GAN Architecture for Style Transfer
Here’s how the Cycle GAN would work if our model were training to create images in the style of the French artist Cézanne.
Datasets
- Dataset 1: Real photos
- Dataset 2: Artworks by Cézanne
Generative Adversarial Networks
- GAN 1: Translates real photos (dataset 1) into artworks in the style of Cézanne (dataset 2)
- GAN 2: Translates artworks in the style of Cézanne (dataset 2) into real photos (dataset 1)
Forward Cycle Consistency Loss
- Input a real photo (dataset 1) to GAN 1
- Output a Cézanne-styled artwork from GAN 1
- Input the Cézanne-styled artwork generated by GAN 1 to GAN 2
- Output a reconstructed photo from GAN 2
- Compare the original photo (dataset 1) to the reconstructed photo from GAN 2 using a pixel-wise (L1) loss
Backward Cycle Consistency Loss
- Input a real Cézanne artwork (dataset 2) to GAN 2
- Output a photo from GAN 2
- Input the photo generated by GAN 2 to GAN 1
- Output a reconstructed Cézanne-styled artwork from GAN 1
- Compare the original Cézanne artwork (dataset 2) to the reconstructed artwork from GAN 1 using a pixel-wise (L1) loss
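Putting the pieces together, the steps above combine into a single training objective: two adversarial terms plus the two cycle-consistency terms, weighted by a coefficient (λ = 10 in the original Cycle GAN paper). Here is a toy numpy sketch in which the generators and discriminators are placeholder functions, not trained networks:

```python
import numpy as np

def G(photo):          # toy stand-in for GAN 1's generator: photo -> Cézanne style
    return photo + 1.0

def F(art):            # toy stand-in for GAN 2's generator: Cézanne style -> photo
    return art - 1.0

def D_art(x):          # toy discriminator: "does this look like real art?"
    return 1.0 / (1.0 + np.exp(-x))

def D_photo(x):        # toy discriminator: "does this look like a real photo?"
    return 1.0 / (1.0 + np.exp(-x))

def l1(a, b):          # pixel-wise L1 loss used for cycle consistency
    return float(np.mean(np.abs(a - b)))

photos = np.array([0.0, 1.0, 2.0])   # stand-in for dataset 1 (real photos)
art    = np.array([5.0, 6.0, 7.0])   # stand-in for dataset 2 (Cézanne artworks)
lambda_cyc = 10.0                    # cycle-loss weight (10 in the original paper)

forward_cycle  = l1(F(G(photos)), photos)  # photo -> art -> reconstructed photo
backward_cycle = l1(G(F(art)), art)        # art -> photo -> reconstructed art

# adversarial terms: each generator is rewarded when its
# discriminator scores its fakes as real
adv_G = -float(np.mean(np.log(D_art(G(photos)))))
adv_F = -float(np.mean(np.log(D_photo(F(art)))))

total = adv_G + adv_F + lambda_cyc * (forward_cycle + backward_cycle)
print(total)
```

Because the toy `F` here exactly undoes `G`, both cycle terms come out to zero and only the adversarial terms remain; in a real Cycle GAN, the networks start out nowhere near inverses of each other, and it is precisely these cycle terms that force them to become so.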
By minimizing these two losses, our model will eventually learn how to transform real photos into artworks in the style of Cézanne — and since we have not just one but two GANs, we can also turn original artworks by Cézanne into realistic photos. If we wanted our model to transform our photos into another artist’s style, we would simply need to replace the Cézanne artworks in dataset 2 with artworks from another artist, say Van Gogh or Picasso.
Really — what we’re capable of doing with GANs is just nuts. Even for a human, it is quite a daunting task to try to imitate an artist’s style given a photo for inspiration. Some people dedicate their entire lives to this craft, yet a trained GAN can apply any style to any picture in minutes. Nuts!
The Results
The Future of Creative Computing
Although calling them masterpieces might be a stretch, there’s no doubt that artificial intelligence is quickly catching up to humans in something we once thought was securely ours — artistic talent and creativity. I’ve only explained one application of GANs, one I personally find fascinating, but GANs are now being used in a myriad of ways, from generating realistic faces to increasing the quality of images — and at the heart of it all is just a game between a counterfeiter and a cop.