
GANscapes: Using AI to Create New Impressionist Paintings

How I trained StyleGAN2 ADA with 5,000 Impressionist landscape paintings in the public domain


GANscapes Sample Output, Images by Author

This is my third article on experimenting with Generative Adversarial Networks (GANs) to create fine art. The first two articles focused on creating abstract art by using image augmentation, but this one focuses on creating Impressionist landscape paintings.

I posted all of the source code for GANscapes on GitHub and posted the original paintings on Kaggle. You can create new landscape paintings using the Google Colab here.

Prior Work

There have been several projects and papers that show how to use GANs to create landscape paintings. There’s Drew Flaherty’s master’s thesis from the Queensland University of Technology entitled "Artistic approaches to Machine Learning," where he used the original StyleGAN [1]. Alice Xue created SAPGAN described in her paper, "End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks" [2]. And Bingchen Liu, et al. created a Lightweight GAN in their paper, "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis" [3].

Below is a sample of paintings from StyleGAN, SAPGAN, Lightweight GAN, and GANscapes.

Sample of GAN Landscapes, from the top: StyleGAN trained by Drew Flaherty, SAPGAN by Alice Xue, Lightweight GAN by Bingchen Liu, et al., and GANscapes by the Author

GANscapes Overview

GANscapes is a system for creating new Impressionist landscape paintings using the latest advances in AI. Here’s a diagram that shows the main system components.

GANscapes System Components, Diagram by Author

Overview of Components

Here is a brief, high-level overview of the components used in GANscapes. I’ll discuss the details of each component later in the article.

I gathered images of Impressionist landscape paintings from WikiArt.org [4] and processed the images to adjust the aspect ratio. I then used the CLIP model [5] from OpenAI to filter the images to keep the "good ones."

I used these images to train StyleGAN2 ADA [6] from NVidia, which has a generator and a discriminator network. The generator creates new images, starting with random "latent" vectors for form and style, and tries to fool the discriminator into thinking the output images are real. Before the real and generated images are fed into the discriminator, they are modified slightly by the Adaptive Discriminator Augmentation (ADA) module, which adds visual variety to the pictures and keeps the discriminator from overfitting the limited training set.

I use CLIP again to filter the output images based on a user-supplied text query. And I use the generator again to create an image with a mix of style and form selected by the user. As the final step, I post-process the images to adjust their aspect ratios.

Be sure to check out the image gallery in the appendix at the end of the article to see more results.

Gathering Images

Claude Monet and his Autumn on the Seine at Argenteuil, Source WikiArt.org

I started by scraping landscape paintings from WikiArt.org using a custom Python script in the Colab here. The script goes through each artist on the site alphabetically. It checks to see if the artist is tagged as part of the "Impressionism" art movement and was born after 1800 and died before 1950. The script then loops through all of the artist’s paintings, looking for ones in the "landscape" genre that are available in the public domain. It downloads each qualifying image, using the artist and painting names as the filename, e.g., claude-monet_landscape-at-giverny-1.jpg. The system found about 5,300 paintings that matched these criteria.
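To make the selection rules concrete, below is a minimal sketch of the filtering logic. The helper names and dictionary keys are illustrative assumptions; the actual Colab script builds these records from WikiArt responses and handles paging, errors, and downloading.

def artist_qualifies(artist):
    # keep Impressionists born after 1800 who died before 1950
    return ("Impressionism" in artist["movements"]
            and artist["born"] > 1800
            and artist["died"] < 1950)

def painting_qualifies(painting):
    # keep public-domain works tagged with the "landscape" genre
    return painting["genre"] == "landscape" and painting["public_domain"]

# quick check with a made-up record
monet = {"movements": ["Impressionism"], "born": 1840, "died": 1926}
print(artist_qualifies(monet))  # True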

Here is a sample of Impressionist paintings from WikiArt.org.

Impressionist Landscape Paintings, Source WikiArt.org

Adjusting the Aspect Ratio of the Training Images

You will notice from the sample of paintings above that they have varying aspect ratios, e.g., some of the images are a lot wider than others. Because GAN systems work more efficiently with perfectly square images, the source images’ aspect ratios need to be adjusted. Three techniques are commonly used to make the adjustment, each with pros and cons. I’ll use Landscape at Valery-sur-Somme by Degas as an example to show the different methods.

Letterbox, Center Cut, and Squeezed Formats, Image by Degas, Formatted by the Author

The "letterbox" format in the first image above keeps the image uncut and unsqueezed, but black bars are added above and below to make the image have a square shape. The issue with the letterbox format is that the entire image is effectively resized down, losing resolution, and the black parts are "wasted" in the GAN training.

The second image is in the "center cut" format, which is cropped to keep just the image’s square center. The issue with the center cut format is the significant loss of imagery on the left and the right.

The third image is squeezed horizontally to be square. The issue with the squeezed format is that the objects in the painting are distorted. For example, in this image, the windmill just got a lot thinner. And each painting will be squeezed by a different amount, depending on the original aspect ratio.

For this project, I came up with a hybrid technique that seems to be a good compromise.

1.27:1 Rectangle Cut and 1.27:1 Squeezed Formats, Image by Degas, Formatted by the Author

My hybrid solution requires first determining the average aspect ratio of all the original paintings, which for the dataset is 1.27:1. Next, I crop each image into a 1.27:1 aspect ratio, pulling out a center rectangle. I then squeeze the cropped images horizontally into a square format with a resolution of 1024 x 1024 pixels and use the resulting images for my training dataset. When the GAN produces square output images, I scale them back out to be 1.27:1 to match the original format. The result of this process will be synthetically generated images that are free of distortion.

This "center rectangle" format seems to be a good compromise between the center cut and squeezed formats, especially since the original landscape paintings are in, well, landscape format. However, this method would not work well with a mix of landscape and portrait-formatted images as it would tend towards the center cut format. The source code is in the Colab here.

The main showroom of the Durand-Ruel Gallery, etching by Charles-François Daubigny, Source https://www.pubhist.com/w30198

Using CLIP to Judge the Training Images

Just because a painting is tagged on WikiArt as being a landscape from an Impressionist painter doesn’t mean that it’s a good representation of such a painting. In my previous project for generating abstract art with a GAN, I "hand chose" the source images, which was quite time-consuming (and made me feel a little "judgy"). For this project, I used a new open-source AI system called CLIP from OpenAI [5] to do the image filtering.

OpenAI designed two models, an image encoder and a text encoder. They trained the two models on a dataset of images with corresponding phrases. The goal of the models is to have the encoded images match the encoded phrases.

Once trained, the image encoder converts images to embeddings, lists of 512 floating-point numbers that capture each image’s general features. The text encoder converts a text phrase to a similar embedding that can be compared to image embeddings for a semantic search.

For GANscapes, I compare the embedding from the phrase "impressionist landscape painting" to the embeddings from the 5,300 paintings to find the images that best match the phrase. The source code is in the Colab here.
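Here is a minimal sketch of the scoring step using OpenAI’s clip package. The ViT-B/32 model choice and the single example image are assumptions for illustration; the actual Colab batches all 5,300 paintings.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# embed the query phrase
tokens = clip.tokenize(["impressionist landscape painting"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb /= text_emb.norm(dim=-1, keepdim=True)

# embed one painting (the real script loops over all 5,300)
image = preprocess(Image.open("claude-monet_landscape-at-giverny-1.jpg"))
with torch.no_grad():
    img_emb = model.encode_image(image.unsqueeze(0).to(device))
    img_emb /= img_emb.norm(dim=-1, keepdim=True)

# cosine similarity: a higher score means a better match to the phrase
score = (img_emb @ text_emb.T).item()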

Below are the top 24 images that match the phrase "impressionist landscape painting," according to CLIP.

Top Impressionist Landscape Paintings according to CLIP, Image by Author, Source WikiArt.org

Not bad! There seem to be many depictions of trees and water, but they vary in painting style. And below are the bottom 24 images according to CLIP.

Bottom Impressionist Landscape Paintings according to CLIP, Image by Author, Source WikiArt.org

OK, I can see why CLIP doesn’t think these images match the phrase "impressionist landscape painting." Except for the two grayscale images on the left, most of them are of people, buildings, abstractions, etc.

I took a look at the images near the 5,000 mark, and they seemed to be decent. So of the 5,314 images, I moved the bottom 314 into a folder called "Salon des Refusés" and kept them out of the training.

Generative Adversarial Networks

In 2014, Ian Goodfellow and his coauthors at the Université de Montréal presented a paper on GANs [7]. They came up with a way to train two Artificial Neural Networks (ANNs) that compete with each other to create realistic images. As I explained at the beginning of this article, the first ANN is called the generator and the second is called the discriminator. The generator tries to create realistic output; the discriminator tries to discern the real images from the training set from the fake images from the generator. During training, both ANNs gradually improve, and the results are surprisingly good.
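As a toy illustration of this adversarial objective (not StyleGAN2 itself), here is one training step in PyTorch; the tiny fully connected networks, batch size, and learning rates are arbitrary stand-ins.

import torch
import torch.nn as nn

# toy generator and discriminator; the real StyleGAN2 networks are far larger
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 784)     # stand-in for a batch of real images
fake = G(torch.randn(32, 64))  # images generated from random latent vectors

# discriminator step: label real images 1 and generated images 0
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# generator step: try to make the discriminator call the fakes real
loss_g = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()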

Last year, I created a project called MachineRay that uses NVidia’s StyleGAN2 to create abstract artwork based on 20th-century paintings in the public domain. Since then, NVidia has released a new version of their AI model, StyleGAN2 ADA, designed to yield better results when generating images from a limited dataset.

One of the significant improvements in StyleGAN2 ADA is dynamically changing the amount of image augmentation during training. I wrote about this improvement back in January 2021.

Creating Abstract Art with StyleGAN2 ADA

Training StyleGAN2 ADA

I trained the GAN on Google Colab Pro for about three weeks. Because Colab times out every 24 hours, I kept the results on my Google Drive and picked up where the training left off each day. Here is the command I used to kick off the training:

!python stylegan2-ada/train.py --aug=ada --mirror=1 \
  --metrics=none --snap=1 --gpus=1 \
  --data=/content/drive/MyDrive/GANscapes/dataset_1024 \
  --outdir=/content/drive/MyDrive/GANscapes/models_1024

I noticed an issue with the ADA variable p, which determines how much image augmentation is used during training. Because the p value always starts at zero, it takes the system a while each day to climb back up to around 0.2. I fixed this in my fork of StyleGAN2 ADA to allow p to be set when the augmentation is set to ADA (the implementation in NVidia’s repo will throw an error if p is set when using ADA).

Here’s the command I used on subsequent restarts.

!python stylegan2-ada/train.py --aug=ada --p 0.186 --mirror=1 \
  --metrics=none --snap=1 --gpus=1 \
  --data=/content/drive/MyDrive/GANscapes/dataset_1024 \
  --outdir=/content/drive/MyDrive/GANscapes/models_1024 \
  --resume=/content/drive/MyDrive/GANscapes/models_1024/00020-dataset_1024-mirror-auto1-ada-p0.183-resumecustom/network-snapshot-000396.pkl

On each restart, I replaced 0.186 with the last logged p value and updated the --resume path to point to the most recently saved model.

Here are some samples from the trained GANscapes system.

Nice! Some of the generated paintings look more abstract than others, but overall, the quality seems pretty good. The system tends to depict natural items well, like trees, water, clouds, etc. However, it seems to be struggling a bit with buildings, boats, and other human-made objects.

Using CLIP to Filter the Output Images

I put the CLIP model to work again to filter the output images. The system generates 1,000 images, which are fed into CLIP to get their image embeddings. The user then types in a prompt, like "Impressionist painting of autumn woods," which is transformed into a CLIP text embedding. The system compares the image embeddings to the text embedding and finds the top matches. Here are the top six paintings that match the "autumn woods" prompt.

GANscapes Autumn Woods Series, Images by Author

Not only does the system find suitable matches for the prompt, but the overall quality seems to have improved, too. This is because CLIP is effectively rating the paintings by how closely they match the phrase "Impressionist painting of autumn woods," so it filters out any of the "funky" images.
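For reference, the ranking step boils down to a cosine-similarity search over the embeddings. Here is a minimal sketch with random stand-ins in place of the real CLIP outputs:

import torch

# stand-ins for the real embeddings: 1,000 generated-image embeddings and one
# prompt embedding, all unit length (512 dimensions, matching CLIP ViT-B/32)
image_embs = torch.nn.functional.normalize(torch.randn(1000, 512), dim=-1)
text_emb = torch.nn.functional.normalize(torch.randn(1, 512), dim=-1)

# cosine similarity between the prompt and every generated image
sims = (image_embs @ text_emb.T).squeeze(1)

# keep the six best matches for display
top_scores, top_indices = sims.topk(6)
print(top_indices.tolist())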

Style Mixing

Did you ever wonder why NVidia named their model StyleGAN? Why the word "style" in the name? The answer is in the architecture. Not only does the model learn how to create new images based on a set of training images, it also learns how to vary the style of those images. You can effectively copy the "style" from one image and paste it into a second image, creating a third image that retains the form of the first but adopts the style of the second. This is called style mixing.

I built a Google Colab that demonstrates how style mixing looks for GANscapes. It’s based on the style_mixing.py script from NVidia. The Colab renders seven landscapes for their form and three landscapes for their style. It then shows a grid of 21 landscapes that mix each form with each style. Note that the thumbnails are scaled up horizontally to have a 1.27:1 aspect ratio.

GANscapes Style Mixing, Image by Author

As you can see, the form images are presented at the top, from left to right, labeled F1 to F7. The style images are presented at the left, from top to bottom, labeled S1 to S3. The images in the 7×3 grid are mixtures of the form and style images. You can see how the trees are placed in roughly the same locations within the images for each of the seven forms, and how the paintings rendered with a given style share its color palette, looking as if they were painted at different times of the year.
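Under the hood, style mixing amounts to swapping rows of the intermediate "w" latents between two seeds before synthesis. Here is a minimal NumPy sketch; the 18-layer latent shape matches a 1024-resolution StyleGAN2 model, but the crossover layer is an arbitrary illustrative choice.

import numpy as np

# stand-ins for the disentangled "w" latents of a form seed and a style seed;
# a 1024-resolution StyleGAN2 model has 18 such layers of 512 values each
num_layers, dim = 18, 512
w_form = np.random.randn(num_layers, dim)
w_style = np.random.randn(num_layers, dim)

# take the coarse layers (large-scale composition) from the form latent and
# the finer layers (color and texture) from the style latent
crossover = 7  # the handoff layer between form and style
w_mixed = np.concatenate([w_form[:crossover], w_style[crossover:]])

# w_mixed would then be fed to the synthesis network to render the mixture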

Post-processing the Images

I perform two post-processing steps, a mild contrast adjustment and image resizing, to restore the original aspect ratio. Here’s the code.

import numpy as np
from PIL import Image

# convert the image to use floating point
img_fp = images[generation_indices[0]].astype(np.float32)

# stretch each color channel, clipping 0.1% at each end
for c in range(3):
    c_min = np.percentile(img_fp[:, :, c], 0.1)
    c_max = np.percentile(img_fp[:, :, c], 99.9)
    img_fp[:, :, c] = (img_fp[:, :, c] - c_min) * 255.0 / (c_max - c_min)

# convert the image back to integer, after rounding and clipping
img_int = np.clip(np.round(img_fp), 0, 255).astype(np.uint8)

# convert the image to PIL and resize to restore the aspect ratio
img_pil = Image.fromarray(img_int)
img_pil = img_pil.resize((1024, int(1024 / 1.2718)))

The first part of the code converts the image to floating point, finds the 0.1th-percentile minimum and the 99.9th-percentile maximum of each color channel, and stretches the contrast accordingly. This is akin to the Adjust Levels feature in Photoshop.

The second part of the code converts the image back to integers and resizes it to a 1.27:1 aspect ratio. Here are three of the style-mixed paintings, before and after post-processing.

GANscapes F4S1, F7S2, F1S3, Images by Author


Discussion

The quality of the landscape paintings created by GANscapes can be attributed to the StyleGAN2 ADA and CLIP models.

By design, StyleGAN2 ADA can be trained on a limited dataset and still produce excellent results. Note that GANs produce images that flow continuously from one to the next. Unless the input images are labeled with categories, there are no hard breaks between the results: if the input latent vectors change a little, the resulting image changes a little; if the vectors change a lot, the image changes a lot. This means the system is effectively morphing from each scene to all of its possible neighbors. Sometimes these "in-between" images have odd artifacts, like a tree partially dissolved into a cloud.
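To see this continuity directly, you can walk the latent space between two seeds. Here is a minimal sketch, where generate_image is a hypothetical stand-in for a call to the trained generator:

import numpy as np

# two random latent vectors, as fed to the StyleGAN2 mapping network
z_a = np.random.randn(512)
z_b = np.random.randn(512)

# small steps along the line between them yield smoothly morphing scenes
for t in np.linspace(0.0, 1.0, num=8):
    z = (1.0 - t) * z_a + t * z_b  # linear interpolation
    # generate_image(z) would render each intermediate frame here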

This is where CLIP comes in. Because it’s rating images by how much they match the text query, i.e., "Impressionist landscape painting," it tends to pick fully formed images that don’t have odd artifacts. The CLIP model effectively solves the partial morphing problem with unlabeled GANs.

Future Work

There are several different ways to improve and extend this project.

First, the paintings’ quality could be improved using a technique called transfer learning [8]. This could be done by first training the GAN on photographs of landscapes and then continuing the training on landscape paintings.

I could probably improve the text-to-image generation by running through multiple iterations of StyleGAN → CLIP → StyleGAN → CLIP, etc. This could be done either with a genetic algorithm, as in CLIP-GLaSS [9], or with gradient descent and backpropagation, the same method used to train ANNs, as described in Victor Perez’s article on Medium [10].

Finally, the recent press about how Non-Fungible Tokens (NFTs) can be used to verify the owner of digital files made me think of a possible NFT-GAN hybrid. Instead of buying ownership of a single JPEG file, a prospective collector could bid on a range of latent vectors that generate variations of images and/or styles in a trained GAN. Hey, you read it here first, folks!

Source Code

The 5,000 Impressionist landscape paintings I collected can be found on Kaggle here. All source code for this project is available on GitHub. Images of the paintings on Kaggle and the source code are released under the CC BY-SA license.


Acknowledgments

I want to thank Jennifer Lim and Oliver Strimpel for their help with this article.

References

[1] D. Flaherty, "Artistic approaches to machine learning," May 22, 2020, Queensland University of Technology, Masters Thesis, https://eprints.qut.edu.au/200191/1/Drew_Flaherty_Thesis.pdf

[2] A. Xue, "End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks," November 11, 2020, https://openaccess.thecvf.com/content/WACV2021/papers/Xue_End-to-End_Chinese_Landscape_Painting_Creation_Using_Generative_Adversarial_Networks_WACV_2021_paper.pdf

[3] B. Liu, Y. Zhu, K. Song, A. Elgammal, "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis," January 12, 2021, https://arxiv.org/pdf/2101.04775.pdf

[4] WikiArt, December 26, 2008, https://www.wikiart.org

[5] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., "Learning Transferable Visual Models From Natural Language Supervision," January 5, 2021, https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf

[6] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila, "Training Generative Adversarial Networks with Limited Data," October 7, 2020, https://arxiv.org/pdf/2006.06676.pdf

[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, "Generative Adversarial Networks," June 10, 2014, https://arxiv.org/pdf/1406.2661.pdf

[8] S. Bozinovski and A. Fulgosi, "The influence of pattern similarity and transfer learning upon training of a base perceptron B2," Proceedings of Symposium Informatica, 3–121–5, 1976

[9] F. Galatolo, M.G.C.A. Cimino, and G. Vaglini, "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search," February 26, 2021, https://arxiv.org/pdf/2102.01645.pdf

[10] V. Perez, "Generating Images from Prompts using CLIP and StyleGAN," Feb 6, 2021, https://towardsdatascience.com/generating-images-from-prompts-using-clip-and-stylegan-1f9ed495ddda

Appendix – Gallery of GANscapes

Here is a collection of finished paintings.

