
BIG.art: Using Machine Learning to Create High-Res Fine Art

How to use GLIDE and BSRGAN to create ultra-high-resolution digital paintings with fine details

Sample Results from BIG.art, Images by Author

I have been experimenting and writing about using AI/ML to create art from text descriptions for over a year now. During this time, I have noticed a significant increase in interest in this area, in part due to the burgeoning NFT art market.

After looking at dozens of ML models for generating art, the best one I have seen so far is GLIDE from OpenAI [1]. Coupled with a super-resolution resize model called BSRGAN from ETH in Zurich [2], I find the results to be excellent.

For example, below are the results from two of my earlier projects, MAGnet using CLIP+SWAGAN and GANshare One using CLIP+VQGAN, compared to results from the new system on the right. The prompts I used were "a painting of rolling farmland," "an abstract painting with orange triangles," and "a still life painting of a bowl of fruit."

Comparison of the Output from the ML Models, Images by Author

Although the assessment of art is inherently subjective, it is clear to me that the results from the new model are better than the previous two. (However, I appreciate the 3D look of the CLIP/VQGAN rendering of the orange triangles.) You can click on each image to get a closer look.

Overview

Here is a high-level block diagram for my project to generate high-resolution fine art called BIG.art. After a summary of the system, I will get into the details of each component further below.

BIG.art Components, Diagram by Author

OpenAI did the heavy lifting when they collected 250 million text-image pairs and trained two GLIDE models, an image generator and an image upsampler. I passed a text prompt, "still life painting of colorful glass bottles," into the GLIDE generator, and it created a set of seven thumbnail images at 64×64 pixels each. I then sent the generated thumbnails and the prompt into the GLIDE upsampler, and it resized them to 256×256 pixels each. Even the upsampled images were pretty small. If you printed one at 300 DPI, it would be less than an inch across and down. The remaining steps resize the selected image.

After experimenting with several resizing systems, I settled on the BSRGAN super-resolution resizer model from ETH in Zurich. It did a great job resizing the selected image by 4x to 1024×1024 pixels. Although the edges of the resized images were sharp, the fill areas tended to get flattened out. To compensate for this, I added some filtered noise for texture.

I optionally passed the resized and textured image through VQGAN, an image encoder and decoder from Heidelberg University in Germany. I found that VQGAN will invent new details that often enhance the resized image.

The final steps are another 4x resize from BSRGAN and another pass through the texture generator. The result is a 4096×4096 image with sharp edges and details. Printing this at 300 DPI would yield a piece over one square foot, suitable for framing. Here is the final image with some of the details.

BIG.art Results for "still life painting of colorful glass bottles" with Selected Areas to Show Details, Image by Author

Be sure to check out the appendix below to see some more results from BIG.art. And you can create your own artwork using the Colab here.
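The print-size arithmetic above is easy to check. Here is a small sketch that computes the printed edge length at 300 DPI for each stage of the pipeline (the stage resolutions are from the article; the helper function itself is just illustrative):

```python
# Print size at 300 DPI for each square-image stage of the BIG.art pipeline.

def print_size_inches(pixels: int, dpi: int = 300) -> float:
    """Return the printed edge length in inches for a square image."""
    return pixels / dpi

for label, px in [("GLIDE thumbnail", 64),
                  ("GLIDE upsampled", 256),
                  ("first BSRGAN 4x", 1024),
                  ("final BSRGAN 4x", 4096)]:
    print(f"{label}: {px}x{px} px -> {print_size_inches(px):.2f} in per side")
```

At 300 DPI, the 64×64 thumbnail is about 0.21 inches per side, while the final 4096×4096 image comes out to roughly 13.7 inches per side.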

Component Details

Generating Images with GLIDE

In March 2022, OpenAI released a series of AI models for image creation called GLIDE, a so-called diffusion model, an alternative to Generative Adversarial Networks (GANs). Here is an explanation of how diffusion models work by two engineers at Google Research.

Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process. Running this reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced. This synthesis procedure can be interpreted as an optimization algorithm that follows the gradient of the data density to produce likely samples. – Jonathan Ho and Chitwan Saharia [3]

Diffusion models are basically noise reduction models that have been trained for so long that they generate new images given pure noise as input.
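The forward "corruption" process from the quote above can be written in a few lines. The sketch below uses the linear noise schedule from the original DDPM work (the schedule endpoints and T=1000 are illustrative choices, not GLIDE's actual settings) and shows how the surviving signal fraction decays from nearly 1 to nearly 0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule, as in the original DDPM paper; T and the
# endpoints here are illustrative, not GLIDE's actual schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)  # surviving signal fraction at step t

def q_sample(x0, t):
    """Forward (corruption) process: sample from q(x_t | x_0)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1.0 - alphas_cumprod[t]) * noise

x0 = rng.standard_normal((64, 64))           # stand-in for a 64x64 image
early, late = q_sample(x0, 10), q_sample(x0, T - 1)
print(alphas_cumprod[10], alphas_cumprod[-1])  # near 1.0, then near 0.0
```

A generative diffusion model is trained to run this process in reverse, predicting and removing the added noise step by step.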

OpenAI’s GLIDE is based on their earlier investigation into using diffusion models for image synthesis. Their 2021 paper, with a bold statement in the title, Diffusion Models Beat GANs on Image Synthesis, showed that diffusion models conditioned on classes of images could achieve better image quality than state-of-the-art generative models [4]. In their latest paper, GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, the authors…

… observe that GLIDE with classifier-free guidance is capable of generalizing to a wide variety of prompts. The model often generates realistic shadows and reflections, as well as high-quality textures. It is also capable of producing illustrations in various styles, such as the style of a particular artist or painting, or in general styles like pixel art. – Alex Nichol et al. [1]
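The "classifier-free guidance" mentioned in the quote combines the model's unconditional and text-conditional noise predictions at each denoising step. A minimal sketch of the combination rule (the arrays here are stand-ins for a real denoising model's outputs):

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine unconditional and text-conditional noise predictions.

    eps = eps_uncond + scale * (eps_cond - eps_uncond); a scale above 1
    pushes samples more strongly toward the text prompt.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_u = np.zeros((4, 4))   # stand-in unconditional prediction
eps_c = np.ones((4, 4))    # stand-in text-conditional prediction
guided = classifier_free_guidance(eps_u, eps_c, scale=3.0)
```

With scale = 1 this reduces to the plain conditional prediction; larger scales trade diversity for prompt fidelity.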

For BIG.art, I use the GLIDE image generator to take in a text prompt and produce a series of seven 64×64 images. The system attempts to depict what is described in the prompt. I then feed the images and the prompt into the GLIDE upsampler that increases the resolution to 256×256. The system was trained to use the prompt to help add detail when resizing up.

For example, GLIDE produces the following seven images from the prompt "a seascape with crashing waves."

Images generated by GLIDE for "a seascape with crashing waves," Images by Author
Images generated by GLIDE for "a seascape with crashing waves," Images by Author

OK, those looked pretty good. Here’s another set for "Boston city skyline."

Images generated by GLIDE for "Boston city skyline," Images by Author
Images generated by GLIDE for "Boston city skyline," Images by Author

These kinda-sorta look like Boston, but not exactly. In any case, I’ll use the fourth one for the resizing discussion below.

Note that OpenAI released trained GLIDE models that cannot create images of people. The authors state…

… releasing our model without safeguards would significantly reduce the skills required to create convincing disinformation or Deepfakes. … In order to mitigate potentially harmful impacts of releasing these models, we filtered training images … containing people to reduce the capabilities of the model in many people-centric problematic use cases. – Alex Nichol et al. [1]

Resizing Images with BSRGAN

There are many different approaches to using AI to resize images to get clean, sharp results. This field of research is called super-resolution imaging.

I tested a half dozen different image super-resolution (ISR) resizing models and found two, the Blind Super-Resolution Network (BSRNet) and the Blind Super-Resolution Generative Adversarial Network (BSRGAN), that work well for upscaling fine art images. The BSRGAN model uses BSRNet as a baseline, followed by further training using a GAN model.

The authors of the paper, Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, from ETH in Zurich, say the following.

It is widely acknowledged that single image super-resolution (SISR) methods would not perform well if the assumed degradation model deviates from those in real images. Although several degradation models take additional factors into consideration, such as blur, they are still not effective enough to cover the diverse degradations of real images. To address this issue, this paper proposes to design a more complex but practical degradation model that consists of randomly shuffled blur, downsampling and noise degradations. – Kai Zhang et al. [2]

The system was trained to blindly discover the various degradations that produced the low-resolution (LR) image, which informs the AI model when reconstructing the high-resolution image. Here is a comparison of multiple ISR models. The original LR image is on the left, and the images scaled up with BSRNet and BSRGAN are on the right.

Comparison of ISR Methods described in the BSR Paper, Source: Kai Zhang et al.

You can see that the images and metrics for BSRNet and BSRGAN look better than the others. The two quality metrics shown are Peak Signal-to-Noise Ratio (PSNR), where higher is better, and Learned Perceptual Image Patch Similarity (LPIPS), where lower is better. I found that the BSRGAN output generally looks sharper, so that’s what I am using for my BIG.art project.
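Of the two metrics, PSNR is simple enough to compute by hand (LPIPS, by contrast, requires a trained perceptual network). A minimal sketch, with a synthetic noisy image standing in for a resized one:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(np.float64)
noisy = np.clip(img + rng.normal(0, 5, img.shape), 0, 255)
print(f"PSNR of noisy copy: {psnr(img, noisy):.1f} dB")
```

Identical images score infinite PSNR; the lightly-noised copy here lands in the mid-30s dB, similar to the range reported in the BSR paper's comparison.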

Here is the "Boston city skyline" image from GLIDE scaled up four times with both bicubic interpolation and BSRGAN. Note that you can click on the images to check out the details.

Comparison of Resizing 4x with (left) Bicubic Interpolation and (right) BSRGAN Resize, Images by Author

You can see that the image resized with BSRGAN is much sharper and more vibrant. However, it seems to have an airbrushed quality with a lack of texture in the smooth areas. I will address this in the next section.

Texture Generator

To add some interest to the flat parts of the images, I created a texture generator that runs a field of monochromatic random noise through a blur function. The noise field is then added to the picture. The parameters are:

  1. texture_amount – the amount of noise, from 0 to 15%
  2. texture_size – the size of noise "clumps," from 1 to 9

Here is the original image again with 5% texture set to sizes 1 and 9.

Generated Image with Varying Texture Sizes (left) none, (center) 5% at size 1, and (right) 5% at size 9, Images by Author

I find that adding a little texture makes the generated and resized art more aesthetically pleasing. The source code for the texture generator is here.
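The steps above can be sketched in a few lines of numpy. This is only an illustration of the approach (blurred monochromatic noise blended into the image); the box blur and blend used here are my own assumptions, and BIG.art's actual filtering may differ:

```python
import numpy as np

def add_texture(image, texture_amount=0.05, texture_size=1, seed=0):
    """Add blurred monochromatic noise to a grayscale image (values in 0..1).

    texture_size controls the box-blur radius (bigger "clumps");
    texture_amount controls the blend weight.  Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    noise = rng.random(image.shape)                    # monochromatic noise field
    # Simple box blur by averaging shifted copies of the noise field.
    r = texture_size
    acc = np.zeros_like(noise)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(noise, dy, axis=0), dx, axis=1)
    blurred = acc / (2 * r + 1) ** 2
    # Center the noise around zero and blend it into the image.
    return np.clip(image + texture_amount * (blurred - 0.5), 0.0, 1.0)

img = np.full((32, 32), 0.5)                           # flat gray test patch
out = add_texture(img, texture_amount=0.05, texture_size=3)
```

Larger texture_size values blur the noise more, so the added texture gets both softer and coarser-grained.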

Using VQGAN to Enhance Details

When I was experimenting with various image generation techniques, I happened to send an image that was resized with BSRGAN through a system called a Vector Quantized Generative Adversarial Network (VQGAN) [5]. I have used VQGAN in my last three GAN projects. I typically run VQGAN for 100 to 400 iterations using a text prompt and OpenAI’s CLIP to fine-tune images.

Interestingly, I found that simply encoding and decoding an image with VQGAN improves the details, especially images that were resized up with BSRGAN.

Below is an area of detail from the Boston city skyline image before and after encoding/decoding with VQGAN. For this experiment, I turned off the texture generator.

Detail of the Original Resized Image (left) and the Image Encoded and Decoded through VQGAN (right), Images by Author

It’s subtle, but you can see how VQGAN added some details and seemed to finish a construction project in the lower right.

This works because of the way VQGAN was designed and trained. It’s a hybrid Transformer/GAN model that looks at subregions of images and encodes them into the types of regions it has seen before during training. When decoding, it will render the detailed parts seamlessly along with the neighbors.
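The "snapping to regions it has seen before" is vector quantization at its core: each latent vector from the encoder is replaced by its nearest entry in a learned codebook. A toy sketch (the codebook and latents here are random stand-ins; the real VQGAN learns a much larger codebook during training):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy learned codebook: 16 entries of dimension 8.
codebook = rng.standard_normal((16, 8))

def quantize(latents, codebook):
    """Snap each latent vector to its nearest codebook entry (L2 distance)."""
    # Pairwise distances: (num_latents, num_codes)
    d = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
    indices = d.argmin(axis=1)
    return codebook[indices], indices

latents = rng.standard_normal((4, 8))        # stand-in encoder outputs
quantized, idx = quantize(latents, codebook)
```

Because the decoder only ever sees codebook entries, it reconstructs every patch as a "clean" version of something from its training distribution, which is why the round trip can invent plausible detail.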

The BSRGAN model upscaled the 256×256 image to 1024×1024 by predicting what the high-res image would look like. Then the VQGAN model re-rendered the resulting image, inventing new details along the way. You can see a full write-up of VQGAN in my GANshare article.

Using BSRGAN Again for a Final Resize Up

The final step for BIG.art is another 4x resize up with BSRGAN and another texture treatment. Here is the final image at 4096×4096 pixels.

BIG.art Rendering of "Boston city skyline," Image by Author

You can click on the image to zoom in and see the detail. And be sure to check out the appendix below to see some more generated images.

Results

After experimenting with BIG.art, I found that some prompts work well for generating images, but others do not.

Prompts that Work

Creating abstract paintings seems to work well. Here are some prompts. Note that you can see the resulting images in the appendix.

  • "an abstract painting with colorful circles"
  • "a splatter painting with thin yellow and black lines"
  • "a block color painting with purple and green squares"

Landscape paintings seem to come out well, too.

  • "landscape painting of an Italian villa"
  • "sunset over a lake"
  • "majestic snow-capped mountains"

And paintings of pets.

  • "a painting of corgi"
  • "a tabby cat"
  • "a goldfish in a glass fishbowl"

Prompts that Don’t Work

Prompts that involve people (i.e., "children playing," "Barack Obama," "The Mona Lisa," etc.) do not work because of the intentional lack of people in the training data for GLIDE.

Prompts of abstract concepts (i.e., "freedom," "a brand new day," "haywire," etc.) also do not work because of a lack of consensus on the content of images on the Internet that are labeled using these words.

Here’s a question that defines a rule of thumb for using BIG.art: When you do a Google search of the prompt and look at the resulting images, do they look roughly similar? If so, BIG.art will probably produce good images from that prompt. Oh, and don’t try to render people. It won’t work.

Source Code

The source code for this project is available on GitHub. I am releasing the sources under the CC BY-SA license. You can create your own images using this Google Colab.

Creative Commons Attribution Sharealike

If you use this code to create new images, please give attribution like this: This image was created with BIG.art by Robert A. Gonsalves.

Acknowledgments

I want to thank Jennifer Lim and Oliver Strimpel for their help with this article.

References

[1] A. Nichol et al., GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (2022)

[2] K. Zhang et al., Designing a Practical Degradation Model for Deep Blind Image Super-Resolution (2021), Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4791–4800

[3] J. Ho and C. Saharia, High Fidelity Image Generation Using Diffusion Models (2021)

[4] P. Dhariwal and A. Nichol, Diffusion Models Beat GANs on Image Synthesis (2021)

[5] P. Esser, R. Rombach, and B. Ommer, Taming Transformers for High-Resolution Image Synthesis (2020)

Appendix

Here are examples of the output of BIG.art for the following prompts. These are the images I deemed the best from batches of seven.

Abstract Paintings

an abstract painting with colorful circles

an abstract painting with colorful circles, Image by Author

a splatter painting with thin yellow and black lines

a splatter painting with thin yellow and black lines, Image by Author

a block color painting with purple and green squares

a block color painting with purple and green squares, Image by Author

Landscape Paintings

landscape painting of an Italian villa

landscape painting of an Italian villa, Image by Author

sunset over a lake

sunset over a lake, Image by Author

majestic snow-capped mountains

majestic snow-capped mountains, Image by Author

Pet Paintings

painting of a corgi

a painting of a corgi, Image by Author

a tabby cat

a tabby cat, Image by Author

a goldfish in a glass bowl

a goldfish in a glass bowl, Image by Author

Bonus Prompts

Here are some renderings from various prompts suggested by one of my reviewers, Oliver.

a wood workshop

a wood workshop, Image by Author

microscopic organisms

microscopic organisms, Image by Author

traffic jam

traffic jam, Image by Author
