
How Artists Are Redefining Art with Generative AI

AI will impact art the same way photography did

All images in this article are AI-generated. Prompts: The painter/A photograph/The future of art – made on Dream

"You don’t take a photograph, you make it."

– Ansel Adams

Photography wasn’t considered an art at first. The general public embraced it quickly, but it was perceived as a mechanistic way to record the fleeting moments of life. It was a skill, a matter of technical ability, not an art.

Yet it changed our relationship with art as we knew it. It liberated painters from the task of capturing reality as it is. It allowed them to move, free of ties, in the vast latent space that art offers. They could focus on the abstract, on painting ideas to transmit feelings that photos fail to capture. They could break the rules and explore the limits of what’s possible, finding meaning in questions that only exist within us.

Photography also democratized access to imagery – capturing a snapshot of reality was suddenly a matter of minutes, not days. At the same time, at least in the view of the elites, it cheapened the unique, talent-based beauty of hand-made paintings. Not everyone was happy, but technology always finds a way. Photography was here to stay, and most will now agree with Adams’ quote above: photography is art.

Now it’s Artificial Intelligence’s turn to disrupt the world of visual art. AI art generators will also push artists to redefine their relationship with art. It’s not a matter of competing with these tools but of embracing them. Human imagination, combined with the possibilities that AI offers, will bring forth the most unique art we’ve seen yet.

From perceiving the world to creating art

Computer vision was the most popular branch of machine learning long before the deep learning explosion in the early 2010s. Now language has taken over, but AI systems never stopped improving the way they see and perceive the world. Not long ago, neural nets could barely classify simple objects. Today, they’re giving a new meaning to the saying "a picture is worth a thousand words."

Art generation has advanced throughout the last decade with the help of generative models, particularly GANs (Generative Adversarial Networks). These networks can create images that resemble those in the datasets they’ve been trained on. Researchers and artists realized they could exploit GANs’ capabilities to generate novel imagery. They improved the architectures in search of new forms of AI-generated art, but couldn’t quite find a way to condition the final output of the models.
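At its core, a GAN pits two networks against each other: a generator that maps random noise to images and a discriminator that learns to tell real training images from generated ones. Here’s a minimal, hedged sketch of that adversarial loop in PyTorch – the tiny network sizes, hyperparameters, and the `real_images` input are illustrative placeholders, not the architectures these researchers actually used:

```python
import torch
import torch.nn as nn

latent_dim = 100  # illustrative size of the noise vector

# Toy generator and discriminator (real GANs use deep convolutional nets)
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):
    """One adversarial step; real_images is a [batch, 784] tensor from your dataset."""
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Train D to separate real images from generated ones
    fake = G(torch.randn(batch, latent_dim)).detach()
    loss_d = bce(D(real_images), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train G to fool D into labeling its output as real
    fake = G(torch.randn(batch, latent_dim))
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

After enough of these steps, sampling `G(torch.randn(1, latent_dim))` yields images that resemble the training distribution – which is exactly why, on their own, GANs offered no handle for steering the output toward a particular idea.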

Art can’t be art if there’s no artist.

In 2021, OpenAI introduced the CLIP model, a neural network trained on 400 million image/text pairs. CLIP can select which text description, taken from a pool of descriptions, best represents a given image. OpenAI released the weights of the model, giving artists the missing piece of a puzzle they’d soon complete.
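That capability is easy to see in code. The sketch below uses the Hugging Face `transformers` port of CLIP to score a pool of candidate descriptions against a single image; the file path and the candidate texts are made-up examples:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("some_image.jpg")  # placeholder path
candidates = ["a sunset", "a tree with weeping branches", "a city at night"]

inputs = processor(text=candidates, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: [1, num_texts]
probs = logits.softmax(dim=-1)

best = candidates[probs.argmax().item()]
print(f"Best description: {best!r} ({probs.max().item():.1%})")
```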

AI researchers and artists Ryan Murdock and Katherine Crowson realized they could use CLIP as a "steering wheel" to guide generative networks toward images that fit a given text description. Murdock combined CLIP with BigGAN (see the Big Sleep notebook) with results like these:

Prompts: "A sunset" and "When the wind blows." Credit: Ryan Murdock
Prompts: "A sunset" and "When the wind blows." Credit: Ryan Murdock

By inputting a short sentence into the model, he could condition it to find, in the vast latent space of image possibilities, the one that best represented the sentence’s visual meaning. "A sunset" and "when the wind blows" produced some of the most popular outputs. You can feel the wind moving the trees (or veils?) in the second image. It was a breakthrough.
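Conceptually, the "steering wheel" is gradient descent: embed the prompt with CLIP, then repeatedly nudge the generator’s latent vector so the embedding of the generated image moves closer to the prompt’s embedding. Here is a hedged sketch of that loop, reusing the `model` and `processor` from above; `G`, `latent_dim`, and `to_clip_input` are stand-ins for a pretrained generator (BigGAN, in Big Sleep’s case) and its preprocessing, and the details differ from Murdock’s actual notebook:

```python
import torch

# Embed the prompt once; it stays fixed during optimization.
text_inputs = processor(text=["a sunset"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

latent_dim = 128  # hypothetical; depends on the generator you plug in
z = torch.randn(1, latent_dim, requires_grad=True)  # the latent we optimize
opt = torch.optim.Adam([z], lr=0.05)

for step in range(300):
    image = G(z)                   # G: hypothetical pretrained generator
    pixels = to_clip_input(image)  # hypothetical resize/normalize to CLIP's input
    img_emb = model.get_image_features(pixel_values=pixels)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    loss = -(img_emb * text_emb).sum()  # maximize cosine similarity with the prompt
    opt.zero_grad(); loss.backward(); opt.step()
```

The generator never retrains; only the latent vector moves, with CLIP acting as a frozen judge of how well each candidate image matches the text.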

Crowson, building upon Murdock’s work, authored the notebook that combined VQGAN – a more powerful generative architecture published in 2020 that makes use of convolutions and transformers – with CLIP. In contrast with the BigGAN+CLIP model, this one creates images that have texture, an almost tangible concreteness to them.

Prompts: "The Yellow Smoke That Rubs Its Muzzle On The Window-Panes" and "A tree with weaping branches." Credit: Katherine Crowson
Prompts: "The Yellow Smoke That Rubs Its Muzzle On The Window-Panes" and "A tree with weaping branches." Credit: Katherine Crowson

"A tree with weaping branches," Crowson asked the model. She explained later that she intended to write "weeping" instead. However, the model couldn’t know that and came up with an amazing visual fusion of "weaping" – from which the model most likely took the root "weap" and related it to "weapon" – and branches. These images touch the limits of imagination.

Murdock and Crowson came up with the combination of GAN models and CLIP as an attempt to hack their way to DALL·E (the superb text-to-image multimodal AI that OpenAI didn’t want to open-source). But they couldn’t achieve the astonishing precision of DALL·E’s outputs. Instead, VQGAN+CLIP systems generate images of very diverse – almost "alien" – nature, as UC Berkeley student Charlie Snell calls it.

Midway between the unoriginal outputs of standalone generative networks and the literal text-image pairings of DALL·E, VQGAN+CLIP found its art.

As soon as the AI community knew this was possible, an "emerging art scene" exploded. People started experimenting with other GAN architectures and variations of existing ones. It wasn’t long before they realized that the prompt (as the text description is called) mattered when conditioning the final output. As with GPT-3, what you write to the model defines how well it will "understand" what you want. In an analogous display of prompt engineering skills, people began to design styles for their art generations.

Here’s what Dream generates when prompted with "a sunset," adding "baroque," "cyberpunk," and "unreal engine" to each prompt:

Prompts: A sunset + baroque/cyberpunk/unreal engine – made on Dream

One of CLIP’s key strengths is its impressive zero-shot performance: it doesn’t need to see examples of the text-image pairs you care about to do its job. Simply writing a sentence will give you amazingly accurate results.

Thanks to CLIP’s zero-shot skill, prompt engineering, and the unlimited possibilities of VQGAN+CLIP, AI-generated art tools flooded the internet. One particular example that has attracted a lot of attention is Dream, a web browser app by Wombo that allows users to create infinitely many images from text descriptions in up to 20 different styles.

Here’s what Dream gives when prompted with "forest" (dark fantasy), "wooden automata" (steampunk), and "happy life" (mystical):

Prompts: Forest (dark fantasy), Wooden automata (steampunk), Life (mystical) – made on Dream

All the images you’ve seen here reflect in some sense how CLIP "sees" the world, and how it "thinks" language represents our visual world. Yet CLIP doesn’t think anything for itself; it’s been fed tons of internet data (like the GANs it works with). As a metaphor, the VQGAN+CLIP ensemble could be interpreted as an alien that came to Earth, saw and memorized the internet and only the internet, and then used its language-vision mental representations to paint these unique pictures.

The result is, without a doubt, art.

The future of art

Photography didn’t replace painters and artists. It made them move and adapt. It created a new generation of artists who would unravel the potential of the new invention. Painters and photographers now coexist and create art that makes us wonder at the beauty of paintings and photographs alike.

Art, whatever its form, makes us feel. And there’s plenty of space for new facets of art that will make us experience new sensations. AI generative artists – or however they’ll come to be called – will fill some of that space. They’ll nudge existing forms of art toward new creative imaginaries but won’t make any of them disappear. Many artists will make good use of these new tools (or should I call them assistants?) and empower their creative genius beyond what they thought possible.

AI models like DALL·E or VQGAN+CLIP will continue to evolve into highly sophisticated art engines. I don’t know what lies ahead, but I’m sure AI won’t – just as photography didn’t – remove our unique capacity to experience the rich spectrum of emotions and sensations that art channels and that makes life worth living.


If you’ve read this far, consider subscribing to my free newsletter Minds of Tomorrow! News, research, and insights on AI and Technology every two weeks!

You can also support my work directly and get unlimited access by becoming a Medium member using my referral link here! 🙂

