DALL·E: an AI Treasure Chest in Action
Creative and comprehensible capacities of Artificial Intelligence
Update (September 2022): as of now, DALL·E is directly accessible, without a waitlist.
The year 2021 began with several AI milestones. OpenAI published two multimodal approaches, DALL·E and CLIP, with the capacity for photorealistic text-to-image generation (I wrote about their impact).
By using text prompts, DALL·E could create compelling and almost photorealistic images:
While DALL·E (text-to-image) was still under internal research at OpenAI, CLIP was provided to the world as open source. This neural network, which “efficiently learns visual concepts from natural language supervision”, was used by many artists and developers for different visual models. They connected it with StyleGAN2, VQGAN, and other approaches to create zero-shot images (a well-deserved shout-out to Advadnoun, the pioneer of this movement). In this ongoing list on Reddit, you will find more than 70 Google Colab notebooks (interactive implementations of repositories that run directly in your browser).
The workflow (a text or image input creates a new image) was similar to DALL·E, even with a different approach and very different results: not photorealistic, but rather depicting “dreams of machines”, reminiscent of Google Deep Dream, yet with entirely new visual motifs:
Or, with VQGAN+CLIP (animated creation):
I won’t focus on Disco Diffusion and Pytti, two absolutely stunning CLIP-based implementations (they deserve separate exploration). But still, everybody kept thinking back to the DALL·E presentation.
DALL·E Desires and Clones
Last year, a DALL·E clone appeared, created in Russia (I explored it here): ruDALLe. Russian researchers tried to recreate the architecture of the OpenAI approach. But since the original Transformer of DALL·E was not accessible, they achieved only semi-convincing (if still interesting) results:
A critical downside was not only the merely semi-realistic images but also ruDALLe’s inability to render metaphoric language. Given complex and abstract prompts, like “Nostalgia” or “Memories about previous life”, ruDALLe produced book covers (on which it was overtrained).
In some cases, you could even see what was within ruDALLe’s training dataset:
Yet this approach was used for Looking Glass by AI_curio, a ruDALLe-based reinterpretation of an image, looking for “the same vibe”. Here are several Looking Glass variations of my userpic:
Encounters with the original DALL·E
As you know from our article about Codex, we have been a small team of OpenAI Community Ambassadors since the GPT-3 release: we help users and developers orient themselves among the AI solutions and communicate their needs and requests to OpenAI. This lets us experience novel OpenAI approaches that are not yet publicly available.
As an ambassador, I have had access to the first and then second iterations of DALL·E and could test the initial model.
My first prompt in the first iteration was:
Mona Lisa is drinking wine with da Vinci
Generation of the images took around 60 seconds, and this was the result:
This small 256x256 image had it all. Instead of depicting the iconic smiling Lady drinking together with some conventional da Vinci figure, we get an entire art-historical discourse, aesthetically perfected: La Gioconda as a reflection in the raised glass (of the Maestro?). A self-portrait?
Another result of my prompting convinced me with its emotional overload:
Teddy Bear on the beach in the sunset
Even complex prompts delivered interesting completions:
Remembrance of nostalgia, surrealist painting by Dalí.
Also, DALL·E followed my requests directly:
A hammer, a book and a bottle on a wooden table.
My favorite was “Lamp in the shape of snail”:
This first DALL·E was already powerful, fully in line with the paper, yet still limited in size and creative capacity. But the DALL·E team worked hard on developing it, and so …
DALL·E 2 entered the stage.
Finally, in April 2022, DALL·E 2 was presented: working with CLIP and GLIDE (Guided Language to Image Diffusion for Generation and Editing), this fully renewed version creates stunning results.
I am delighted to finally share with you my observations and insights from working with this system. One of its most essential tasks is to enhance human-machine creative collaboration.
The first DALL·E implementation had several parameter settings, as we know them from GPT-3, like Temperature. The current DALL·E 2 UI is (still, as of this writing) simple: just a line to enter your prompt.
Nevertheless, these results will already overwhelm you.
The main features of DALL·E 2 are:
- high-resolution images (1024x1024)
- quick generation: it takes around 30 seconds for a series of 10 images
- inpainting function
- variations of one image
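The article works through the DALL·E web UI. If you would rather script such a batch generation, a minimal sketch is possible with the OpenAI Images API (an assumption on my part; the API is a separate surface from the UI shown here, and `build_generation_request` is just an illustrative helper):

```python
# Minimal sketch (assumption: the official `openai` Python SDK, not the
# web UI used in this article) of requesting a series of DALL-E 2 images.

def build_generation_request(prompt: str, n: int = 10, size: str = "1024x1024") -> dict:
    """Assemble the parameters the DALL-E 2 generations endpoint expects:
    a text prompt, the number of candidates (1-10), and the resolution."""
    if not 1 <= n <= 10:
        raise ValueError("DALL-E 2 renders at most 10 images per request")
    if size not in {"256x256", "512x512", "1024x1024"}:
        raise ValueError("unsupported resolution")
    return {"model": "dall-e-2", "prompt": prompt, "n": n, "size": size}

params = build_generation_request("Teddy Bear on the beach in the sunset")

# With OPENAI_API_KEY set in the environment, the actual call would be:
# from openai import OpenAI
# for item in OpenAI().images.generate(**params).data:
#     print(item.url)
```

Note that `1024x1024` matches the native resolution listed above, and `n=10` mirrors the series of 10 images the UI returns.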
First of all: Who owns DALL·E generated images?
In the case of GPT-3, the user creating a text is the owner of that particular content and can also use and apply it for commercial needs.
DALL·E 2 is different, and you will get this message when you sign in to the system for the first time.
So, no NFTs here. This is a comprehensive, collaborative research project, and all users working on it improve it with their prompts. You can use the images for your personal needs; you can use them for non-commercial online publications (as long as they align with the guidelines). You can use them as ice-breakers during writer’s block or for brainstorming visual or textual storytelling. You can use them as a proof of concept to communicate to your designer what you want to see.
Indeed, that’s just for the moment. OpenAI is working on the guidelines and use cases. But for now, this is a creative community AI experiment.
And for the crypto-artists using AI among us: there are so many other approaches, but please always check the developers’ disclaimers and TOS to see whether you really may use their solutions for NFTs.
Mona Lisa Drinking Wine with Da Vinci
This was my first prompt for the original DALL·E, and I debuted with the same one for the second model:
Mona Lisa Drinking Wine with Da Vinci
Note the focus on the glass; note the Mona Lisa’s smile. And note the horizontal level of the liquid in the glass. I suppose DALL·E already knows what glasses (including wine glasses) look like. Even if there are some glitches in the hand holding the wine glass, it is very convincing.
And here my very personal journey begins. I don’t worry about the AI precisely following my instructions to depict
One blue marble, 2 books and a glass with water on the table
Because DALL·E 2 does it perfectly:
My main focus, and obsession, is the question of how far AI can understand human aesthetics, hidden semantics, and storytelling. Can AI be creative? (Spoiler: yes, it can.)
But first, what else can DALL·E do?
Variations.
The model can create variations of an already created image. For my Mona Lisa image above, I made several variations:
Interestingly, if you use inpainting on the initial image (more on that below), you will get different glasses, but still with a horizontal liquid level.
But DALL·E can do more.
The following image was created with the prompt:
The truth about the beginning of the world.
And for this one, the variations differ more:
In the case of Variations, the model applies CLIP to “describe” the initial image and renders a series of images according to that description. We see the globe, the magnifying glass, and the maps in all images, just in different compositions. The initial prompt, “The truth about the beginning of the world”, isn’t relevant anymore: the actual prompt consists of the image prompt plus the description (which is not visible in the DALL·E UI).
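This image-only mechanism is also visible if you script Variations instead of using the UI button. A minimal sketch, assuming the OpenAI Images API and its `create_variation` endpoint (the helper `build_variation_request` is my illustrative construct):

```python
# Sketch (assumption: the official `openai` Python SDK) of requesting
# variations of an existing image. Note there is no text prompt at all:
# the source image is "described" internally and re-rendered, which is
# why the original prompt stops mattering.

def build_variation_request(image_path: str, n: int = 4, size: str = "1024x1024") -> dict:
    """The variations endpoint takes only an image (a square PNG),
    the number of variations, and the output resolution."""
    if not image_path.lower().endswith(".png"):
        raise ValueError("the variations endpoint expects a square PNG")
    return {"image": image_path, "n": n, "size": size}

req = build_variation_request("userpic.png", n=4)

# The actual call would be:
# from openai import OpenAI
# with open(req["image"], "rb") as f:
#     result = OpenAI().images.create_variation(image=f, n=req["n"], size=req["size"])
```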
Another variation was created by uploading an image (an experimental function within DALL·E). I used my userpic as the original image:
As you see, DALL·E detected:
- a mirror sphere
- a human being with a camera
- a building, blue sky, and trees in the background
- specular reflection
All these elements were reproduced in the Variation series.
Inpainting
Inpainting with textual prompting is already used in GauGAN2 or ProsePainter (brought to us by the Artbreeder developer). This is a powerful tool: by selecting specific areas of an image and prompting with textual remarks, you let DALL·E “paint” your desired motifs into the initial image.
This is possible with the prompt
A punk raising hand with a beer bottle,
applied to the famous painting by Caspar David Friedrich, Wanderer above the Sea of Fog (1818):
In short, it transforms parts of images in specific desired ways.
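The same select-an-area-and-prompt workflow can be sketched programmatically, assuming the OpenAI Images API, where inpainting is exposed as the edits endpoint: a mask whose transparent pixels mark the region to repaint (file names and the `build_edit_request` helper below are hypothetical):

```python
# Sketch (assumption: the official `openai` Python SDK) of inpainting.
# The source image and the mask must be square PNGs of equal dimensions;
# the mask's fully transparent areas indicate where DALL-E paints in new
# content according to the prompt.

def build_edit_request(image_path: str, mask_path: str, prompt: str,
                       n: int = 4, size: str = "1024x1024") -> dict:
    """Assemble the parameters for an inpainting ("edit") request."""
    for p in (image_path, mask_path):
        if not p.lower().endswith(".png"):
            raise ValueError("image and mask must be PNG files")
    return {"image": image_path, "mask": mask_path,
            "prompt": prompt, "n": n, "size": size}

req = build_edit_request("wanderer.png", "sky_mask.png",
                         "A punk raising hand with a beer bottle")

# The actual call would be:
# from openai import OpenAI
# with open(req["image"], "rb") as img, open(req["mask"], "rb") as mask:
#     result = OpenAI().images.edit(image=img, mask=mask, prompt=req["prompt"],
#                                   n=req["n"], size=req["size"])
```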
Observations
By experimenting with DALL·E, we can observe specific strengths of the generative model. Here are just some of them.
The main power of DALL·E is to follow your demands. Of course, for safety reasons, there are some limitations (no hatred, no chauvinism, etc.; be nice, don’t harm others).
Here are some perfect completions of your prompts.
A cat with a blue hat
DALL·E will definitely be a new meme generator.
A cat with angelic wings
Too much cat content for today…
Faust and Mephisto
Look at this dialogue and the merging of the Master and the Devil in their pact. This is how Goethe intended their relationship.
A mindmap wall with photos and notes in a room of a private detective.
A chaotic, noirish room: cold, grey, and mesmerizing with the investigative obsession of a detective.
A man holds on to his academic papers tightly in excitement for new scientific breakthrough, as oil painting, in the style of Spitzweg.
This emotional impact already has a captivating effect: you begin to share the scientist’s joy and awe.
Portraits of the same face, created by Dalí, Magritte, da Vinci, Chagall and Klimt.
As you see, DALL·E masters a broad spectrum from easy tasks to sophisticated demands.
In the last example, it even imitates artistic styles.
Artist’s essence.
But DALL·E goes far beyond mere imitation. To get an artist’s specific style, you can drive your prompt with the addition “in the style of …”. Interestingly, DALL·E doesn’t just apply Style Transfer.
It defines the creative essence of the artist.
In my experiment, I asked to create an image with the following prompt:
Good morning, in the style of Arcimboldo.
Giuseppe Arcimboldo is famous for his mannerist and playful style: in his paintings, he arranged objects into specific human forms:
DALL·E could:
- Detect and interpret the stylistic approach (Arcimboldo)
- Determine the meaning of “Good Morning” (here: breakfast)
- Combine 1) and 2) appropriately (even if not exactly with the wit of the original artist, still pretty convincing):
This combination of concepts reminds me of my textual experiments with GPT-3, where the model wrote me “A Love Letter by a Toaster”:
In this case, GPT-3 understood:
- what a toaster is
- how to write a love letter
- and combined these two entirely different concepts.
To test whether DALL·E just imitates styles or understands concepts, I applied the following prompt:
The Favorite Thing by Günther Ücker
The artist Günther Uecker is famous for using nails as an omnipresent motif in his assemblages and installations.
DALL·E is aware of this fact:
Creative Glitches
Sometimes DALL·E doesn’t deliver precisely what you demand. Nevertheless, it creates something completely out of the box.
When I asked it to create a “Renaissance Painting as a First Person Shooter”, it didn’t provide me with a Doom-like hunt across Arcadia. Instead, it gave me probably my favorite image created by DALL·E:
This one:
Everything is in this image: the idea itself, the perfect visualization, the atmosphere. You can interpret it however you like; this is art, emerging through your interpretation.
Metaphoric Power of Storytelling
You may call me an esoteric nerd, crossing the Rubicon as I apply the concepts of creativity and storytelling to a machine, but I see that capacity. After all, we’re living in the age of creative human-machine collaboration.
DALL·E understands cultural concepts and even knows literary backgrounds.
With my prompt
Gollum writes the autobiography,
DALL·E provided the following visions:
Not only are these portraits of Sméagol full of characteristic charisma; DALL·E also doesn’t use the iconic character design from Peter Jackson’s screen adaptation but rather the description from the book.
Philosophical concepts also work pretty well here.
Sisyphus as a happy man according to Albert Camus (with a stylistic predicate), an oil painting in the style of da Vinci
delivers a series of happy men:
Here DALL·E knows about Sisyphus, about his torture of rolling the stone uphill, about the Greek context (clothes, beard, scenery), and yet it brings in some happiness from Camus’ theory of the absurd.
This one is impressive.
Dreams of Franz Kafka
A young girl with a dark umbrella crosses the road and spreads the darkness among the meadows filled with sunlight…
This vivid mix of playful creepiness, dreamlike absurdity, and bright abysses of the human soul in one picture, created within 30 seconds, is stunning.
Everything was created just with the prompt “Franz Kafka’s Dream”.
Creative Anarchy
And this is the point where I embrace AI creativity and let it just be and create, without human biasing or corrections.
These examples are proof of a machine’s chaotic fiction, with surreal wit and confusing semantic collisions.
The writer thinks through the main plot of her book, an oil painting, in the style of Spitzweg
Mind the small but determined pronoun “her”: DALL·E applies the self-attention of the Transformer network to create portraits of female writers here.
You can interpret this process of creative work in any way you like — but this intensive exploration of ideas behind a work, illustrated here, is compelling.
AI Artists in disbelief, in the style of Spitzweg*
*) This is my little hack. Carl Spitzweg was famous for his satirical paintings, and this brings some more insanity into the AI art of DALL·E. The truth is, DALL·E won’t directly create images in the style of Carl Spitzweg; it will instead apply Spitzwegian irony to the results. Interestingly, with this prompt, we get very diverse styles simultaneously.
The variety of styles, tensions, emotions, and concepts is stunning here.
When Artnet published on Twitter a list of the most expensive artworks sold in March 2022, I asked DALL·E to create new…
...most expensive artworks sold at auction around the world in March 2022
The works I got were overwhelming. And it isn’t just my fascination with DALL·E: every image below struck my heart and mind with an intense auratic effect.
Summary
DALL·E has proven its boundless imagination, and we are just scratching the surface, as we did (and still do) with GPT-3.
The model doesn’t just imitate styles or simulate ideas. It “understands” (in its own way) concepts and can visualize almost everything, from easy tasks to symbolic and metaphoric texts.
Follow my Twitter account to see new experiments with DALL·E.
merzDALLEum
You can explore the artworks created by DALL·E in my virtual 3D gallery “merzDALLEum”, via browser or 3D headset.
Link to my commented guide: https://medium.com/merzazine/merzdalleum-c8308ad66f12
Direct link to the gallery: https://spatial.io/s/merzDALLEum-625fed192ce7250001cc16ee?share=6035801166881251211