
I first wrote about using Generative Adversarial Networks (GANs) to create visual art in August of 2020. For that project, MachineRay, I trained NVIDIA's StyleGAN2 [1] with abstract paintings in the public domain to create new works. Since then, I have written a few more articles on using GANs to produce fine art. From the feedback I received, I could tell that some readers wanted to learn how to generate digital art for sale as Non-Fungible Tokens (NFTs) in a burgeoning new market.
If you are not familiar with cryptocurrency and NFTs, here’s a quick analogy.
Cryptocurrency is to precious metal as NFTs are to precious gems.
Every ounce of pure gold is valued the same, but every diamond is unique and thus valued differently.
For this project, I took the plunge into the world of NFTs. Not only am I creating and selling new digital art as NFTs, but I am also selling "shares" of a trained GAN so that other artists can produce their own digital art for sale as NFTs. I call this concept a "GANshare," and my first AI model is called GANshare One.
GANshare One Components
Here is a high-level diagram for the system with a brief description of the components. Further below, you can find the details of each part.

I started by gathering over 50,000 paintings in the public domain from WikiArt.org. I then trained a new AI model called VQGAN [2], developed at Heidelberg University in Germany. VQGAN has an image Encoder, Transformer, and Decoder that are trained to create new images, plus a Discriminator that is trained to detect whether parts of an image are real or generated.
I use VQGAN to create new paintings based on text prompts. I automatically generate the prompts from text files that contain lists of painting styles, geometric shapes, geographical places, and an extensive list of color names. I use the CLIP model [3] from OpenAI to steer the VQGAN Decoder to produce a digital painting that matches the prompt, using the Adam optimizer [4] in the PyTorch library.
I use the ISR Super Resolution Resizer [5] to scale the images up to 1024×1024 and post them on OpenSea for sale as NFTs. You can check out samples of the artwork here: https://opensea.io/collection/chromascapes
GANshare One System Details
This section will discuss the components I used to build the GANshare One system in more detail.
Gathering Images
Similar to what I did for my MAGnet project, I scraped paintings to use as training data from WikiArt.org with a custom Python script. The script went through each artist on the site alphabetically, checking whether the artist was born after 1800 and died before 1950. It then checked each of that artist's paintings to make sure it was in the public domain before copying it to a folder on my Google Drive. The script found 52,288 images that met my criteria.
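For reference, here is a minimal sketch of the filtering logic. The artist records and the public-domain flag are hypothetical stand-ins for what my actual script parses out of the WikiArt HTML pages.

import shutil

def collect_training_images(artists, output_dir="paintings"):
    # "artists" is assumed to be a list of dicts with birth/death years and
    # painting entries; the field names here are hypothetical placeholders
    count = 0
    for artist in artists:
        if artist["born"] <= 1800 or artist["died"] >= 1950:
            continue  # keep only artists born after 1800 who died before 1950
        for painting in artist["paintings"]:
            if painting.get("public_domain"):
                shutil.copy(painting["path"], output_dir)
                count += 1
    return count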
Here are some of the source paintings I used to train GANshare One from artists like Kandinsky, van Gogh, and Picasso.

VQGAN
The Vector Quantized Generative Adversarial Network (VQGAN) model by Esser et al. is a hybrid GAN/Transformer model used to generate high-quality images [2].
In their paper with the poetic title, "Taming Transformers for High-Resolution Image Synthesis," the authors state that their approach learns…
… a codebook of context-rich visual parts, whose composition is subsequently modeled with an autoregressive transformer architecture. A discrete codebook provides the interface between these architectures and a patch-based discriminator enables strong compression while retaining high perceptual quality. This method introduces the efficiency of convolutional approaches to transformer-based high-resolution image synthesis.
- Patrick Esser, Robin Rombach, and Bjorn Ommer
Here is their architecture diagram for VQGAN.

After I got my training images together, I used this command to train VQGAN using my Google Colab Pro [7] account:
!python main.py -b configs/custom_vqgan.yaml --train True --gpus 0,
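The custom_vqgan.yaml config points to plain text files that list the paths of the training and test images (at least in the version of the taming-transformers repo I used; check your copy). Under that assumption, here is a quick sketch of how to generate the file lists with a simple 90/10 split:

import glob
import random

paths = sorted(glob.glob("paintings/*.jpg"))
random.seed(42)
random.shuffle(paths)

split = int(0.9 * len(paths))  # 90% for training, 10% for testing
with open("train.txt", "w") as f:
    f.write("\n".join(paths[:split]))
with open("test.txt", "w") as f:
    f.write("\n".join(paths[split:]))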
Because VQGAN is a hybrid Transformer model, it shows original and reconstructed (encoded and then decoded) samples during training. Here is a set of example images.


You can see that the model does a decent job recreating the original image, but it seems to do its own thing for some of the facial features, like the eyes and mouth.
In general, I found that the VQGAN works very well. Here is a table that compares some of the features of a classic GAN to the VQGAN.
Feature | Classic GAN | VQGAN
Discriminator | All-or-nothing | Sub-image
Output image size | Fixed size | Variable size
Reverse encoding | Iterative process | Trained encoder
Image generation | New images look good | Images need steering
With the VQGAN, training is fast because each section of the image is checked by the discriminator, whereas classic GANs use an all-or-nothing approach to training. In other words, the discriminator in the VQGAN looks at 16 sub-images in a 4×4 grid and gives the generator a thumbs up or down for each section as feedback for improvement, while the discriminator in a classic GAN gives the generator a single thumbs up or down for the entire image.
With classic GANs, the output image is always a fixed size. Because it works with a "codebook," the VQGAN can output images in varying sizes. For example, I trained VQGAN with 256×256 input images and used it to produce 512×512 output images.
For example, here are generated images for the prompt "rolling farmland" rendered at 256×256 and 512×512.


Notice how there are more features in the 512×512 image. This is because the features in the VQGAN's codebook were learned at the 256×256 scale and rendered as smaller contributions to the larger picture. The output of the 512×512 image is akin to merging multiple vignettes together.
Classic GANs do not have a built-in model for reverse encoding, a process to find the closest generated image given an arbitrary real image. This has to be done with an iterative approach for a classic GAN, and it doesn’t always work well. Reverse encoding is easy for the VQGAN model, however, because it is effectively a codec. It has a model to encode an image to an embedding and a corresponding model to decode an embedding to generate an image.
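Here is a rough sketch of that round trip, based on my reading of the taming-transformers code and the VQGAN+CLIP notebooks; the config and checkpoint paths are placeholders, and the exact return values of encode() may differ between versions.

import torch
from omegaconf import OmegaConf
from taming.models.vqgan import VQModel

config = OmegaConf.load("configs/custom_vqgan.yaml")   # placeholder path
model = VQModel(**config.model.params)
model.init_from_ckpt("checkpoints/last.ckpt")          # placeholder path
model.eval()

x = torch.randn(1, 3, 256, 256)      # stand-in for a real image scaled to [-1, 1]
with torch.no_grad():
    z_q, _, _ = model.encode(x)      # quantized latent, roughly (1, 256, 16, 16)
    x_rec = model.decode(z_q)        # decoded image, same size as the input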
The first three points are all advantages for the VQGAN. The fourth point, however, favors classic GANs: it is easy to produce output images from them. Just give one some random numbers, and it will generate a nice-looking picture. The VQGAN, by contrast, does not have an easy way to make new images. If you give the decoder some random numbers, the output image will not be coherent. The VQGAN needs to be steered by some other process, such as a text prompt via the CLIP model, to generate a recognizable image.
OpenAI’s CLIP Model
OpenAI designed and trained an AI system called CLIP, which stands for Contrastive Language–Image Pre-training [3]. The CLIP system has an image encoder and a text encoder, and it can be used to perform cross-modal semantic searches, e.g., you can use words to search for images, as I did in my MAGnet project.
OpenAI trained the encoders on a dataset of images with corresponding phrases. The goal of the training is to have the encoded images match the encoded words. Once trained, the image encoder converts images to embeddings, lists of 512 floating-point numbers that capture each image's general features. The text encoder converts a text phrase to a similar embedding that can be compared to image embeddings for a semantic search.

For example, if you have a database of images, you could run each image through the image encoder to get a list of image embeddings. If you then run the phrase "puppy on a green lawn" through the text encoder, you can find the image that best matches the phrase.
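Here is a small sketch of that kind of search using OpenAI's released clip package; the image file names are placeholders.

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image_paths = ["dog1.jpg", "cat1.jpg", "lawn.jpg"]   # placeholder files
images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
text = clip.tokenize(["puppy on a green lawn"]).to(device)

with torch.no_grad():
    image_emb = model.encode_image(images)
    text_emb = model.encode_text(text)

# cosine similarity between the text embedding and each image embedding
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (image_emb @ text_emb.T).squeeze(1)
print(image_paths[scores.argmax().item()])   # best-matching image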

Generating Prompts
As mentioned above, the GANshare system creates images that are steered by CLIP using text prompts. If you tell VQGAN+CLIP to, say, create a painting of an "Abstract painting of circles in orange," it will make one.
In order to crank out a lot of paintings, I generated prompts with three varying parts: a style, a subject, and a color.
After experimenting, I found that these nine styles work reasonably well: Abstract, Cubist, Expressionist, Fauvist, Futurist, Geometric, Impressionist, Postmodern, and Surrealist.
For the subject, I chose from three categories: geometric shapes, geographic features, and objects. I started with Ivan Malopinsky's word lists for my lists of shapes and geographic features and tweaked them a bit. For my list of things, I combined the object lists from the COCO and CIFAR-100 datasets to get a list of 181 objects.
I grabbed an extensive list from Wikipedia for the color names and edited it down a bit to get 805 unique colors.
Here are the first seven entries in the four lists.
shapes.txt | places.txt | things.txt | colors.csv
angles | an archipelago | an airplane | absolute zero
blobs | an atoll | an apple | acid green
circles | a beach | apples | aero
cones | a bay | an aquarium fish | aero blue
cubes | a butte | a baby | african violet
curves | a canal | a backpack | alabaster
cylinders | a canyon | a banana | alice blue
... | ... | ... | ...
Here is a link to the Python code that generates a prompt by randomly choosing a style, subject, and color.
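The gist of the generator looks something like this; it is a simplified sketch rather than the exact code linked above, and it assumes the file names match the table headings.

import csv
import random

STYLES = ["Abstract", "Cubist", "Expressionist", "Fauvist", "Futurist",
          "Geometric", "Impressionist", "Postmodern", "Surrealist"]

def load_lines(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def generate_prompt():
    style = random.choice(STYLES)
    subject_file = random.choice(["shapes.txt", "places.txt", "things.txt"])
    subject = random.choice(load_lines(subject_file))
    with open("colors.csv") as f:
        color = random.choice([row[0] for row in csv.reader(f)])  # assumes the color name is in column 1
    connector = "with" if subject_file == "shapes.txt" else "of"
    return f"{style} Painting {connector} {subject} in {color}"

print(generate_prompt())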
Here are some prompts generated by the code.
Futurist Painting of a City in Vivid Burgundy Brown
Abstract Painting with Diagonals in Beige Pink
Impressionist Painting with Prisms in Carolina Blue
Now that we have some interesting prompts, we’ll see how we can steer VQGAN to generate corresponding images next.
Steering VQGAN with CLIP
For my MAGnet project, I used a custom generative algorithm to have CLIP steer a variant of StyleGAN2 to create images from text prompts. For this project, I am using an algorithm designed by Katherine Crowson, an AI/generative artist who posts on Twitter as RiversHaveWings. To steer VQGAN with CLIP, she uses an optimizer in the PyTorch library, Adam, which stands for Adaptive Moment Estimation [4]. Below is a diagram of the algorithm.
Note that there are two embedding spaces in play here. The CLIP system uses a flat embedding of 512 numbers (represented as I and T), whereas the VQGAN uses a three-dimensional embedding with 256×16×16 numbers, represented as Z.

The goal of the optimization algorithm is to produce an output image that closely matches the text query. The system starts by running the text query through the CLIP text encoder to get the target T. The Adam optimizer begins with an initial VQGAN vector Zi and iteratively modifies it (Zn at step n) to produce an image with a CLIP embedding I that attempts to match the target T. As I approaches T, the output image better matches the text query.
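Below is a heavily simplified sketch of that loop, in the spirit of Katherine Crowson's notebook rather than a copy of it. The decode_to_image() helper, which would wrap the VQGAN decoder plus the cutout/augmentation steps her code uses, is a hypothetical stand-in.

import clip
import torch
from torch import optim
from torch.nn import functional as F

device = "cuda"
clip_model, _ = clip.load("ViT-B/32", device=device)

prompt = "Abstract Painting with Diagonals in Beige Pink"
with torch.no_grad():
    tokens = clip.tokenize([prompt]).to(device)
    target = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)

# z is the VQGAN latent being optimized, e.g. shape (1, 256, 16, 16)
z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
optimizer = optim.Adam([z], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    image = decode_to_image(z)  # hypothetical: VQGAN decode + cutouts, sized for CLIP
    image_emb = F.normalize(clip_model.encode_image(image).float(), dim=-1)
    loss = (1 - image_emb @ target.T).mean()  # pull the image embedding toward the text
    loss.backward()
    optimizer.step()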
Let’s see how well the algorithm works with the prompts from the previous section. I ran the system for 200 iterations for each of the prompts.



Sure enough, the system creates images that seem to match the prompts. Notice how the optimization algorithm is improvising a bit. For example, it’s not clear if the third image contains any prisms, per se. But it does show some rainbow colors in the palette, suggesting the effects of light going through a prism.
Results
After generating hundreds of digital paintings, I noticed that they are not all winners. Images generated from prompts in the shapes category seemed to have the best yield, i.e., about 33% could be considered reasonable. But the images in the places and things categories didn’t work as well, with only about 20% keepers.
Here are some samples judged by me.












You can see more results here.
Selling NFTs on OpenSea

Ever since the digital artist known as Beeple sold a JPG file for US$69 Million back in March 2021, there has been a lot of interest in using a blockchain for selling digital art.
Beeple’s collaged JPG was made, or "minted," in February as a "nonfungible token," or NFT. A secure network of computer systems that records the sale on a digital ledger, known as a blockchain, gives buyers proof of authenticity and ownership. Most pay with the Ethereum cryptocurrency. "Everydays" was the first purely digital NFT sold by Christie’s, and it offered to accept payment in Ethereum, another first for the 255-year-old auction house. – Scott Reyburn, The New York Times
I looked into selling my digital art as NFTs, and after a little research, I found that the OpenSea marketplace with the Polygon blockchain [10] is an excellent way to go, for reasons of both cost and environmental impact.
OpenSea allows creators to post items for sale on the Ethereum blockchain or the Polygon blockchain, also known as MATIC. Here is a table that shows some of the differences between the Ethereum and Polygon blockchains.
Feature | Ethereum | Polygon
Account initialization | ~US$50 | Free
Selling cost | 2.5% | 2.5%
Selling items in bundles | Yes | No
Blockchain validation | Proof of work | Proof of stake
Environmental impact | Huge* | Minimal
*expected to change in Q1/Q2 2022
As you can see from the table, setting up an account on OpenSea to sell on the Polygon blockchain is free, whereas it costs about US$50 to use the Ethereum blockchain. OpenSea takes 2.5% of the sale price for sales on either blockchain. Also, selling items in bundles is not available on the Polygon blockchain.
Environmental Impact of NFTs
Another big difference between the two blockchains is the environmental impact of proof-of-work (PoW) versus proof-of-stake (PoS) validation.
Energy consumption is one major difference between the two consensus mechanisms. Because proof-of-stake blockchains don’t require miners to spend electricity on duplicative processes (competing to solve the same puzzle), proof of stake allows networks to operate with substantially lower resource consumption. – coinbase.com
According to Polygon…
… the biggest PoW blockchains can consume a yearly quota of anywhere between 35–140 TWh of electricity, with a continuous draw of anywhere between 3–15 GW of electricity. Just the two biggest PoW blockchains would be ranked 27th in yearly energy consumption if it were a country – above the likes of Sweden, Vietnam and Argentina. … By contrast, Polygon’s validators approximately consume 0.00079TWh of electricity yearly with an approximate continuous draw of 0.00009GW, orders of magnitude below the energy consumption by the major PoW blockchain networks. – polygon.technology
The folks who run the Ethereum blockchain are clearly aware of their heavy environmental impact. They have a plan to move to proof-of-stake validation by mid-2022.
… the upgrade [in 2022] represents the official switch to proof-of-stake consensus. This eliminates the need for energy-intensive mining, and instead secures the network using staked ether. A truly exciting step in realizing the Eth2 vision – more scalability, security, and sustainability. – ethereum.org
Setting Up an OpenSea Account
Setting up to sell digital art as NFTs was easy (and free using the Polygon blockchain). I followed OpenSea’s simple directions here. Once I had my account set up, I posted my digital creations and offered them for sale here, https://opensea.io/collection/chromascapes

One of the cool features of OpenSea is the ability to add properties to the items for sale. Shoppers can use the properties to filter the items they would like to see. In my case, I added the following properties for the generated paintings.

The UI is OK if you have a half dozen or so items, but if you are looking to sell hundreds, OpenSea provides no easy way to upload and list the items for sale.
Using a Script for Bulk Uploading Items
Although OpenSea does have an API to automate specific tasks, their API does not have the functionality to upload and sell items on their marketplace automatically. This is unfortunate because uploading hundreds of files would be a real pain.
After reading Jim Dees's article on posting NFTs using a macro program, I looked into doing this in my preferred language, Python. I found that, when correctly set up, the Selenium WebDriver works well with the Chrome browser running on my local laptop.

First, I captured all of the metadata for the NFTs in a text file. It includes the filename, title, and properties for each painting. My script then logs into my account on OpenSea using the MetaMask wallet and runs through each item in the list, performing the following tasks (a simplified Selenium sketch follows the list):
1. Create a new item in my OpenSea collection
2. Upload the image file
3. Enter the name of the item
4. Enter the description
5. Enter the properties (i.e. color, style, etc.)
6. Save the item
7. Go to sell the item
8. Enter the price
9. Post the item for sale
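Here is a bare-bones sketch of how a couple of those steps can be automated with Selenium. The URL and element selectors are placeholders; OpenSea's page structure changes frequently, so the real script locates each field by inspecting the live page.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://opensea.io/asset/create")   # creation page URL at the time of writing

# placeholder selectors -- the real IDs/XPaths must be taken from the live page
driver.find_element(By.ID, "media").send_keys("/path/to/painting.png")
driver.find_element(By.ID, "name").send_keys("Abstract Painting with Diagonals in Beige Pink")
driver.find_element(By.ID, "description").send_keys("Created with GANshare One")
driver.find_element(By.XPATH, "//button[contains(., 'Create')]").click()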
It required a little tuning, but I got the script working well enough to upload 1,000 NFTs.
GANshare
In addition to selling the good images as NFTs, I am also selling "shares" of the trained GAN to other artists looking to generate new images. I am keeping one share for myself and selling three other shares here.
OpenSea allows for "locked content" when selling NFTs. After purchasing a GANshare, the new owner can use this locked content to create new images and upload them for sale on OpenSea or any other NFT market.
Next Steps
I mentioned above that the yield for "good" generated images is only about 20% to 33%. Running the optimization algorithm for more than 200 iterations may improve the results, especially for the places and things categories. Also, it may be possible to use the CLIP encoder as an automated art critic. Checking how close an image's embedding is to a phrase like "beautiful painting" could provide an easier way to separate the wheat from the chaff. This could work well, unless you are into sloppy paintings of green skunks 😀.
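Here is a quick sketch of the art-critic idea: score each generated image against a phrase like "beautiful painting" with CLIP and keep only the highest-scoring ones. The folder name and the 0.25 cutoff are arbitrary placeholders that would need tuning.

import glob
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(["beautiful painting"]).to(device))
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

keepers = []
for path in glob.glob("generated/*.png"):        # placeholder folder of outputs
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        score = (img_emb @ text_emb.T).item()
    if score > 0.25:                             # arbitrary cutoff, needs tuning
        keepers.append((score, path))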
I also got a good suggestion from one of my reviewers, Oliver Strimpel. He asked whether I could train a language-based AI system to generate good-quality prompts, which might make this a fully automated art-generation system. Excellent idea!
Source Code
A Google Colab for experimenting with VQGAN and CLIP is here.
Acknowledgments
I want to thank Jennifer Lim and Oliver Strimpel for their help with this project.
References
[1] StyleGAN2 by T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, Analyzing and Improving the Image Quality of StyleGAN (2020)
[2] VQGAN by P. Esser, R. Rombach, and B. Ommer, Taming Transformers for High-Resolution Image Synthesis (2020)
[3] CLIP by A. Radford, et al., Learning Transferable Visual Models From Natural Language Supervision (2021)
[4] Adam Optimizer by D. Kingma and J. Ba, Adam: A Method for Stochastic Optimization (2017)
[5] ISR by F. Cardinale et al., ISR Super Resolution Resizer (2018)