The largest hurdle for artificial intelligence to overcome is creativity. It is reasonably simple to solve problems using gradient-based optimization, or to use embeddings and word webs to solve natural-language processing problems. The true difficulty is for AIs to be creative, to create original content.
In this article, I will try to train a GAN to generate abstract art. I did not train the GAN on realistic artworks because the computer would need to learn something about real-world objects and how they interact in order to create truly original realistic art.
What are GANs?
Generative Adversarial Networks, or GANs for short, are a type of neural network for generative Machine Learning. They are able to create content that is similar, but not identical, to the data they are trained on.
How do GANs work?
A GAN consists of two parts: A generator and a discriminator.
The generator is a Neural Network that takes in random values and returns a long array of pixel values that can be reshaped to form an image. The discriminator is a separate Neural Network that is shown both "real" and "fake" images and tries to guess which is which.
The adversarial part of the GAN is how they work together and feed into each other: when training the GAN, the loss value for the generator depends on how accurate the discriminator is on the generated images. The worse the discriminator performs on them, the better the generator is performing. The loss value of the discriminator, on the other hand, is based on the accuracy of its own predictions.
This means that the two Neural Networks are competing against each other: One is trying to trick the other, while the other tries to avoid being tricked.
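To make the competition concrete, here is a minimal NumPy sketch of the two loss values, assuming binary cross-entropy as the loss. The discriminator outputs below are made-up numbers for illustration, not results from a trained model.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

# Hypothetical discriminator outputs: the probability that an image is real.
d_on_real = np.array([0.9, 0.8])  # real images, target label 1
d_on_fake = np.array([0.3, 0.1])  # generated images, target label 0

# Discriminator loss: real images should score 1, fakes should score 0.
d_loss = binary_cross_entropy(
    np.concatenate([np.ones(2), np.zeros(2)]),
    np.concatenate([d_on_real, d_on_fake]),
)

# Generator loss: the same fakes, but scored against the label "real" (1).
g_loss = binary_cross_entropy(np.ones(2), d_on_fake)

print(f"d_loss={d_loss:.3f}, g_loss={g_loss:.3f}")
```

In this made-up case the discriminator is rarely fooled, so its loss is low while the generator's loss is high.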
Advantages of GANs:
- Unsupervised Learning
Although the discriminator is trained like a supervised classifier, its real/fake labels are created automatically during training. The images themselves need no manual labels, so unlabeled data is enough to train a GAN.
- Highly applicable
Since the discriminator reads its input through convolutional layers, and the generator produces its output through them, the data for GANs usually comes in the form of images. Since an image is just an array of numbers, most numerical data can in principle be arranged into image-shaped arrays and is therefore compatible with GANs.
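As a small illustration of the "Highly applicable" point, the sketch below folds a flat array of numbers into an image-shaped array. The data here is a hypothetical stand-in (a simple range of values), chosen only because it has exactly as many entries as one 100 by 100 RGB image.

```python
import numpy as np

# Hypothetical numeric record: 30,000 values, the size of one 100x100 RGB image.
values = np.arange(100 * 100 * 3, dtype=np.float64)

# Rescale to the 0-255 range used for pixel intensities.
scaled = 255 * (values - values.min()) / (values.max() - values.min())

# Fold the flat array into (height, width, channels).
image = scaled.reshape(100, 100, 3).astype(np.uint8)
print(image.shape, image.dtype)
```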
Disadvantages of GANs:
- Long Computation Time
Because a GAN trains two neural networks at once, it can take a long time to train. A good GPU is all but a necessity for training GANs.
- Possible Collapse
The balance between the generator and the discriminator is very fragile. If the generator settles into a local minimum, it might start producing unrecognizable images that happen to fool the discriminator perfectly. This is more likely for images that have no clear pattern, because the networks can pick up on false signals.
- No True Way to Evaluate Model
Since GANs are meant to generate original images, there is no simple, objective, numerical way to check how good the generated images are. Often, one can only inspect the outputs and hope that the GAN does its job.
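Even without an objective quality score, a collapsed generator can sometimes be spotted numerically, because its outputs all look alike. The sketch below is one rough heuristic, not a standard metric: it measures the average distance between generated samples. The function name and the random stand-in data are assumptions made for illustration.

```python
import numpy as np

def mean_pairwise_distance(samples):
    """Average Euclidean distance between all distinct pairs of samples."""
    flat = samples.reshape(len(samples), -1).astype(np.float64)
    diffs = flat[:, None, :] - flat[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(flat)
    # The diagonal (distance of a sample to itself) is zero, so dividing the
    # full sum by n * (n - 1) averages over the distinct ordered pairs.
    return float(dists.sum() / (n * (n - 1)))

rng = np.random.default_rng(0)
varied = rng.uniform(0, 255, size=(8, 10, 10, 3))  # stand-in for diverse outputs
collapsed = np.repeat(varied[:1], 8, axis=0)       # eight copies of one image

print(mean_pairwise_distance(varied), mean_pairwise_distance(collapsed))
```

A value near zero means the samples are nearly identical, which hints at collapse. A large value says nothing about whether the images look good.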
Now that you have a basic understanding of how GANs should theoretically work, let’s look into the code.
Code:
The dataset for this project can be found here. This code is inspired by Dr. Jason Brownlee’s post on GANs.
Step 1| Prerequisites:
from numpy import expand_dims
from numpy import zeros
from numpy import ones
from numpy import vstack
from numpy.random import randn
from numpy.random import randint
from keras.datasets.mnist import load_data
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Dropout
from matplotlib import pyplot
from IPython.display import clear_output
These are the necessary components to construct the GAN.
Step 2| Prepare Data:
import os
from PIL import Image
from matplotlib import pyplot as plt
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def inv_sigmoid(x):
    # Inverse of sigmoid (the logit function)
    return np.log(x / (1 - x))

%matplotlib inline
path = 'XXXXXXXXXXX'
os.getcwd()
img_list = os.listdir(path)

def access_images(img_list, path, length):
    pixels = []
    imgs = []
    for i in range(length):
        # os.path.join works whether or not path ends with a separator
        img = Image.open(os.path.join(path, img_list[i]), 'r')
        basewidth = 100
        # Force three channels so the reshape below always works;
        # LANCZOS is the resampling filter formerly named ANTIALIAS
        img = img.convert('RGB').resize((basewidth, basewidth), Image.LANCZOS)
        pix = np.array(img.getdata())
        pixels.append(pix.reshape(100, 100, 3))
        imgs.append(img)
    return np.array(pixels), imgs

def show_image(pix_list):
    array = np.array(pix_list.reshape(100, 100, 3), dtype=np.uint8)
    new_image = Image.fromarray(array)
    new_image.show()

pixels, imgs = access_images(img_list, path, 1000)
pixels.shape
Replace the Xs in path with the directory that the data is stored in. I added os.getcwd() so that people on different platforms can see how paths are formatted on their system and adapt the path accordingly. I resized all the photos to 100 by 100 pixels and created a show_image function to make sure everything is working properly.
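The code in this article feeds raw 0 to 255 pixel values to the networks. A common alternative, which is not what this article does, is to scale pixels to the range -1 to 1 before training and to map them back for display. The sketch below shows that round trip. The helper names scale_pixels and unscale_pixels are hypothetical.

```python
import numpy as np

def scale_pixels(pixels):
    """Map values in [0, 255] to floats in [-1, 1]."""
    return (pixels.astype(np.float32) - 127.5) / 127.5

def unscale_pixels(scaled):
    """Map floats in [-1, 1] back to uint8 values in [0, 255]."""
    return np.clip(scaled * 127.5 + 127.5, 0, 255).round().astype(np.uint8)

demo = np.array([[0, 127, 255]], dtype=np.uint8)
scaled_demo = scale_pixels(demo)
restored = unscale_pixels(scaled_demo)
print(scaled_demo, restored)
```

If you try this variation, the generator's output layer and the visualization step would need to use the same range.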
Step 3| Define Discriminator:
def define_discriminator(in_shape=(100, 100, 3)):
    model = Sequential()
    # Two strided convolutions downsample 100x100 -> 50x50 -> 25x25
    model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same', input_shape=in_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Flatten())
    # Single sigmoid unit: a score between 0 and 1 for each image
    model.add(Dense(1, activation='sigmoid'))
    # Recent Keras versions name this argument learning_rate; older ones used lr
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
This discriminator takes a batch of real and fake images as input and returns a single value between 0 and 1 for each image. With the labels used in Step 6 (1 for real, 0 for fake), a value closer to 1 means the computer thinks the image is real, and a value closer to 0 means it thinks the image is fake.
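To see what the Flatten layer receives, the arithmetic below traces the image size through the two strided convolutions. It is a sketch that assumes the usual rule for padding='same', where the output size is the input size divided by the stride, rounded up.

```python
import math

def same_conv_output(size, stride):
    """Spatial size after a convolution with padding='same'."""
    return math.ceil(size / stride)

size = 100
for _ in range(2):            # two Conv2D layers with strides=(2, 2)
    size = same_conv_output(size, 2)

flattened = size * size * 64  # 64 filters in the last Conv2D layer
print(size, flattened)
```

So the final Dense layer maps a 25 by 25 by 64 feature map, flattened to 40,000 values, to a single score.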
Step 4| Define Generator:
def define_generator(latent_dim):
    model = Sequential()
    # Start from a 25x25 feature map with 128 channels
    n_nodes = 128 * 25 * 25
    model.add(Dense(n_nodes, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((25, 25, 128)))
    # Upsample 25x25 -> 50x50
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    # Upsample 50x50 -> 100x100
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    # Three output channels (RGB), no activation
    model.add(Conv2D(3, (7,7), padding='same'))
    return model
The generator takes a random point from latent space as input. It upscales the latent point to the shape (100, 100, 3), which can then be displayed as an image.
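The upscaling can be checked with a few lines of arithmetic. This sketch assumes the usual rule for a transposed convolution with padding='same', where the output size is the input size multiplied by the stride.

```python
def same_transpose_output(size, stride):
    """Spatial size after a transposed convolution with padding='same'."""
    return size * stride

dense_units = 128 * 25 * 25   # values produced by the first Dense layer

size = 25
sizes = [size]
for _ in range(2):            # two Conv2DTranspose layers with strides=(2, 2)
    size = same_transpose_output(size, 2)
    sizes.append(size)

print(dense_units, sizes)
```

The final Conv2D layer keeps the 100 by 100 size and reduces the 128 channels to 3, which gives the (100, 100, 3) output.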
Step 5| Define GAN:
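def define_gan(g_model, d_model):
    # Freeze the discriminator so that only the generator learns here
    d_model.trainable = False
    model = Sequential()
    model.add(g_model)
    model.add(d_model)
    # Recent Keras versions name this argument learning_rate; older ones used lr
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss='binary_crossentropy', optimizer=opt)
    return model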
Linking the two models together gives the GAN, the complete model. The output of the generator is fed straight into the discriminator. Because the discriminator's weights are frozen inside this combined model, training it updates only the generator.
Step 6| Generate Parts:
def generate_real_samples(dataset, n_samples):
    ix = randint(0, dataset.shape[0], n_samples)
    X = dataset[ix]
    y = ones((n_samples, 1))
    return X, y

def generate_latent_points(latent_dim, n_samples):
    x_input = randn(latent_dim * n_samples)
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

def generate_fake_samples(g_model, latent_dim, n_samples):
    x_input = generate_latent_points(latent_dim, n_samples)
    X = g_model.predict(x_input)
    y = zeros((n_samples, 1))
    return X, y
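These helper functions assemble the training batches: real images drawn at random from the dataset and labelled 1, fake images produced by the generator and labelled 0, and the random latent points that serve as the generator's input.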
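The shapes involved are easy to lose track of, so here is a NumPy-only sketch of one discriminator batch. Random arrays stand in for the real and generated images (an assumption made so the example runs without a trained generator), and a half batch of 5 matches the default n_batch of 10 in the next step.

```python
import numpy as np

latent_dim, half_batch = 100, 5
rng = np.random.default_rng(0)

# Latent points: one row of latent_dim Gaussian values per sample.
latent = rng.standard_normal(latent_dim * half_batch).reshape(half_batch, latent_dim)

# Stand-ins for real images and for the generator's output.
X_real = rng.uniform(0, 255, size=(half_batch, 100, 100, 3))
X_fake = rng.uniform(0, 255, size=(half_batch, 100, 100, 3))
y_real = np.ones((half_batch, 1))    # label 1 = real
y_fake = np.zeros((half_batch, 1))   # label 0 = fake

# One discriminator batch: real and fake samples stacked together.
X, y = np.vstack((X_real, X_fake)), np.vstack((y_real, y_fake))
print(latent.shape, X.shape, y.shape)
```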
Step 7| Train:
def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs=100, n_batch=10):
    bat_per_epo = int(dataset.shape[0] / n_batch)
    print(dataset.shape[0])
    half_batch = int(n_batch / 2)
    for i in range(n_epochs):
        for j in range(bat_per_epo):
            # Train the discriminator on half real, half generated images
            X_real, y_real = generate_real_samples(dataset, half_batch)
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            X, y = vstack((X_real, X_fake)), vstack((y_real, y_fake))
            d_loss, _ = d_model.train_on_batch(X, y)
            # Train the generator through the frozen discriminator,
            # labelling its fakes as real
            X_gan = generate_latent_points(latent_dim, n_batch)
            y_gan = ones((n_batch, 1))
            g_loss = gan_model.train_on_batch(X_gan, y_gan)
            print('>%d, %d/%d, d=%.3f, g=%.3f' % (i+1, j+1, bat_per_epo, d_loss, g_loss))
        # Clear the per-batch log first so the summary below stays visible
        clear_output()
        if (i+1) % 10 == 0:
            summarize_performance(i, g_model, d_model, dataset, latent_dim)
Remember that the two models are programmed to work against each other. During training, the computer prints the loss of each model for every batch. Whichever model has the lower loss is, loosely speaking, winning the competition. This lets you see when the balance between the two models is breaking down.
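Per-batch losses tend to jump around, which can make the trend hard to read. One option, sketched below, is to collect the printed losses in lists and smooth them with a moving average. The function name and the loss values are made up for illustration, and the training code above does not store losses as written.

```python
import numpy as np

def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")

# Hypothetical logged losses, one entry per batch.
d_losses = [0.70, 0.65, 0.72, 0.60, 0.55, 0.58]
g_losses = [0.90, 1.10, 0.95, 1.30, 1.25, 1.40]

d_smooth = moving_average(d_losses, window=3)
g_smooth = moving_average(g_losses, window=3)
print(d_smooth, g_smooth)
```

If one smoothed curve keeps falling while the other keeps rising, the balance between the two models is probably breaking down.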
Step 8| Summarize Performance:
def summarize_performance(epoch, g_model, d_model, dataset, latent_dim, n_samples=100):
    X_real, y_real = generate_real_samples(dataset, n_samples)
    _, acc_real = d_model.evaluate(X_real, y_real, verbose=0)
    x_fake, y_fake = generate_fake_samples(g_model, latent_dim, n_samples)
    _, acc_fake = d_model.evaluate(x_fake, y_fake, verbose=0)
    print('>Accuracy real: %.0f%%, fake: %.0f%%' % (acc_real*100, acc_fake*100))
    filename = 'generator_model_%03d.h5' % (epoch + 1)
    g_model.save(filename)
Every 10 epochs, the training loop calls this function. It prints the discriminator's accuracy on real and on fake images, and saves the current generator to an .h5 file.
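For intuition about the two accuracy figures, the sketch below computes them by hand from discriminator scores, assuming the usual 0.5 threshold for turning a sigmoid output into a real/fake decision. The scores are made-up numbers, not output from the model.

```python
import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Hypothetical discriminator scores.
acc_real = accuracy([1, 1, 1, 1], [0.9, 0.7, 0.4, 0.8])  # one real image missed
acc_fake = accuracy([0, 0, 0, 0], [0.2, 0.6, 0.1, 0.3])  # one fake slipped through

print('>Accuracy real: %.0f%%, fake: %.0f%%' % (acc_real * 100, acc_fake * 100))
```

A low accuracy on fake images means the generator is fooling the discriminator often.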
Step 9| Run Program:
latent_dim = 100
d_model = define_discriminator()
g_model = define_generator(latent_dim)
gan_model = define_gan(g_model, d_model)
print(pixels.shape)
train(g_model, d_model, gan_model, np.array(pixels), latent_dim)
print(pixels)
This script actually runs the program. For perspective on computation time, I use a Windows Surface Pro. 100 epochs take about 2 hours with the batch size (n_batch) defined in the code above.
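As a rough back-of-the-envelope check, the figures above can be turned into a time per training step. This assumes all 1000 images requested in Step 2 were loaded and that the 2 hours are spread evenly over the batches.

```python
n_images = 1000               # images loaded in Step 2
n_batch = 10                  # default batch size in train()
n_epochs = 100                # default number of epochs in train()
total_seconds = 2 * 60 * 60   # about 2 hours

bat_per_epo = n_images // n_batch
total_updates = bat_per_epo * n_epochs
seconds_per_update = total_seconds / total_updates

print(bat_per_epo, total_updates, round(seconds_per_update, 2))
```

That works out to 10,000 training steps at roughly 0.72 seconds each.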
Step 10| Visualize Image generations:
from keras.models import load_model
from numpy.random import randn
from matplotlib import pyplot

def generate_latent_points(latent_dim, n_samples):
    x_input = randn(latent_dim * n_samples)
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

model = g_model
latent_points = generate_latent_points(100, 1)
X = model.predict(latent_points)
# Clip to the valid pixel range before casting so out-of-range values do not wrap around
array = np.array(np.clip(X, 0, 255).reshape(100, 100, 3), dtype=np.uint8)
new_image = Image.fromarray(array)
new_image.show()
X
Running this script will output the abstract art that the GAN has generated.
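The generator's last layer has no activation, so nothing forces its output into the 0 to 255 range that an 8-bit image expects. The sketch below shows one way to convert such output safely by clipping before the cast. The helper name to_uint8_image is hypothetical, and random numbers stand in for the generator's prediction.

```python
import numpy as np

def to_uint8_image(x, height=100, width=100):
    """Reshape raw model output to (height, width, 3) and clip it to 0-255."""
    img = np.asarray(x, dtype=np.float64).reshape(height, width, 3)
    return np.clip(img, 0, 255).round().astype(np.uint8)

# Stand-in for model.predict(latent_points): values that spill outside 0-255.
rng = np.random.default_rng(0)
fake_output = rng.normal(loc=128, scale=100, size=(1, 100, 100, 3))

image = to_uint8_image(fake_output)
print(image.shape, image.dtype, image.min(), image.max())
```

The resulting array can be passed to Image.fromarray in the same way as in the script above.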
Conclusion:
I hope you have learnt something from this article. My results from this project were quite poor because I did not have the resources to train the GAN properly and the dataset is reasonably small. Try applying this model to other datasets or applications; with more data and training time, you may get more satisfying results!