
GANs are among the most computationally intensive models to train, since training one is the equivalent of training two neural networks at the same time. On my underpowered laptop, training a GAN to convergence is very difficult. I have previously written a generic genetic algorithm that can be adapted to many different problems, and here I adapt it to train a GAN to generate handwritten digits.
What are genetic algorithms?
Genetic algorithms are a type of learning algorithm built on the idea that crossing over the weights of two good neural networks should result in a better neural network.
The reason genetic algorithms are so effective is that there is no direct gradient-based optimization step, which leaves room for extremely varied results. Additionally, they often come up with very interesting solutions that give valuable insight into the problem.
How do they work?
1. A set of random weights is generated; this is the neural network of the first agent.
2. A set of tests is performed on the agent, and the agent receives a score based on the tests.
3. Steps 1 and 2 are repeated several times to create a population.
4. The top 10% of the population is selected and made available for crossover.
5. Two random parents are chosen from that top 10% and their weights are crossed over to create a new agent.
6. Every time a crossover occurs, there is a small chance of mutation: a weight receives a random value that appears in neither parent.
This process slowly optimizes the agents' performance as they gradually adapt to the environment. A tiny, self-contained example of the loop is shown below.
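Before looking at the real program, here is a toy version of that loop. The agents are plain lists of numbers rather than neural networks, and the "test" is simply the sum of the list; both are purely illustrative stand-ins, and the actual GAN implementation follows later in the article.

import random

def random_agent(n=10):
    # an "agent" is just a list of random weights
    return [random.uniform(-1, 1) for _ in range(n)]

def score(agent):
    # stand-in fitness test: the higher the sum, the better
    return sum(agent)

def breed(parent1, parent2):
    # single-point crossover with a small chance of mutation
    split = random.randint(0, len(parent1) - 1)
    child = parent1[:split] + parent2[split:]
    if random.random() < 0.1:
        child[random.randrange(len(child))] = random.uniform(-1, 1)
    return child

population = [random_agent() for _ in range(100)]
for generation in range(50):
    population.sort(key=score, reverse=True)
    parents = population[:10]                    # keep the top 10%
    population = parents + [breed(random.choice(parents), random.choice(parents))
                            for _ in range(90)]

print(round(score(max(population, key=score)), 2))   # creeps toward the maximum of 10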
Advantages and Disadvantages:
Advantages:
- Computationally inexpensive
There is no gradient computation or backpropagation to perform; the only machine-learning calculations needed are forward passes through the neural networks. Because of this, the hardware requirements are very modest compared to those of deep neural networks trained with gradient descent.
- Adaptable
Thanks to the flexible nature of genetic algorithms, one can plug in many different tests and ways of manipulating the agents. For example, one could create a GAN within a genetic algorithm by making the agents propagate generator networks and using a discriminator as the test. This is a critical benefit, and it persuades me that the use of genetic algorithms will become more widespread in the future.
- Understandable
For normal neural networks, the learning patterns of the algorithm are enigmatic at best. For genetic algorithms it is easier to understand why certain behaviour comes about: for example, when a genetic algorithm is given the Tic-Tac-Toe environment, recognizable strategies slowly develop. This is a large benefit, as the point of using machine learning is to help us gain insight into important matters.
Disadvantages:
- Takes a long time to converge
Unlucky crossovers and mutations can have a negative effect on the population's accuracy, making the program slower to converge or to reach a given fitness threshold.
The Code:
Now that you have a reasonably comprehensive understanding of genetic algorithms, along with their strengths and limitations, I can show you the program:
import random
import numpy as np
from IPython.display import clear_output
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import Dropout,Dense
from keras.optimizers import Adam
from keras.models import Sequential
from keras.datasets.mnist import load_data
(trainX, trainy), (testX, testy) = load_data()
Keras is needed for the discriminator, but the neural networks inside the genetic algorithm are created by the code below, which is built with numpy as its basis.
class genetic_algorithm:
    def execute(pop_size,generations,threshold,network):
        class Agent:
            def __init__(self,network):
This creates the class "genetic_algorithm", which holds all the functions that concern the genetic algorithm and how it operates. The main function is execute, which takes pop_size, generations, threshold and network as parameters: pop_size is the size of the generated population, generations is the term for epochs, and threshold is the fitness value at which you are satisfied. (The generic version of this genetic algorithm also takes X and y parameters for labelled data; they are not needed here, since the fitness test builds its own data from MNIST.) network describes the structure of each agent's neural network.
                class neural_network:
                    def __init__(self,network):
                        # one weight matrix and one activation function per layer
                        self.weights = []
                        self.activations = []
                        for layer in network:
                            if layer[0] is not None:
                                input_size = layer[0]
                            else:
                                # None means: reuse the previous layer's output size
                                input_size = network[network.index(layer)-1][1]
                            output_size = layer[1]
                            activation = layer[2]
                            self.weights.append(np.random.randn(input_size,output_size))
                            self.activations.append(activation)
                    def propagate(self,data):
                        # forward pass: repeated dot products followed by the activation
                        input_data = data
                        for i in range(len(self.weights)):
                            z = np.dot(input_data,self.weights[i])
                            a = self.activations[i](z)
                            input_data = a
                        yhat = a
                        return yhat
                self.neural_network = neural_network(network)
                self.fitness = 0
This code defines the initialization of the weights and the forward propagation of the network for each agent's neural network.
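For intuition, here is the same initialization and forward pass written as a small standalone sketch outside the nested classes, using the layer-spec format and a numpy sigmoid like the ones that appear at the end of the article:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# each layer is [input_size (None = previous layer's output size), output_size, activation]
network = [[100, 100, sigmoid], [None, 28**2, sigmoid]]

weights, previous_output = [], None
for input_size, output_size, activation in network:
    if input_size is None:
        input_size = previous_output
    weights.append(np.random.randn(input_size, output_size))
    previous_output = output_size

x = np.random.randn(100)                 # random latent vector
for W, (_, _, activation) in zip(weights, network):
    x = activation(np.dot(x, W))         # z = x.W, then a = activation(z)

print(x.shape)                           # (784,), later reshaped to a 28x28 image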
        def generate_agents(population, network):
            return [Agent(network) for _ in range(population)]
This function creates the first population of agents that will be tested.
        def fitness(agents):
            for agent in agents:
                dataset_len = 100
                # first mixed batch of generated and real digits: used to train the discriminator
                fake = []
                real = []
                y = []
                for i in range(dataset_len//2):
                    fake.append(agent.neural_network.propagate(np.random.randn(latent_size)).reshape(28,28))
                    y.append(0)
                    real.append(random.choice(trainX))
                    y.append(1)
                X = fake+real
                X = np.array(X).astype('uint8').reshape(len(X),28,28,1)
                y = np.array(y).astype('uint8')
                model.fit(X,y,verbose = 0)
                # second, freshly generated batch: the discriminator's accuracy on it becomes the agent's fitness
                fake = []
                real = []
                y = []
                for i in range(dataset_len//2):
                    fake.append(agent.neural_network.propagate(np.random.randn(latent_size)).reshape(28,28))
                    y.append(0)
                    real.append(random.choice(trainX))
                    y.append(1)
                X = fake+real
                X = np.array(X).astype('uint8').reshape(len(X),28,28,1)
                y = np.array(y).astype('uint8')
                agent.fitness = model.evaluate(X,y,verbose = 0)[1]*100
            return agents
The fitness function is the unique part of this genetic algorithm:
A discriminator-type convolutional network (defined later) returns a binary real-or-fake prediction. For each agent, it is first trained on a batch that mixes real MNIST digits with images produced by the agent's generator, and the agent's fitness is then the discriminator's accuracy on a second, freshly generated batch. A generator that fools the discriminator drives that accuracy down.
        def selection(agents):
            # lower fitness (= lower discriminator accuracy) is better for the generator,
            # so sort in ascending order and keep the first 20%
            agents = sorted(agents, key=lambda agent: agent.fitness, reverse=False)
            print('\n'.join(map(str, agents)))
            agents = agents[:int(0.2 * len(agents))]
            return agents
This function mimics the theory of selection in evolution: the best survive, while the others are left to die and their weights are never used again. Note that in this setup a lower fitness is better, because fitness is the discriminator's accuracy and a good generator keeps that accuracy low, so the agents are sorted in ascending order and the first 20% are kept.
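As a quick illustration of what selection keeps, here is the same sort-and-truncate step applied to a handful of dummy agents with made-up fitness values:

class DummyAgent:
    def __init__(self, fitness):
        self.fitness = fitness
    def __repr__(self):
        return '<agent fitness=%s>' % self.fitness

agents = [DummyAgent(f) for f in [92.0, 55.0, 71.0, 60.0, 88.0, 49.0, 95.0, 66.0, 80.0, 73.0]]
agents = sorted(agents, key=lambda agent: agent.fitness)   # ascending: lowest accuracy first
agents = agents[:int(0.2 * len(agents))]                   # keep the best 20%
print(agents)   # [<agent fitness=49.0>, <agent fitness=55.0>]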
        def unflatten(flattened,shapes):
            newarray = []
            index = 0
            for shape in shapes:
                size = np.prod(shape)
                newarray.append(flattened[index : index + size].reshape(shape))
                index += size
            return newarray
For the crossover and mutation functions to work on whole networks, the weights need to be flattened into a single gene vector and then unflattened back into their original matrix shapes.
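Here is a small self-contained round trip showing what the flattening and unflattening do (unflatten is restated outside the class purely for this demonstration):

import numpy as np

def unflatten(flattened, shapes):
    newarray, index = [], 0
    for shape in shapes:
        size = np.prod(shape)
        newarray.append(flattened[index:index + size].reshape(shape))
        index += size
    return newarray

weights = [np.random.randn(3, 4), np.random.randn(4, 2)]              # two toy weight matrices
shapes = [w.shape for w in weights]
genes = np.concatenate([w.flatten() for w in weights])                # one flat gene vector, length 20
restored = unflatten(genes, shapes)
print(all(np.array_equal(w, r) for w, r in zip(weights, restored)))   # True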
        def crossover(agents,network,pop_size):
            offspring = []
            for _ in range((pop_size - len(agents)) // 2):
                parent1 = random.choice(agents)
                parent2 = random.choice(agents)
                child1 = Agent(network)
                child2 = Agent(network)
                shapes = [a.shape for a in parent1.neural_network.weights]
                genes1 = np.concatenate([a.flatten() for a in parent1.neural_network.weights])
                genes2 = np.concatenate([a.flatten() for a in parent2.neural_network.weights])
                # single-point crossover: the two children combine the parents' genes on either side of the split
                split = random.randint(0,len(genes1)-1)
                child1_genes = np.array(genes1[0:split].tolist() + genes2[split:].tolist())
                child2_genes = np.array(genes2[0:split].tolist() + genes1[split:].tolist())
                child1.neural_network.weights = unflatten(child1_genes,shapes)
                child2.neural_network.weights = unflatten(child2_genes,shapes)
                offspring.append(child1)
                offspring.append(child2)
            agents.extend(offspring)
            return agents
The crossover function is one of the most complicated functions in the program. It generates two new "child" agents whose weights are a crossover of the weights of two randomly chosen parents. This is the process of creating the child weights:
- Flatten the weights of both parents into gene vectors
- Generate a random splitting point
- Use the splitting point as an index to combine the two parents' genes into the two children's weights
This is the full process of the crossover of agents; the short sketch below shows the same single-point crossover on two toy gene vectors.
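The constant values make it easy to see where the split happened:

import random
import numpy as np

genes1 = np.array([1, 1, 1, 1, 1, 1])       # all of parent 1's genes
genes2 = np.array([2, 2, 2, 2, 2, 2])       # all of parent 2's genes

split = random.randint(0, len(genes1) - 1)  # random split point, e.g. 2
child1 = np.array(genes1[0:split].tolist() + genes2[split:].tolist())   # [1 1 2 2 2 2]
child2 = np.array(genes2[0:split].tolist() + genes1[split:].tolist())   # [2 2 1 1 1 1]
print(split, child1, child2)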
        def mutation(agents):
            for agent in agents:
                if random.uniform(0.0, 1.0) <= 0.1:
                    weights = agent.neural_network.weights
                    shapes = [a.shape for a in weights]
                    flattened = np.concatenate([a.flatten() for a in weights])
                    # replace one randomly chosen weight with a new random value
                    randint = random.randint(0,len(flattened)-1)
                    flattened[randint] = np.random.randn()
                    newarray = []
                    indeweights = 0
                    for shape in shapes:
                        size = np.prod(shape)
                        newarray.append(flattened[indeweights : indeweights + size].reshape(shape))
                        indeweights += size
                    agent.neural_network.weights = newarray
            return agents
This is the mutation function. The flattening and unflattening work the same way as in the crossover function; instead of splitting at a point, a single randomly chosen weight is replaced with a new random value.
        agents = generate_agents(pop_size,network)
        for i in range(generations):
            print('Generation',str(i),':')
            agents = fitness(agents)
            agents = selection(agents)
            agents = crossover(agents,network,pop_size)
            agents = mutation(agents)
            agents = fitness(agents)
            if any(agent.fitness < threshold for agent in agents):
                print('Threshold met at generation '+str(i)+' !')
                break
            if i % 100:
                clear_output()
        return agents[0]
This is the last part of the execute function: the population is created once and then, for every generation, evaluated, selected, crossed over and mutated using the functions defined above, until the generations run out or the fitness threshold is met.
image_size = 28
latent_size = 100
model = Sequential()
model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same', input_shape=(image_size,image_size,1)))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.4))
model.add(Conv2D(64, (3,3), strides=(2, 2), padding='same'))
model.add(LeakyReLU(alpha=0.2))
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
opt = Adam(lr=0.0002, beta_1=0.5)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
def sigmoid(x):
    # numpy-based sigmoid used as the activation of the generator's layers
    return 1/(1+np.exp(-x))
network = [[latent_size,100,sigmoid],[None,image_size**2,sigmoid]]
ga = genetic_algorithm
agent = ga.execute(1000,1000,90,network)
(trainX, trainy), (testX, testy) = load_data()
weights = agent.neural_network.weights
This is the discriminator convolutional network that I talked about earlier. image_size is the side length of an MNIST image, and latent_size is the length of the random noise vector fed into the generator.
This executes the whole genetic algorithm. In the network variable, each nested list holds a layer's input size, output size and activation function (an input size of None means "reuse the previous layer's output size"). The execute function returns the best agent.
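To actually look at what the evolved generator produces, one possible follow-up to the script above is to sample a latent vector and plot the result (matplotlib is an extra dependency that this article does not otherwise use):

import matplotlib.pyplot as plt   # assumed extra dependency, only for visualization

# sample a random latent vector, run it through the best agent's generator,
# and display the resulting 28x28 image
digit = agent.neural_network.propagate(np.random.randn(latent_size)).reshape(28, 28)
plt.imshow(digit, cmap='gray')
plt.show()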
Conclusion:
Obviously the genetic algorithm will not converge as fast as a gradient-based algorithm, but the computational work is spread out over a longer period of time, making it much less demanding on the computer at any given moment!
My links:
If you want to see more of my content, click this link.