
Unit 8) Co-Evolution – Reinforcement Learning for Game AI Design

Co-evolve competing game AIs for playing Lunar Lander using the Python Gym environment and the EvolutionaryComputation library!

Evolutionary Computation Course

Hello and welcome back to this full course on Evolutionary Computation! In this post, we will start and finish Unit 8, Co-Evolution. Unfortunately, this will be the last unit covered in the course, but hopefully you have learned a lot along the way! In the previous unit, we applied Differential Evolution to evolve the model architecture of a Convolutional Neural Network in Keras; you can find that article here:

Unit 7) Differential Evolution – Automated Machine Learning

If you are new to this series, please check out these two articles below where I go over the background information necessary for understanding Evolutionary Computation:

Unit 2) Introduction To Evolutionary Computation

Unit 3) Genetic Algorithms (Part 1)

In this post, we will cover a brief overview of Co-Evolution, competitive fitness, and the different types of Co-Evolution, and then wrap up by evolving cooperative/competitive game AIs for playing Lunar Lander.

Table of Contents

  • Differences between Co-Evolution and Standard Genetic Algorithms
  • Competitive Fitness
  • Competitive and Cooperative Co-Evolution Pseudo-Algorithms
  • Python Gym Environment for Reinforcement Learning
  • Competitive/Cooperative Co-Evolution for Lunar Lander
  • Code
  • Conclusion

Differences between Co-Evolution and Standard Genetic Algorithms

The main difference between Co-Evolution and standard Genetic Algorithms is that Co-Evolution is not an evolutionary algorithm itself, but a methodology for simultaneously evolving different species or populations. The idea behind Co-Evolution is to evolve two or more unique species of individuals with the goal of solving the same problem. For example, suppose we are back in Units 3 and 4, where we were trying to forecast a time series. The approach taken in Unit 3 was to evolve the weights of a fixed neural network, while the approach in Unit 4 was to evolve a decision tree. Co-Evolution occurs when we combine the two unique algorithms into two distinct populations, or species, of algorithms and have them evolve simultaneously on the same problem. In this way, we can create a competitive environment where each species evolves based on how much better it is than the other species, not necessarily on how fit it is within its own species.

There are two main types of Co-Evolution:

  • Predator vs. Prey (Competitive)
  • Symbiosis (Cooperative)

Competitive Co-Evolution is commonly used for determining the best species out of the possible populations or for evolving different strategies in game AI design. On the other hand, Cooperative Co-Evolution is performed when the goal is for all the species to cooperate in solving a particular problem.

Competitive Fitness

As stated before, in Co-Evolution the fitness of an individual is calculated by how much better it is than the individuals in other species/populations, not by how fit it is within its own species. Fitness is calculated differently for each of the two types of Co-Evolution described earlier.

For Predator vs. Prey (Competitive), we want to compare fitness scores between species. We can do this through competitive fitness sampling, where we take a sample of fitness values from the other population to compare against the current individual (a minimal code sketch follows the list below). There are four main sampling techniques:

  1. All Sampling – Each individual for each species is compared to all the individuals of the other species.
  2. Random Sampling – Each individual for each species is compared against a random sample of the entire population for each species.
  3. Tournament Sampling – Each individual for each species is compared against a small tournament from each species.
  4. Best Sampling – Each individual for each species is compared against the best from each species.
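
As a concrete illustration, here is a minimal sketch of random-sampling competitive fitness between two species. This is not code from the EvolutionaryComputation library; the function name, toy individuals, and `raw_fitness` callable are purely illustrative:

```python
import random

def competitive_fitness(species_a, species_b, raw_fitness, sample_size=2):
    """Score each individual in species_a by how often it beats a random
    sample of opponents drawn from species_b (random sampling)."""
    scores = []
    for individual in species_a:
        opponents = random.sample(species_b, min(sample_size, len(species_b)))
        wins = sum(raw_fitness(individual) > raw_fitness(opp) for opp in opponents)
        scores.append(wins / len(opponents))
    return scores

# Toy example: "individuals" are just numbers and raw fitness is their value
species_a = [0.2, 0.9, 0.5]
species_b = [0.4, 0.6, 0.7, 0.1]
print(competitive_fitness(species_a, species_b, raw_fitness=lambda x: x))
```

Swapping the opponent-selection line is all it takes to move between the four techniques: sample everything for all sampling, a small subset for tournament sampling, or only the best opponent for best sampling.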

For Symbiosis (Cooperative), because we want the population as a whole to co-evolve, we perform relative fitness instead of competitive fitness. Relative fitness focuses on how well individuals perform relative to those around them, including members of their own species (a similar sketch follows the list below). There are three main sampling techniques:

  1. All Sampling – Each individual is compared against all individuals.
  2. Random Sampling – Each individual is compared against a random sample.
  3. Tournament Sampling – Each individual is compared against a small tournament.
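
The relative-fitness version mirrors the competitive sketch above; the only change is that the comparison pool is the whole multi-species population rather than a rival species. Again, the names here are illustrative and not part of the library's API:

```python
import random

def relative_fitness(population, raw_fitness, sample_size=5):
    """Score each individual by how often it matches or beats a random sample
    drawn from the whole multi-species population (itself included)."""
    scores = []
    for individual in population:
        sample = random.sample(population, min(sample_size, len(population)))
        beats = sum(raw_fitness(individual) >= raw_fitness(other) for other in sample)
        scores.append(beats / len(sample))
    return scores

# Toy example: the combined population of all species, fitness is the value itself
population = [0.2, 0.9, 0.5, 0.4, 0.6, 0.7, 0.1]
print(relative_fitness(population, raw_fitness=lambda x: x, sample_size=3))
```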

As we can see, the sampling techniques for competitive and cooperative Co-Evolution are extremely similar; the major difference is that competitive Co-Evolution focuses on how well each species performs relative to the others, while cooperative Co-Evolution focuses on how well the entire population of species performs together.

Competitive and Cooperative Co-Evolution Pseudo-Algorithms

Now that we’ve discussed the major differences within Co-Evolution, let’s go over two pseudo-code algorithms for designing game AI agents. First, Competitive Co-Evolution:

(Pseudo-algorithm for Competitive Co-Evolution – Image by Author)

Above we have an example of Competitive Co-Evolution between two species, where each species’ fitness is based on some sample of fitness values from the other species.
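
To make that loop concrete, here is a rough code-form sketch of this kind of competitive cycle. The `evaluate` and `breed` callables are hypothetical placeholders standing in for problem-specific evaluation and for selection plus reproduction within a species:

```python
import random

def coevolve_competitive(species_a, species_b, evaluate, breed,
                         generations=100, sample_size=5):
    """Two-species competitive loop: raw scores become competitive fitness by
    counting wins against a random sample of the rival species, then each
    species reproduces from its own competitive fitness."""
    for _ in range(generations):
        raw_a = [evaluate(ind) for ind in species_a]
        raw_b = [evaluate(ind) for ind in species_b]
        k_a = min(sample_size, len(raw_b))
        k_b = min(sample_size, len(raw_a))
        fit_a = [sum(r > s for s in random.sample(raw_b, k_a)) for r in raw_a]
        fit_b = [sum(r > s for s in random.sample(raw_a, k_b)) for r in raw_b]
        species_a = breed(species_a, fit_a)   # selection + variation within species A
        species_b = breed(species_b, fit_b)   # selection + variation within species B
    return species_a, species_b
```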

For Cooperative Co-Evolution:

(Pseudo-algorithm for Cooperative Co-Evolution – Image by Author)

Above we have an example of Cooperative Co-Evolution among all the species in the population. We simply calculate the raw fitness and then the relative fitness through one of the sampling techniques discussed above. It is common in Cooperative Co-Evolution to allow species with poor relative fitness values to die out.
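
As an illustrative sketch only, using the same hypothetical `evaluate` and `breed` helpers as before, the cooperative loop with extinction might look like this:

```python
import random

def coevolve_cooperative(species_list, evaluate, breed,
                         generations=100, sample_size=5, extinction_threshold=0.2):
    """Multi-species cooperative loop: relative fitness is measured against a
    random sample of the combined population, and species whose mean relative
    fitness falls below a threshold are allowed to die out."""
    for _ in range(generations):
        combined = [ind for species in species_list for ind in species]
        raw = {id(ind): evaluate(ind) for ind in combined}

        def relative(ind):
            sample = random.sample(combined, min(sample_size, len(combined)))
            return sum(raw[id(ind)] >= raw[id(other)] for other in sample) / len(sample)

        next_generation = []
        for species in species_list:
            rel = [relative(ind) for ind in species]
            if sum(rel) / len(rel) >= extinction_threshold:   # weak species go extinct
                next_generation.append(breed(species, rel))
        species_list = next_generation or species_list        # never lose every species
    return species_list
```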

Python Gym Environment for Reinforcement Learning

Developing game AI agents is a vast area spanning many disciplines. Within this domain there exists a plethora of methods for creating game AI agents; however, the overarching field for training these agents is known as Reinforcement Learning: the study of how agents should act in different circumstances.

(Reinforcement learning diagram – https://commons.wikimedia.org/wiki/File:Reinforcement_learning_diagram.svg)

In Reinforcement Learning scenarios, we reward our agent for the actions it takes in the environment. If the agent takes a bad action we penalize it; otherwise, we reward it for good actions. In this way, the agent learns to navigate the environment in order to maximize its reward. The most common way to create AI agents for these types of problems is through a neural network. I’m not going to detail neural networks here, so if you are unfamiliar I suggest reading up on them. There exist many reinforcement learning algorithms for neural networks, such as the Actor-Critic method, Q-Learning, DDPG, and more. However, in this post our goal is not to use the methods described above, but instead a Genetic Algorithm.

The main problem some might run into when trying to test out their Reinforcement Learning algorithm is how to create the environment itself. Luckily for us, in Python there exists a library known as Gym, which is a toolkit for developing and comparing reinforcement learning algorithms, and it contains tons of classic Atari games, complex physics problems, and other simple little games. You can find their website below:

Gym: A toolkit for developing and comparing reinforcement learning algorithms

We will be using the environments in Gym for testing our Genetic Algorithm.

Competitive/Cooperative Co-Evolution for Lunar Lander

Our applied problem is to tackle the LunarLander-v2 environment, which is native to Gym. The goal is to land a lunar lander on top of a landing pad. The input is eight numerical values describing the lander’s position, velocity, angle, angular velocity, and leg contact, while the output is one of four discrete actions: do nothing, fire the left orientation engine, fire the main engine, or fire the right orientation engine. The problem is regarded as solved when a reward of 200 is obtained.
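
For readers who want to poke at the environment first, here is a quick, self-contained look at it with a random policy. This sketch assumes the classic Gym API; newer gym/gymnasium releases return extra values from reset and step:

```python
import gym  # LunarLander-v2 also needs Box2D: pip install gym[box2d]

env = gym.make("LunarLander-v2")
print(env.observation_space.shape)  # (8,)  position, velocity, angle, angular velocity, leg contacts
print(env.action_space.n)           # 4     do nothing, left engine, main engine, right engine

# One episode with a random policy
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()         # a trained agent would choose this instead
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)
env.close()
```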

For this we will use the NeuroReinforcer class from the EvolutionaryComputation library (full disclosure: I wrote it). This will be a short peek into a library that is not yet fully complete, but I couldn’t wait until it was finished to showcase this example. You can find more information about the library and its updates on my GitHub repository page:

GitHub – OUStudent/EvolutionaryComputation

You can also find it here on PyPI:

EvolutionaryComputation

The NeuroReinforcer class is specialized for solving reinforcement learning problems where the input is numerical, while the NeuroReinforcerImages class solves reinforcement learning problems where the input is images. NeuroReinforcer works by evolving the weights and activation functions of a feed-forward neural network through an advanced self-adaptive log-normal genetic algorithm. It combines both competitive and cooperative mechanisms into one evolutionary process: it is cooperative in that the goal is for all species to evolve the best possible model for the problem, and competitive in that each species within the evolution must fight for survival or become extinct. Species within NeuroReinforcer are designated by their activation functions, which can either be static for all layers or mixed.

To set up the problem, we first need to create our fitness function, which will take in the population and return the ‘fitness’ for each individual. The ‘fitness’ will simply be the reward after playing one game:
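
Here is a minimal sketch of what such a fitness function could look like, assuming each individual exposes a `predict` method that maps the eight observations to scores over the four actions; that interface is an assumption for illustration, not necessarily the library's:

```python
import gym
import numpy as np

def fitness_function(population):
    """Play one LunarLander-v2 game per individual and return each total
    reward as that individual's 'fitness'. The `predict` method used here is
    an assumed interface mapping the 8 observations to scores over 4 actions."""
    env = gym.make("LunarLander-v2")
    fitness = []
    for individual in population:
        obs = env.reset()
        total_reward, done = 0.0, False
        while not done:
            action = int(np.argmax(individual.predict(obs)))  # greedy action choice
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        fitness.append(total_reward)
    env.close()
    return np.asarray(fitness)
```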

Now we need to create the neural network architecture. For this example we will choose a network with three hidden layers of 50, 100, and 50 nodes. For activation function choices, we will allow relu, leaky relu, selu, elu, gaussian, sigmoid, and tanh. To allow some variation within the evolution process, there will be a 5% chance for an individual to switch species if its current species has more than one member. Lastly, for speciation we will treat the activation functions as static across all layers of the individual.
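
As a rough idea of what this configuration could look like in code, here is a hypothetical set-up; the constructor arguments, method names, and import path below are illustrative placeholders rather than the library's confirmed API, so check the EvolutionaryComputation README for the actual signature:

```python
# Hypothetical set-up only: argument names and import path are placeholders,
# not the library's confirmed API; consult the EvolutionaryComputation README.
from EvolutionaryComputation import NeuroReinforcer  # assumed import path

model = NeuroReinforcer(
    num_input=8,                        # eight LunarLander-v2 observations
    num_output=4,                       # four discrete actions
    hidden_layers=[50, 100, 50],        # three hidden layers, as described above
    activation_functions=["relu", "leaky_relu", "selu", "elu",
                          "gaussian", "sigmoid", "tanh"],
    species_switch_prob=0.05,           # 5% chance to switch species
    speciation="static",                # same activation for every layer of an individual
    fitness_function=fitness_function,  # the reward-per-game function defined earlier
)
model.evolve(max_generations=300)       # illustrative call; real method name may differ
```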

Initial Population

Here we have the initial agents from the population. Notice how they do a very poor job of landing the lunar lander. However, they will soon learn how to do so after being rewarded for good actions in the environment.

Plot Results – Complete Layer Activations

After the maximum number of generations has been reached, it is time to visualize the results:

First, we have the overall plot of the best and mean reward scores:

As we can see above, the best reward heavily fluctuated between generations 0 and 100, indicating that those high rewards were achieved simply by random chance. However, notice that around generation 225 the mean reward increased drastically before stalling out around generation 260. The best reward for a given generation can be achieved simply by random chance; therefore, it is advised to look at the mean reward, as it showcases the behavior of the population as a whole. By generation 240, the mean reward had increased to a little over 200, showcasing that the algorithm was successful in evolving a population of models for solving the LunarLander-v2 problem.

Below we can look at the species sizes over the course of evolution. The legend showcases the species, where each activation function represents the activation of that particular layer. For example, the species ‘elu,elu,elu’ indicates that the three hidden layers all have ‘elu’ as their activation function.

(Species sizes per generation – Image by Author)

As we can see above, leaky relu (green) started off large but soon died off around generation 60, while elu (blue), tanh (pink), and selu (purple) began to take the wheel. However, all but selu soon began to dwindle down. The top two activation functions would probably be selu and tanh.

Now that we’ve determined the top two activation functions, we can redo the evolution, except this time we allow the activation functions to mix per layer between the top two, selu and tanh. In this way, we can evolve the best possible activation function architecture.

Plot Results – Mixed Layer Activations

As we can see from above, refining our layer activation search actually improved the population’s convergence. The problem is deemed solved when a reward of 200 has been obtained, which the population achieved by generation 170, 55 generations quicker than keeping the layer activations static. However, the best model at the end of convergence had a final reward similar to that of the previous evolution. If the problem were more complex, the second, refined architecture search might have obtained a better best model as well.

Below we can look at the species sizes over the course of evolution. The legend showcases the species, where each activation function represents the activation of that particular layer. For example, the species ‘selu,selu,tanh’ indicates that the first two hidden layers both have ‘selu’ as their activation function while the third layer has ‘tanh’.

As we can see, the species sizes varied greatly during evolution. ‘selu,tanh,selu’ (green) peaked as the species with the largest size over the entire evolution, with what appears to be 45 individuals out of a population of 50 at around generation 190; however, the species slowly died down to around 10 by convergence. The largest final species at convergence was ‘tanh,tanh,selu’ at around 15 individuals out of a population of 50.

Best Model After Final Generation

After evolution, the best model was saved and here are the visual results for six consecutive games:

Conclusion

Co-evolution can be either competitive, cooperative, or both. The goal in co-evolution is to co-evolve different species or algorithms to solve a given problem. It is most commonly used in reinforcement learning scenarios.

In this post, we discussed two basic pseudo-algorithms for competitive and cooperative environments, while our example combined the two for solving the Lunar Lander environment by Gym. The API we used for this was the NeuroReinforcer class from the EvolutionaryComputation library.

This post will be the final section for this course on Evolutionary Computation. I hope you all have learned a lot about Evolutionary Algorithms and are eager to use them in your own problems! In the next post I will introduce the EvolutionaryComputation library, which I created using all the material covered in this course!

