
Unit 2) Introduction To Evolutionary Computation

A brief overview of Evolutionary Computation and Genetic Algorithms!

Evolutionary Computation Course

Hello and welcome back to this full course on Evolutionary Computation! In this post we will cover Unit 2 of the course, Introduction to Evolutionary Computation. In the previous post we covered Unit 1, Optimization Theory; you can check it out here:

Unit 1) Optimization Theory

In this post I will give a basic overview of Evolutionary Computation (EC) as a whole. Starting from the biological inspiration, we will see how EC seeks to solve problems, then move on to a basic overview of Evolutionary Algorithms and explain the main details. In the next unit we will get into the intricacies of Genetic Algorithms.

Table of Contents

  • Biological Inspiration
  • Overview of Evolutionary Algorithms
  • Representation of the Chromosome
  • Initial Population
  • Choosing Fitness Function
  • Selection Process
  • Reproduction Operators
  • Stopping Conditions
  • Difference in Evolutionary Algorithms
  • Conclusion

Biological Inspiration

Evolution can be described as a process by which individuals become ‘fitter’ in different environments through adaptation, natural selection, and selective breeding. Here we have a picture of the famous finches Charles Darwin depicted in his journal while on the Galápagos Islands. He noted the extreme diversity in beak size within the species, variation that arises through mutation.

https://commons.wikimedia.org/wiki/File:Darwin%27s_finches_by_Gould.jpg

Here we have a picture of a trivial example of natural selection, adaptation, and selective breeding.

https://commons.wikimedia.org/wiki/File:Darwinismoa_jirafaren_adibidea.png

We can see that the vegetation the giraffes feed on is high off the ground, making it easier for giraffes with longer necks to reach. Because of this environment, giraffes with shorter necks are unable to eat, starving and dying off, an example of natural selection. On the other hand, because only the giraffes with longer necks survive, we have a degree of selective breeding where only the ‘fittest’ individuals reproduce. Because only the ‘fittest’ individuals (those with longer necks) reproduce, the offspring are born with adaptations to the environment (a longer neck).

In Evolutionary Computation, we model these principles to find the best solution to a problem. Each possible solution is represented as an individual in a population pool, where we perform adaptation, natural selection, and selective breeding on those possible solutions to ultimately find the best solution to the problem.

Overview of Evolutionary Algorithms

For a brief overview of the canonical genetic algorithm: we start off with an initial population, choose how to encode the solutions as chromosomes, select our fitness function (in most cases the optimization function itself), and lastly choose how to decide which individuals will survive and reproduce.

Image by Author

Above we have a basic diagram of how this algorithm works. It’s OK if you don’t understand everything at first; we will come back to it later on. First, we initialize our population randomly, then we select those who are the fittest to reproduce. We then exchange alleles between the two parents through crossover and introduce new genetic material through mutation. After this, we have some type of survival mechanism to determine who survives and who does not. We repeat this process of mating, reproduction, and survival until a stopping criterion is reached.
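To make the diagram concrete, here is a minimal sketch of that loop in Python. All of the names (`init_population`, `fitness`, `select`, `crossover`, `mutate`) are hypothetical placeholders for the operators we define throughout this unit:

```python
def evolve(init_population, fitness, select, crossover, mutate, max_gens=100):
    """Minimal sketch of the canonical evolutionary loop."""
    population = init_population()
    for _ in range(max_gens):                      # stopping condition: generation budget
        scores = [fitness(ind) for ind in population]
        next_gen = []
        while len(next_gen) < len(population):
            parent1 = select(population, scores)   # mating selection
            parent2 = select(population, scores)
            child1, child2 = crossover(parent1, parent2)
            next_gen += [mutate(child1), mutate(child2)]
        population = next_gen                      # simplest survival: offspring replace parents
    scores = [fitness(ind) for ind in population]
    return max(zip(scores, population), key=lambda pair: pair[0])[1]  # best individual found
```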

Representation of the Chromosome

Cells are one of the most basic building blocks of organisms. Each cell is built off strands of DNA. Chromosomes, the main structures of DNA, contain large strands of genes, where each gene represents some type of inherited characteristic. In EC, we want to model these characteristics in our optimization problems. An individual in EC is simply a point in the domain space of that optimization problem, where each variable value of the function is represented by a gene. Thus, each point refers to an individual where each component of that point, i.e. variable value, is encoded as a gene in the genome. The genotype describes the genetic makeup of an individual. On the other hand, the phenotype defines what the individual looks like in the environment.

For example, a single point in the domain space of an optimization problem is represented in its genome by the values of its variables; its phenotype, however, is the actual point itself on the surface of the function. Later on, when we introduce Genetic Programming, individuals are represented not as points but as tree graphs of actual programs or decision trees. In addition, for other types of EC algorithms, the phenotype could be a neural network, design layout, robot, etc. The point is that the genotype represents what the individual is made up of, whereas the phenotype represents what the individual looks like physically in its environment. For a concrete example, below we have the phenotype of a point (in red) in its function space:

Image by Author

However, the genotype of the point above is made up of its variable values:

Image by Author

In EC, the classic way to represent individuals, i.e. points in our input space, is binary encoding. Binary encoding was the first method used to encode individuals, as it was inspired by actual DNA, where genes are made up of sequences of four bases and triplets of bases (codons) describe amino acids.

https://commons.wikimedia.org/wiki/File:DNA_replication_en.svg

Therefore, from this biological inspiration, we can encode our potential solutions in binary to mimic DNA. We can see below that there are three sections for encoding a floating-point number in 32-bit binary: a sign bit, 8 bits for the exponent, and 23 bits for the fraction.

https://en.wikipedia.org/wiki/Single-precision_floating-point_format
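As a quick illustration, Python’s `struct` module lets us inspect this 32-bit layout directly. This is just a demonstration of the encoding itself, not something an EA needs at runtime:

```python
import struct

def float_to_bits(x):
    """Split a float into its IEEE-754 single-precision fields."""
    [raw] = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an int
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]  # sign (1 bit), exponent (8), fraction (23)

print(float_to_bits(3.14))
# ('0', '10000000', '10010001111010111000011')
```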

However, in modern practice, binary representation has largely been abandoned in favor of floating-point representation, which we will cover in the next unit.

There are two main problems with simple binary representation. The first is that binary representation leads to what are known as Hamming cliffs. We can see an example of this:

Image by Author

The way to measure the difference between two binary vectors is the Hamming distance, which simply counts the number of bits that differ. For example, 7 = 0111 and 8 = 1000 share no common bits, so they have a Hamming distance of 4. The problem is that 7 and 8 are adjacent numbers, yet their Hamming distance in binary coding is large due to the dissimilarity of their bits, creating a Hamming cliff. Gray encoding fixes this problem by XOR-ing each bit with the bit to its left (g = b ⊕ (b ≫ 1)) to create an alternative encoding. By using Gray encoding we limit the Hamming distance between any two adjacent numbers to 1. Here is an example of the binary encoding and the corresponding Gray encoding of the numbers 0 to 7:

Image By Author
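In code, the binary-reflected Gray code is a one-line transformation, and we can verify the claim about adjacent numbers directly:

```python
def gray_encode(n):
    """Binary-reflected Gray code: XOR each bit with the bit to its left."""
    return n ^ (n >> 1)

def hamming(a, b):
    """Count the bit positions where two encodings differ."""
    return bin(a ^ b).count("1")

print(hamming(7, 8))                            # 4: the Hamming cliff in plain binary
print(hamming(gray_encode(7), gray_encode(8)))  # 1: adjacent numbers differ by one bit
```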

However, despite this fix, encoding and decoding solutions in binary is extremely time-consuming.

Initial Population

The initial population of an EC algorithm is extremely important. Evolutionary Algorithms (EAs) are population-based search algorithms, meaning they take a pool of initial points and search from these points in parallel. Unlike standard numerical methods, such as Newton’s method, where you feed in only one initial value, EAs use the diversity of the population to search for better solutions. In addition, we also need a way to represent the entire search space/domain of the problem. If our initial population only includes values from a centrally located part of the search space, then our algorithm will only find optimal solutions near that part of the space.

Therefore, to ensure diversity, we sample uniformly at random from our domain space. Now, whenever we are dealing with constrained problems, where the bounds depend on the variable values, our initial uniform sampling might produce infeasible solutions. To get around this we either have to hard-code the constraint into our uniform sampling or sample over the entire input space but keep only feasible solutions.
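Here is a sketch of this initialization, assuming per-variable bounds and an optional feasibility check (`feasible` is a hypothetical constraint test, defaulting to “everything is feasible”):

```python
import random

def init_population(size, bounds, feasible=lambda x: True):
    """Uniformly sample points within per-variable bounds, keeping only feasible ones."""
    population = []
    while len(population) < size:
        candidate = [random.uniform(low, high) for low, high in bounds]
        if feasible(candidate):  # rejection sampling handles the constraints
            population.append(candidate)
    return population

# e.g. 50 individuals in the square [-5, 5] x [-5, 5]
pop = init_population(50, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```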

Lastly, the choice of initial population size is crucial to the convergence and complexity of the algorithm. Large population sizes increase initial diversity and can maintain that diversity; however, they heavily tax computational resources. On the other hand, smaller sizes decrease both initial diversity and the diversity held throughout the generations, but they require less computation, offering a faster alternative. The choice of initial population size depends on the complexity of the problem, the computational cost of the fitness function, and the available waiting time.

Choosing Fitness Function

In evolution, the motto is that only the fittest individuals in the environment survive. How do we determine which individuals are fit? In EC the fitness function is the optimization problem itself, where individuals with larger function values have better fitness. In the case of minimization, we can scale the function values such that smaller values result in larger fitness values. The selection of the fitness function depends directly on the problem to be optimized. From Unit 1, Optimization Theory, there are four main types of problems: unconstrained, constrained, multi-objective, and multi-solution. EAs can work on all these types but may have difficulty with two of them: constrained and multi-solution. Because the feasible domain is restricted in a constrained environment, our algorithm may encounter infeasible solutions. To get around this we have two main solutions. One is to add a penalty term to the function value of infeasible individuals; in this way, we decrease the fitness of infeasible solutions. The only problem with penalty terms is that they change the fitness landscape of our search space. Here we have an exact definition of a penalty term:

Image by Author
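A hedged sketch of the idea, assuming maximization and constraints expressed as functions g with g(x) ≤ 0 when feasible (both assumptions, since the exact form depends on the problem):

```python
def penalized_fitness(f, constraints, penalty_weight=1000.0):
    """Wrap a fitness function so that constraint violations reduce fitness."""
    def fitness(x):
        # total violation: each g contributes only when g(x) > 0 (infeasible)
        violation = sum(max(0.0, g(x)) for g in constraints)
        return f(x) - penalty_weight * violation
    return fitness
```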

On the other hand, we could adjust our reproduction operators so that they create only feasible solutions; however, this can become problematic with many constraints.

Lastly, EAs have trouble with multi-solution problems, as they tend to converge to a single point or a small group of points. In instances where many solutions need to be returned, EAs can struggle. To get around this we can introduce Euclidean distance: once one of the global extrema is found, we add a penalty term that deters individuals from approaching that exact point in Euclidean space. We will cover this application later.
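As a preview, here is a sketch of that distance-based penalty, with `radius` and `strength` as hypothetical tuning parameters:

```python
import math

def niche_penalty(fitness, found_optima, radius=0.5, strength=100.0):
    """Penalize individuals that crowd an optimum we have already found."""
    def adjusted(x):
        value = fitness(x)
        for optimum in found_optima:
            distance = math.dist(x, optimum)  # Euclidean distance
            if distance < radius:
                value -= strength * (radius - distance)  # closer -> larger penalty
        return value
    return adjusted
```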

Selection

Once we have our initial population and have computed the fitness of each individual, we need a way to determine who will be selected for survival and reproduction. In EAs, there is a trade-off between exploitation and exploration. Exploitation refers to the convergence quality of the algorithm. If we use a scheme that always chooses the fittest individuals to survive, then we exhibit high exploitation but low exploration. High exploitation leads to quick convergence, but it can also lead to premature convergence.

https://commons.wikimedia.org/wiki/File:Fitness-landscape-cartoon.png

For example, in the picture above (assuming maximization), the global maximum is at B, but if that position has not been explored by our algorithm, then it might prematurely converge to A or C, depending on how much of the input space has been explored. In the picture, however, the initial point lies between A and B, so it will converge to either A or B, depending on random chance in the algorithm. In opposition to exploitation is exploration: this type of selection does not always choose the fittest individuals to mate, encouraging exploration of the input space. However, by encouraging exploration, the algorithm can have poor convergence qualities.

In practice, it is often best to utilize both: exploration in the beginning and exploitation towards the end. There are many selection methods, but here are the most popular:

  • Random Selection
  • Proportional Selection (Roulette Wheel)
  • Rank Based Selection
  • Tournament Selection
  • Boltzmann Selection

Selective pressure refers to how long it takes selection to produce a uniform population: high selective pressure decreases diversity quickly, whereas low selective pressure encourages exploration. In this series we will talk about random selection, proportional selection, rank-based selection, tournament selection, and Boltzmann selection. However, in application we will mainly focus on random, proportional, and tournament selection, as they are the most common. Along with these methods we have certain mechanisms we can use, such as elitism and hall of fame.

https://commons.wikimedia.org/wiki/File:Simple_random_sampling.PNG

Random selection is a simple method with good exploration qualities, as it chooses the individuals to survive at random; however, because of this, it does not guarantee that the current best solution will survive, so good solutions can be lost. It is most commonly paired with elitism, as we will see in a moment.

Image by Author

Proportional selection, as we can see above, creates a probability distribution from the fitness values by dividing each fitness value by the sum of all fitness values. Because selection is based on probability, it ensures good diversity while also ensuring good convergence; however, because the probability is based on the proportion of one fitness value to the rest, individuals with large fitness values early on can dominate the selection process.
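A minimal sketch of the roulette wheel, assuming all fitness values are positive (they would need shifting or scaling otherwise):

```python
import random

def proportional_select(population, fitnesses):
    """Roulette-wheel selection: probability of selection = fitness / total fitness."""
    spin = random.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return individual
    return population[-1]  # guard against floating-point round-off
```

In practice, Python’s built-in `random.choices(population, weights=fitnesses)` performs the same weighted draw in one call.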

Image by Author

To keep large fitness values from dominating the reproduction space in proportional selection, rank-based selection has been proposed. It is like proportional selection except that the probability of being chosen is based not on the raw fitness value but on the ranking of the individual. Individuals are ranked from worst to best, and the selection probability grows with rank either linearly or nonlinearly. Above, we have an example of rank-based selection where the best is ranked highest and the worst is ranked lowest. The exact ranking of an individual depends on the type of rank-based selection equation you are using.

https://commons.wikimedia.org/wiki/File:Deterministic-tournament-selection-3.svg

Tournament selection, on the other hand, randomly selects a group of individuals, whose number is known as the tournament size, and the individual with the best fitness value in the tournament is chosen to survive. This ensures good convergence quality, as the best individual in the group is chosen, while keeping a component of diversity through the tournament size. Note that a tournament size of 1 is essentially equivalent to random selection; therefore, smaller tournament sizes ensure good diversity, while larger tournament sizes ensure that the best fitness values will be chosen.
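A sketch of deterministic tournament selection, where `k` is the tournament size:

```python
import random

def tournament_select(population, fitnesses, k=3):
    """Pick k individuals at random and return the fittest among them."""
    contestants = random.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]
```

With `k=1` this degenerates to random selection, matching the observation above.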

Boltzmann selection is another form of selection where the probability of being chosen is based on a simulated-annealing equation. We will cover Boltzmann selection in the next unit when we discuss how to choose between offspring and parents for survival.

Elitism simply takes the best percentage of the current population and always carries those individuals over to the next generation. This guarantees good convergence, but it can also lead to poor diversity if the elitism proportion is large. Elitism is usually paired with random selection, or any of the other techniques, to get a mixture of good diversity and convergence.

Lastly, hall of fame is a mechanism where the best individual of each generation is set aside in the hall of fame; after the algorithm has stopped, the best solution from the hall of fame is returned. This allows the algorithm to favor exploration, for example with random selection, without the fear of losing the best individual. Note the difference between elitism and hall of fame: elitism carries the best x percent of the population over into the next generation, whereas hall of fame only sets the single best individual aside. That individual may still not be chosen for survival, allowing for good exploration while not technically losing the best solution per generation, as it is stored in the hall of fame.
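Hedged sketches of both mechanisms (the names and the 10% default are illustrative choices, not fixed conventions):

```python
def elite(population, fitnesses, proportion=0.1):
    """Elitism: copy the best fraction of the population into the next generation."""
    n = max(1, int(proportion * len(population)))
    order = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    return [population[i] for i in order[:n]]

class HallOfFame:
    """Track the single best individual seen across all generations."""
    def __init__(self):
        self.best, self.best_fitness = None, float("-inf")

    def update(self, population, fitnesses):
        i = max(range(len(population)), key=lambda j: fitnesses[j])
        if fitnesses[i] > self.best_fitness:
            self.best, self.best_fitness = population[i], fitnesses[i]
```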

Note that all of these methods are based on probability of selection; therefore, an individual can be selected multiple times for survival and reproduction.

Reproduction

Now that we’ve selected which individuals will reproduce, we need a way to share genetic material between the parents and introduce new material. In EAs this is performed through mutation and crossover. Crossover is how the genomes of two parents are mixed together to create the offspring. It works by selecting a crossover point and then swapping the genome segments between the parents to create two new offspring. We can see this in the following example:

https://commons.wikimedia.org/wiki/File:Computational.science.Genetic.algorithm.Crossover.Cut.and.Splice.svg

Mutation, on the other hand, introduces new genetic material by mutating one or more genes in the genome of the offspring, for example by randomly flipping bits. In practice, mutation is mainly applied to poor individuals to encourage exploration, while good solutions are mutated less to ensure good convergence.
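On bit-string genomes the two operators are only a few lines each. This sketch uses one-point crossover and per-bit flip mutation, with a hypothetical mutation rate of 1%:

```python
import random

def one_point_crossover(parent1, parent2):
    """Cut both parents at the same point and swap the tails."""
    point = random.randint(1, len(parent1) - 1)
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def bit_flip_mutation(genome, rate=0.01):
    """Flip each bit independently with a small probability."""
    return [1 - gene if random.random() < rate else gene for gene in genome]

child1, child2 = one_point_crossover([0, 1, 1, 0, 1], [1, 0, 0, 1, 0])
child1 = bit_flip_mutation(child1)
```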

Stopping Conditions

Now that we’ve defined our reproduction operators, we simply repeat the entire process of selection and reproduction until the stopping conditions are met. Here are the four most common types of stopping criteria:

  • No Change in Best Fitness Value
  • No Change in Mean Fitness Value
  • Acceptable Solution
  • Maximum Generations Reached

A common stopping criterion is no change in the best fitness value after a certain number of generations. However, this can be problematic when elitism is not used or the best solution is not always carried over, as the best fitness value will then fluctuate. In addition, even when elitism is used, EAs can stagnate for long stretches of generations before finding a better solution.

Instead of stopping when there is no change in the best fitness value, one could stop when there is no change in the mean fitness value, indicating the population has converged around an area. However, if the selection and reproduction operators encourage exploration over exploitation, the mean fitness value will fluctuate with such great variance that this condition may never be met. On the other hand, one could stop when an acceptable solution has been found, such as in neural network training when the MSE is below one, or in classification when the cross-entropy loss is near zero. However, this can lead to exiting the algorithm prematurely, when a better solution could still have been found.

Lastly, the condition most commonly found in practice is simply to run the algorithm for many generations and terminate when the maximum generation count has been reached.
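A sketch combining these criteria into one check, given a running history of the best fitness per generation (the `patience` and `tol` parameters are illustrative defaults):

```python
def should_stop(best_history, max_gens=500, target=None, patience=20, tol=1e-8):
    """Return True once any of the common stopping criteria is met."""
    if len(best_history) >= max_gens:
        return True  # maximum generations reached
    if target is not None and best_history and best_history[-1] >= target:
        return True  # acceptable solution found
    recent = best_history[-patience:]
    if len(recent) == patience and max(recent) - min(recent) < tol:
        return True  # best fitness has stagnated
    return False
```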

Difference in Evolutionary Algorithms

There are many different fields of evolutionary computation, and here are the six we will cover in this series and their differences.

· Unit 3) Genetic Algorithms (GAs): Models Genetic Evolution

· Unit 4) Genetic Programming (GP): Subset of GAs but individuals are represented as trees or graphs

· Unit 5) Evolutionary Programming (EP): Focuses on Behavior, evolves the Phenotype of Individuals

· Unit 6) Evolution Strategies (ESs): Evolves Behavior, Phenotype, and Genotype

· Unit 7) Differential Evolution (DE): Uses Distance and Unit Vectors for Reproduction

· Unit 8) Co-Evolution (CoE): Individuals Evolved either through Cooperation or Competition

Conclusion

In summary of this unit, evolutionary algorithms are all built off the following design choices:

· Initial Population

· Fitness Function

· Chromosome Encoding

· Selection Process

· Reproduction Process

· Termination

The importance of the initial population is to ensure good initial diversity, along with the choice of population size. The fitness function in most cases will be the optimization problem itself. Next, we need to decide how to encode the chromosome and what the phenotype will be. Then we choose our selection and reproduction operators; with these two, remember the trade-off between exploitation and exploration. Finally, we need to choose when to stop our algorithm.

Now that we have gone over all the material covered in the initial diagram of the canonical genetic algorithm, let’s go back to it:

Image by Author

First, we initialize our population randomly, then we select those who are the best fit to reproduce. We then exchange the alleles between the two parents through crossover, and introduce new genetic material through mutation. After this, we have some type of survival mechanism to determine who survives and who does not (we will cover this later). We then repeat this process of selection, reproduction, and survival until termination where a stopping criterion is reached.

However, there are some design issues that we did not cover, namely who will survive after reproduction. In simple algorithms, those who are not chosen for mating do not survive, and the offspring replace the parents. However, in some algorithms, survival can be an entirely different selection process, either another selection procedure over the pooled offspring and parents or some type of ‘fight’ between offspring and parents to see who survives. We will cover these possibilities in later units. Because this unit was an introduction to evolutionary algorithms, there will be no application section; we will include one starting in the next unit.

This concludes Unit 2 on an introduction to evolutionary computation, stay tuned for Unit 3 where we will cover Genetic Algorithms:

Unit 3) Genetic Algorithms (Part 1)

