
Table of Contents
🌎 What is Global Optimization?
🧾 Problem Statement
🔢 Encoding and Decoding Functions
🧬 Selection, Crossover, and Mutation
∘ Selection
∘ Crossover
∘ Mutation
👨‍💻 Genetic Algorithm
🧪 Experiment
∘ Experiment 1: m = 15
∘ Experiment 2: m = 30
📌 Conclusion
🌎 What is Global Optimization?
The "tree" of knowledge of mathematics and related fields does not grow only by putting forth new branches. It also happens, quite often in fact, that branches which were thought to be completely disparate are suddenly seen to be related.
- Michiel Hazewinkel
One of the problems with optimization is that the objective function often has many local minima. In some cases, when the objective function is complex enough or the overall model is very large, this is tolerable. For instance, the optimizers used for the loss functions of typical neural network models are not designed to overcome the local-minima problem at all. To compensate, some practitioners train the same model with different initial weights or different optimizer parameters, in the hope that one model will give slightly better results than the others.
In exceptional cases, local minima are intolerable, and global optimizers are needed. They are designed to find the global minima of non-differentiable or even unknown objective functions. The three most common algorithms are:
- Genetic Algorithm
- Particle Swarm Optimization
- Simulated Annealing
What intrigued me the most is the fact that these algorithms are all inspired by real-life phenomena. The Genetic Algorithm is a simulation based on the principles of evolution. Particle Swarm Optimization was first intended for simulating social behavior, as a stylized representation of the movement of organisms in a bird flock or fish school. Simulated Annealing mimics the solidification of a crystal under a slowly decreasing temperature.
In this story, we will focus our attention on the Genetic Algorithm. Let’s import some libraries first.
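The original import cell isn't embedded in this text; a minimal set of imports that the sketches below assume is:

```python
import numpy as np               # array math and random number generation
import matplotlib.pyplot as plt  # plotting the results
```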
🧾 Problem Statement

The world as we see it today, with its variety of different creatures, its individuals highly adapted to their environment, and its ecological balance (under the optimistic assumption that there still is one), is the product of a three-billion-year experiment we call evolution: a process based on sexual and asexual reproduction, natural selection, mutation, and so on. Looking closer, the complexity and adaptability of today’s creatures have been achieved by refining and combining genetic material over a very long period of time.
In biology, genetic information is stored in the chromosome, a cell component made of protein and a single molecule of DNA. In the Genetic Algorithm, we model the chromosome as a string of zeros and ones, which will also be called an individual (for simplicity, we assume each individual is represented by exactly one chromosome). Individuals will compete, and those with good fitness will reproduce. By good fitness, we mean that the objective function is optimized.
To be concrete, we will introduce a simple problem and build the Genetic Algorithm around it. The problem is to find a root of

where x ∈ [1, 3]. Of course, f is known, differentiable, and has exactly one root in the interval [1, 3], so ordinary local optimization techniques would work fine. However, for learning purposes, we will employ the Genetic Algorithm instead.
First off, let’s create the equivalent maximization problem

max −|f(x)|,  x ∈ [1, 3],

that is, we make f non-positive and then look for the maximum value. If the maximum value found is close to zero (within a certain tolerance ε), then the x that attains it is the root in question. If not, the algorithm has not been able to find the root of f.
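As a minimal sketch of this reformulation (the article’s concrete f is not reproduced in this text, so it is passed in as an argument here):

```python
def fitness(x, f):
    """Fitness of a decoded point x: the non-positive value -|f(x)|.
    Its maximum (the value closest to zero) is attained at a root of f."""
    return -abs(f(x))
```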
🔢 Encoding and Decoding Functions
Unlike conventional optimization algorithms, the Genetic Algorithm is a probabilistic optimization method. Moreover, for a function f : X → ℝ, the Genetic Algorithm does not search directly on X, but on an encoded version of X, which we denote by S. Before using the Genetic Algorithm, the first thing we have to do is find an encoding function that maps X to S. Then, after the optimization, the last thing we do is apply the inverse of this encoding function (the decoding function), which maps S back to X. The encoding and decoding functions don’t have to be bijective. However, in most cases, it’s easier if the decoding function is injective.
The encoding function that we use is

e : [a, b] → {0, 1}ⁿ,  e(x) = binₙ(k),

where

k = round((x − a) / (b − a) × (2ⁿ − 1))

for every x ∈ X. Here, {0, 1}ⁿ is the set of all strings of length n consisting of zeros and ones, binₙ is the function that maps each integer in {0, 1, …, 2ⁿ − 1} to its binary representation of length n, and round rounds a real number to the nearest integer. Since x ∈ [1, 3], we have a = 1 and b = 3. Note that the encoding function is not bijective because of the round operation.
To be honest, the encoding function looks intimidating, at least to me. Luckily, we don’t really need to define the encoding function in the algorithm, since we put in random values anyway: the strategy is not to initialize random values for x ∈ X, but for the binary representations s ∈ S instead. Below is the code.
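The original code cell is not embedded in this text; a minimal NumPy sketch of the initialization (the module-level rng stands in for the random_state parameter mentioned later) might look like this:

```python
random_state = 42  # placeholder seed for reproducibility
rng = np.random.default_rng(random_state)

def init_population(m, n):
    """Create m random individuals, each a binary string of length n,
    represented together as an (m, n) array of zeros and ones."""
    return rng.integers(0, 2, size=(m, n))
```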
From here, we can define the decoding function as follows:

d : {0, 1}ⁿ → [a, b],  d(s) = a + k × (b − a) / (2ⁿ − 1),

where

k = binₙ⁻¹(s)

is the integer whose length-n binary representation is s, for every s ∈ S. Note that this decoding function is injective.
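A sketch of this decoding function, under the same conventions as above:

```python
def decode(s, a=1.0, b=3.0):
    """Map a binary string s in {0, 1}^n back to a real number x in [a, b]."""
    n = len(s)
    k = int("".join(map(str, s)), 2)  # bin_n^{-1}(s), most significant bit first
    return a + k * (b - a) / (2 ** n - 1)
```

For example, decode(np.zeros(20, dtype=int)) gives 1.0 and decode(np.ones(20, dtype=int)) gives 3.0, the two endpoints of the interval.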
In this story, we will fix n = 20, although you are encouraged to try other values of n and see what happens. With n = 20, the precision achieved by the Genetic Algorithm on the interval [1, 3] is up to

(b − a) / (2(2ⁿ − 1)) = 2 / (2(2²⁰ − 1)) ≈ 9.54 × 10⁻⁷,

half the spacing between consecutive decodable points. Therefore, we cannot use an error tolerance ε smaller than 9.54 × 10⁻⁷. In this story, by default, we select ε = 1 × 10⁻⁵, but you may want to change this parameter in the program.
🧬 Selection, Crossover, and Mutation
Basically, the Genetic Algorithm performs the following steps:
- Initialize the string population B₀ = (b₁₀, b₂₀, …, bₘ₀) at random, where each bᵢ₀ is an individual string in {0, 1}ⁿ, m is the number of individuals in the population, and the index 0 indicates the population B₀ as the 0-th generation.
- Perform the selection, crossover, and mutation loop until the stopping criterion is met. In this story, the stopping criterion chosen is when the best fitness of a population meets |f(x)| ≤ ε or the maximum iteration (maximum number of generations) is reached. The maximum iteration by default is 1000, but you can change this parameter in the program.
Selection
Selection is done by taking a sample from the population of the t-th generation, for t = 0, 1, 2, …; from that sample, the one individual with the best fitness is selected to enter the population of the (t+1)-st generation. Individuals of the t-th generation who are not selected die off. In this story, the sample size used is

of the number of individuals in the population. This process is carried out m times so that the number of individuals in the population stays the same in every generation.
This kind of selection process is known as Tournament Selection. We use this scheme instead of Roulette Wheel Selection because, under roulette wheel, individuals with good fitness tend to fill the population of the next generation too quickly, which risks premature convergence to a local maximum (although mutation also helps to avoid this).
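A sketch of this tournament selection (the article’s exact sample size is not shown above, so k is a placeholder parameter here):

```python
def tournament_selection(population, fitnesses, k):
    """Fill the next generation by running m tournaments: in each one,
    k random individuals compete and the fittest survives."""
    m = len(population)
    selected = np.empty_like(population)
    for i in range(m):
        contenders = rng.choice(m, size=k, replace=False)
        winner = contenders[np.argmax(fitnesses[contenders])]
        selected[i] = population[winner]
    return selected
```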
Crossover
Each individual in the population of the t-th generation is formed into pairs (if m is odd, one individual is left without a partner). Then, for each pair, with a certain probability, an index is randomly selected from {1, 2, …, n−1}. In this story, this probability is set to 1, which means a crossover always occurs. After that, every bit after the chosen index is exchanged between the two individuals.
This crossover process is called One-point Crossover.
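A sketch of this one-point crossover, following the description above (pc = 1 reflects the story’s choice that a crossover always occurs):

```python
def one_point_crossover(population, pc=1.0):
    """Pair consecutive individuals and, with probability pc, swap their
    tails after a random cut point chosen from {1, ..., n-1}."""
    m, n = population.shape
    offspring = population.copy()
    for i in range(0, m - 1, 2):  # if m is odd, the last individual is unpaired
        if rng.random() < pc:
            cut = rng.integers(1, n)  # cut point in {1, ..., n-1}
            offspring[i, cut:] = population[i + 1, cut:]
            offspring[i + 1, cut:] = population[i, cut:]
    return offspring
```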
Mutation
Mutation inverts bits (from 0 to 1 or from 1 to 0): each bit of each individual in the population of the t-th generation is flipped with a certain small probability. In this story, the probability

is used. The pₘ value is kept small so that the search does not degenerate into chaos. The mutation process is carried out to avoid convergence to a local maximum.
Of course, you can change these probabilities in the program, if you want.
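A sketch of this bitwise mutation (pm = 0.05 is an illustrative placeholder; the story’s exact pₘ value is not shown in this text):

```python
def mutate(population, pm=0.05):
    """Flip each bit of each individual independently with probability pm."""
    flips = rng.random(population.shape) < pm
    return np.where(flips, 1 - population, population)
```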
👨‍💻 Genetic Algorithm
Before jumping into the algorithm, let’s create a Python function print_result to display the population, fitness, and average fitness for the first and last generations. It also displays the best fitness, as well as the point x where the best fitness is achieved.
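The original function is not embedded in this text; a sketch consistent with the output format shown in the experiments below:

```python
def print_result(population, fitnesses, generation):
    """Print each individual with its fitness, plus the best and average
    fitness, in the format used by the experiment logs below."""
    i_best = int(np.argmax(fitnesses))
    x_best = decode(population[i_best])
    print("=" * 68)
    print(f"Generation {generation} max fitness {fitnesses[i_best]:.4f} "
          f"at x = {x_best:.4f}")
    for i, (ind, fit) in enumerate(zip(population, fitnesses), start=1):
        print(f"# {i} {ind} fitness: {fit:.4f}")
    print(f"Average fitness: {fitnesses.mean():.4f}")
    print("=" * 68)
```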
Now we’re ready. The Genetic Algorithm is built on top of the functions we’ve created so far. Roughly, here’s the algorithm:
- Initialize a random population
- While the stopping criterion is not met, do: selection, crossover, and mutation
- Decode the population
Please find below the complete algorithm. Note that we use the parameter random_state to ensure reproducibility.
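The complete gist is not reproduced in this text; a condensed sketch that ties together the pieces above (the tournament size k and mutation rate pm are illustrative placeholders, and reproducibility comes from the rng seeded earlier):

```python
def genetic_algorithm(f, m, n=20, eps=1e-5, max_iter=1000, k=3, pc=1.0, pm=0.05):
    """Run the GA until |f(x_best)| <= eps or max_iter generations pass.
    Returns the best decoded point, the final population, and the
    best fitness of every generation."""
    population = init_population(m, n)
    best_history = []
    for t in range(1, max_iter + 1):
        xs = np.array([decode(ind) for ind in population])
        fitnesses = np.array([fitness(x, f) for x in xs])
        i_best = int(np.argmax(fitnesses))
        best_history.append(fitnesses[i_best])
        if t == 1:
            print_result(population, fitnesses, t)  # first generation
        if abs(fitnesses[i_best]) <= eps:  # stopping criterion: |f(x)| <= eps
            print_result(population, fitnesses, t)  # last generation
            print(f"Solution found at iteration {t}")
            break
        population = tournament_selection(population, fitnesses, k)
        population = one_point_crossover(population, pc)
        population = mutate(population, pm)
    # xs[i_best] is the best point of the last evaluated generation
    return xs[i_best], population, best_history
```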
Next, to display the result beautifully, we create another Python function called plot_result, which will display:
- The plot of −|f|, as well as the position of the last population on the plot (after decoding)
- The best fitness for each generation
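A matplotlib sketch of such a function, assuming f is vectorized over NumPy arrays and best_history is the list returned by the genetic_algorithm sketch above:

```python
def plot_result(f, population, best_history):
    """Left: -|f| (magenta) with the decoded final population.
    Right: best fitness per generation (cyan)."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    grid = np.linspace(1, 3, 500)
    ax1.plot(grid, -np.abs(f(grid)), color="magenta")
    pts = np.array([decode(ind) for ind in population])
    ax1.scatter(pts, -np.abs(f(pts)), color="black", zorder=3)
    ax1.set_xlabel("x")
    ax1.set_ylabel("-|f(x)|")
    ax2.plot(best_history, color="cyan")
    ax2.set_xlabel("generation")
    ax2.set_ylabel("best fitness")
    plt.tight_layout()
    plt.show()
```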
🧪 Experiment
In addition to the randomness of the process, convergence also depends on the length of the string n and the number of individuals in the population m. As mentioned before, we fix n = 20. Now, we will play with the value of m, trying two choices as follows.
Experiment 1: m = 15
====================================================================
Generation 1 max fitness -0.0174 at x = 1.9165
# 1 [0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 0 1 0 0 1] fitness: -0.0174
# 2 [1 1 0 1 0 0 1 1 0 0 1 1 0 1 0 1 1 1 0 0] fitness: -0.8531
# 3 [1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 0 1 0 1] fitness: -0.8342
# 4 [1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1] fitness: -0.4428
# 5 [1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 1] fitness: -1.0461
# 6 [1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 0] fitness: -1.1612
# 7 [0 1 0 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0] fitness: -0.1666
# 8 [0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 1] fitness: -0.2122
# 9 [0 1 0 1 1 0 0 1 1 1 1 0 0 1 0 1 1 1 1 1] fitness: -0.1402
# 10 [0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 1 1 0 1 0] fitness: -0.3417
# 11 [0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0] fitness: -0.1384
# 12 [0 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0] fitness: -0.2935
# 13 [0 1 0 1 1 1 1 0 0 1 0 1 1 0 0 0 1 1 0 0] fitness: -0.1177
# 14 [1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 1 0 0] fitness: -0.9482
# 15 [0 1 1 0 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 1] fitness: -0.0196
Average fitness: -0.4489
====================================================================
====================================================================
Generation 27 max fitness -0.0000 at x = 1.8955
# 1 [0 1 1 1 0 0 1 0 1 0 0 1 1 0 1 0 0 0 0 1] fitness: -0.0001
# 2 [0 1 1 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 1] fitness: -0.0977
# 3 [0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1] fitness: -0.0319
# 4 [0 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0] fitness: -0.0017
# 5 [0 0 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 0 1] fitness: -0.2879
# 6 [0 1 1 1 0 0 0 0 0 1 1 1 1 0 1 1 0 0 1 0] fitness: -0.0136
# 7 [0 1 1 1 0 0 0 0 1 0 1 1 1 1 1 1 0 1 0 1] fitness: -0.0119
# 8 [0 1 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1] fitness: -0.0008
# 9 [1 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 1 0 0 0] fitness: -1.1996
# 10 [0 1 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0] fitness: -0.0874
# 11 [1 1 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0] fitness: -1.2485
# 12 [0 1 1 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 0 0] fitness: -0.0011
# 13 [0 1 1 1 0 0 1 0 1 0 0 1 1 1 1 1 1 0 1 0] fitness: -0.0000
# 14 [0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 1 1 1 1] fitness: -0.2923
# 15 [0 1 1 1 0 0 1 0 0 1 1 1 0 0 1 1 1 1 0 1] fitness: -0.0011
Average fitness: -0.2184
====================================================================
Solution found at iteration 27

Experiment 2: m = 30
====================================================================
Generation 1 max fitness -0.0174 at x = 1.9165
# 1 [0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 0 1 0 0 1] fitness: -0.0174
# 2 [1 1 0 1 0 0 1 1 0 0 1 1 0 1 0 1 1 1 0 0] fitness: -0.8531
# 3 [1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 1 0 1 0 1] fitness: -0.8342
# 4 [1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1] fitness: -0.4428
# 5 [1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 1 1] fitness: -1.0461
# 6 [1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 0] fitness: -1.1612
# 7 [0 1 0 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0] fitness: -0.1666
# 8 [0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 1] fitness: -0.2122
# 9 [0 1 0 1 1 0 0 1 1 1 1 0 0 1 0 1 1 1 1 1] fitness: -0.1402
# 10 [0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 1 1 0 1 0] fitness: -0.3417
# 11 [0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0] fitness: -0.1384
# 12 [0 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0] fitness: -0.2935
# 13 [0 1 0 1 1 1 1 0 0 1 0 1 1 0 0 0 1 1 0 0] fitness: -0.1177
# 14 [1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 1 0 0] fitness: -0.9482
# 15 [0 1 1 0 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 1] fitness: -0.0196
# 16 [1 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 1 0 1] fitness: -0.4489
# 17 [1 0 0 1 1 1 0 0 0 0 0 1 1 0 0 1 0 0 0 1] fitness: -0.3129
# 18 [0 0 1 1 0 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0] fitness: -0.2926
# 19 [1 1 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 1 1 0] fitness: -0.9076
# 20 [0 0 1 1 0 1 0 1 0 0 1 1 0 1 1 0 1 0 1 0] fitness: -0.2801
# 21 [1 1 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 1 1 0] fitness: -0.8456
# 22 [1 1 0 0 0 1 1 0 1 1 0 1 0 0 0 0 0 0 1 1] fitness: -0.7216
# 23 [0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 1 1 0] fitness: -0.0499
# 24 [0 1 0 0 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 0] fitness: -0.2110
# 25 [0 0 0 1 1 1 0 1 1 0 1 1 0 1 0 1 1 1 1 1] fitness: -0.3271
# 26 [0 0 1 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 1 0] fitness: -0.2847
# 27 [1 1 0 0 1 1 1 0 1 1 0 0 0 1 0 0 1 1 0 1] fitness: -0.8054
# 28 [1 1 0 0 0 1 1 0 1 0 0 0 0 1 0 1 1 0 1 0] fitness: -0.7186
# 29 [1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 1] fitness: -0.2531
# 30 [0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0] fitness: -0.0454
Average fitness: -0.4413
====================================================================
====================================================================
Generation 11 max fitness -0.0000 at x = 1.8955
# 1 [0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 0 0 1] fitness: -0.0002
# 2 [0 1 1 1 0 0 1 0 1 0 0 1 1 1 1 1 1 0 1 0] fitness: -0.0000
# 3 [0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 1 1 1 1 1] fitness: -0.0536
# 4 [1 1 1 1 0 0 0 0 1 0 0 1 0 0 1 1 1 1 0 1] fitness: -1.1807
# 5 [0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1 1 1 0 1] fitness: -0.0528
# 6 [0 1 1 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 1 1] fitness: -0.0528
# 7 [0 0 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0 0] fitness: -0.2870
# 8 [0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0] fitness: -0.3394
# 9 [1 1 1 1 0 0 0 0 0 1 0 1 0 1 1 1 1 1 0 1] fitness: -1.1780
# 10 [0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0 0] fitness: -0.0002
# 11 [0 1 1 1 0 0 1 0 1 1 0 1 1 1 0 1 1 1 0 0] fitness: -0.0016
# 12 [0 1 1 1 0 1 1 0 0 1 0 1 0 1 1 0 1 1 1 0] fitness: -0.0242
# 13 [0 1 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 1 0 0] fitness: -0.0495
# 14 [0 1 1 1 0 0 1 0 0 0 0 1 1 1 0 1 1 1 1 0] fitness: -0.0032
# 15 [0 0 1 1 0 1 1 0 1 0 0 1 1 1 0 0 1 1 0 0] fitness: -0.2763
# 16 [0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1] fitness: -0.2665
# 17 [0 0 1 1 1 1 1 0 1 0 0 1 1 0 0 0 1 0 1 0] fitness: -0.2521
# 18 [0 1 0 1 0 0 0 1 1 0 0 1 0 1 1 0 1 0 0 0] fitness: -0.1791
# 19 [0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 1 0 0 0] fitness: -0.0003
# 20 [0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 0] fitness: -0.0006
# 21 [0 1 1 1 0 0 1 0 1 0 0 1 1 1 0 1 1 1 0 1] fitness: -0.0000
# 22 [0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 0] fitness: -0.1758
# 23 [0 1 1 1 0 0 1 0 1 1 0 1 0 1 1 1 1 1 1 0] fitness: -0.0014
# 24 [0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 1 0 1] fitness: -0.0002
# 25 [0 1 1 1 0 0 1 0 0 1 0 1 1 1 1 0 1 1 1 1] fitness: -0.0016
# 26 [1 1 0 1 0 0 1 1 0 0 0 1 1 0 0 1 1 1 0 0] fitness: -0.8519
# 27 [0 1 1 1 1 0 1 0 1 0 0 1 1 1 1 0 1 1 0 1] fitness: -0.0530
# 28 [0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 1 0] fitness: -0.0536
# 29 [0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 1 0] fitness: -0.0002
# 30 [0 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 0] fitness: -0.0002
Average fitness: -0.1779
====================================================================
Solution found at iteration 11

It can be seen from the above results that in both cases, the program finds the root of f in [1, 3], namely x ≈ 1.8955. From the positions of the final population on the magenta plot of −|f| above, it can be seen that the population has gathered around −|f(x)| ≈ 0 in both cases.
Nonetheless, this solution was found in a different number of iterations:
- In the case of m = 15, the solution is found after 27 iterations, which means the program has performed 27 × 15 = 405 function evaluations. Since the search space {0, 1}²⁰ contains 2²⁰ = 1,048,576 candidate strings, the program only needed to evaluate 405 / 2²⁰ ≈ 0.039% of the total search space.
- In the case of m = 30, the solution is found after 11 iterations, which means the program has performed 11 × 30 = 330 evaluations, i.e. only 330 / 2²⁰ ≈ 0.031% of the total search space.
The number of generations needed in the case of m = 30 is smaller than in the case of m = 15, which makes sense since the population is bigger in the former case; it even needed fewer total evaluations (330 versus 405). All in all, the Genetic Algorithm is much more effective than sampling candidate solutions x from the search space at random.
Finally, from the plot of the best fitness vs. iteration above (colored in cyan), we can see that, in general, each generation has individuals with better fitness than the previous generation. However, certain generations may have a best individual that is worse than that of the preceding generation. This is natural and is due to the mutation process.
📌 Conclusion
The Genetic Algorithm is a powerful global optimization technique that, when applied with the right settings, escapes the trap of local optima. It is entirely probabilistic: the result depends on the randomness of the process, the length of the chromosome representing each individual, and the number of individuals in the population. We’ve seen the Genetic Algorithm in action searching for a root of a function, and observed that it is much more effective than picking candidate roots at random from the search space.

