
Introduction
Genetic Algorithm (GA) is a type of natural computing algorithm, which are algorithms developed to try to solve problems by replicating phenomena and behaviors present in nature.
There are several natural algorithms in the literature, such as Particle Swarm Optimization (PSO), which tries to mathematically simulate the social behavior of schools of fish or flocks of birds, and Ant Colony Optimization (ACO), which seeks to find the best paths in a graph simulating the behavior of ants when moving.
These types of algorithms are used to solve optimization problems, where you want to find optimal values (maximum or minimum depending on the problem analyzed) within a search space. One of the most studied optimization problems is the so-called _traveling salesman problem_.
Among all Natural Computing algorithms, one of the most famous is the Genetic Algorithm.
Overview

The Genetic Algorithm is based on concepts of genetics, where transformations are applied to data that aim to try to replicate events such as mutation, natural selection, and cross-over.
Briefly, the genetic algorithm is executed iteratively and has the following steps:
- Generation of an initial population
For each iteration:
- Selection of a subpopulation (parents)
- Generation of a new population (children) with the application of cross-over and mutation operations
- Evaluation of the new population generated
- Replacement of the initial population with the new population
The algorithm will run until a stopping condition is reached, usually a maximum number of iterations or a certain value obtained when evaluating the generated population.
It is important to bear in mind that to correctly execute the algorithm, it is essential to have a well-defined problem to be solved and the evaluation metric to be used.
Concepts

To understand the algorithm it is necessary to understand the concepts behind it.
- Individual: a member of the population that represents a single possible solution to the problem. Mathematically, it can be represented as an array of values, where each value in the array is a coefficient linked to the response equation of the analyzed problem.
- Population: a group of individuals who are candidates for the answer to the analyzed problem.
- Tournament: method of selecting the fittest individuals in the population. It consists of comparing the results of two (or more) individuals from the population and selecting the one that obtains the best result.
- Cross-over: operation of recombination of genes, where the genetic code of the parents is combined to generate the child. One (or more) cutoff points are defined, where the child receives all genes from the beginning of the genetic code to the cutoff point from the Parent 1 and all genes from the cutoff point to the end of the genetic code from the Parent 2. The picture below (Figure 1) shows an example of a cross-over operation with one cutoff point, where each cell represents a coefficient linked to the response equation.

- Mutation: operation that modifies the value of a gene to a value different from the original. Figure 2 shows an example of mutation applied to an individual.

Now that the basic concepts are defined, let’s look at an example of a problem that uses the Genetic Algorithm as a way to solve it.
The example

In Machine Learning, a widely used algorithm is the Artificial Neural Networks, which try to simulate the behavior of the human brain for decision-making. A very simple type of network is called Multilayer Perceptron (MLP), which are networks with an input layer, one or more intermediate layers, and an output layer.
The image below (Figure 3) shows an example of an MLP feedforward neural network with an intermediate layer containing four neurons, a hidden layer also with four neurons, and an output layer with one neuron.

The details of how the network works are not the focus of this article, it is enough to know that the output y is calculated through operations that use the weights between the layers of the network and the input data.
The layer weights are sets of vectors, where each value corresponds to the weight associated with a link between neurons in different layers. The purpose will be to optimize these weights so that the network can solve the problem.
Let’s begin by defining the class for the MLP Feedforward described in Figure 3.
Now, let’s go step by step through the functioning of the algorithm. The first step is to create an initial population. The function below receives the number of individuals desired (n_indiv) and returns the population of individuals (weights of the neural network) randomly initialized, using the MLPFeedforward class.
The function below performs the Tournament step of the GA, randomly selecting two parents from the initial population and storing the best parent (the one with the biggest accuracy), out of the two, in the parents list. It receives as parameters the population, and the data: features (X) and labels (y).
The next step is to create the function that will perform the Cross-over and Mutation operations. The parameters of the function are the parents list, the thresholds to decide if the operations of cross-over (cross_thr) and mutation (mutation_thr) will occur, and the mutation_limits, which limits the range in which the genes will change in the operation.
To wrap it all up, a class that has the whole algorithm was implemented. It also has the run method, which executes the algorithm multiple times, storing the accuracy obtained at each run and saving the best MLP generated and its performance.
Now, to show the algorithm, a generic dataset with 4 features, 10.000 rows, and 2 classes was created, split into training and test data (with 70% being training samples and the remaining 30% test samples), and standardized it using the Standard Scaler. Then, the GA_MLPFeedforward class was used to show how the performance of the model increases with the use of the Genetic Algorithm. The parameters used in the algorithm were:
- Size of the population = 50
- Number of iterations = 200
- Cross-over threshold = 0.5
- Mutation threshold = 0.7
- Mutation limits = [-5, 5]
It is important to say that the parameters used were not optimized, and the optimization of the parameters plays a very important role in the outcome of the algorithm.
After the execution of the algorithm, the evolution of the accuracy in the training data was plotted, as shown in Figure 4 below.

Finally, the performance of the model when evaluating the test data is presented in Figure 5.


Hopefully, this article was able to give a quick understanding of the Genetic Algorithm and how it can be used in the development of Machine Learning algorithms. It is always good to learn new skills and ways of doing the same thing.
Any comments and suggestions are more than welcome.
Feel free to reach me on my Linkedin and to check my GitHub.