EVOLUTIONARY COMPUTATION COURSE
Hello and welcome back to this full course on Evolutionary Computation! In this post we will start with Unit 1 of the course, Optimization Theory. In the previous post we covered the basic overview of the course; you can check it out here:
To start off, what exactly is optimization theory? Optimization theory is the branch of mathematics devoted to solving optimization problems: mathematical functions whose value we want to minimize or maximize. These problems appear throughout computer science and applied mathematics. Finding good solutions to them is what gives data science and machine learning models good accuracy, as those models are themselves built on numerical methods for minimizing error functions. Common optimization problems in machine learning include minimizing MSE, MAE, cross-entropy, etc.
Table of Contents
- Overview
- Unconstrained Problems
- Constrained Problems
- Multi-Solution Problems
- Multi-Objective Problems
- Benchmarking Test Functions
- Conclusion
Overview
In a brief overview, optimization theory can be reduced down to three questions:
- What is the Objective Function?
- What are the Set of Variables of Interest?
- What are the Set of Constraints?
Knowing what your objective function looks like in terms of number of dimensions, domain, and number of local extrema is vital when choosing an algorithm. In addition, it is necessary to understand the data type of each variable: discrete or continuous? Lastly, knowing the set of constraints is important so that your algorithm searches only the feasible part of the domain.
If you remember from calculus, optima, otherwise known as critical points, are points in the domain where the slope of the function is zero. These points are commonly referred to as extrema, classified as either minima or maxima. Extrema can be classified into four main types:
- Weak Extrema
- Strong Extrema
- Global Extrema
- Saddle Points

Weak extrema are critical points whose neighboring points have the same function value, creating a plateau (as can be seen in the picture above). Strong extrema are critical points that have either the largest (for maximization) or smallest (for minimization) function value among their neighboring points, creating a valley or mountain. Global extrema are strong extrema whose function value is the largest (for maximization) or smallest (for minimization) among all strong extrema. Lastly, there are saddle points; we won't deal much with these in our course, but they do occur. Saddle points are inflection points between concave-up and concave-down trends, where the slope is zero but the point is neither a minimum nor a maximum. Because we won't be dealing with saddle points, I won't cover much of their intuition.
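To make the neighbor comparison concrete, here is a small sketch (my own illustration, not from the text) that classifies sampled grid points as strong minima by checking that each candidate's function value is strictly smaller than both of its neighbors:

```python
# Sketch: finding strong local minima of a sampled 1-D function by
# comparing each grid point against its two neighbors.
# The function and grid below are illustrative choices.

def classify_minima(xs, f):
    """Return x-values of grid points whose function value is strictly
    smaller than both neighbors (strong local minima)."""
    ys = [f(x) for x in xs]
    minima = []
    for i in range(1, len(xs) - 1):
        if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]:
            minima.append(xs[i])
    return minima

# f(x) = x^4 - 2x^2 has two strong (and tied global) minima at x = -1, +1
xs = [i / 100 for i in range(-200, 201)]
mins = classify_minima(xs, lambda x: x**4 - 2 * x**2)
print(mins)  # [-1.0, 1.0]
```

A weak extremum would fail this strict test, since its plateau neighbors share the same function value.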
There are four main types of optimization problems, Unconstrained, Constrained, Multi-Solution, and Multi-Objective.
Unconstrained Problems
Unconstrained problems are just like they sound: unconstrained. These are functions we want to minimize or maximize over a set of variables whose domains may be continuous, discrete, or a mix of both. Here we have a definition:

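As a concrete sketch of an unconstrained problem, here is a simple random search minimizing the sphere function; the objective, bounds, and iteration count are my own illustrative choices, not from the text:

```python
import random

def sphere(x):
    """Sum of squares; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def random_search(f, dim, bounds, iters=20000, seed=0):
    """Sample points uniformly in the (unconstrained) box and keep the best."""
    rng = random.Random(seed)
    best_x = [rng.uniform(*bounds) for _ in range(dim)]
    best_f = f(best_x)
    for _ in range(iters):
        x = [rng.uniform(*bounds) for _ in range(dim)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = random_search(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_f)  # a small positive value near the global minimum of 0
```

Every point in the box is feasible here, which is exactly what makes the problem unconstrained.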
Constrained Problems
Constrained problems have the same setup as unconstrained problems but add constraints on the input space. Here we have a definition:

As an illustrative example, the figure below shows these components. First, we have the objective function, denoted by A. The unconstrained global minimum is at the bottom right, at point E. However, we introduce an inequality constraint, denoted by line D, where any value in the checkered region is infeasible. Thus, we are left with only part of the domain to work with. Points B and C are two critical points in the feasible domain, but the global minimum in the constrained environment is at point F. In scenarios such as these, we would like our algorithm to steer away from the infeasible domain and find the best feasible minimum.

Here below we have an example of the interaction between the domains of two variables in a constrained environment. X and Y are continuous variables that can each range between 0 and 2. However, we add a constraint such that as X tends towards 2, Y must decrease, and as Y tends towards 2, X must decrease. This can be seen in the blue semi-ring, which represents the feasible input space; the transparent background represents the infeasible domain.

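A constraint like the semi-ring above can be expressed as a simple feasibility check; the radii below are hypothetical, chosen only to mimic the picture:

```python
# Sketch of a feasibility test for a ring-shaped constraint in the box
# [0, 2] x [0, 2]; the inner/outer radii are illustrative assumptions.

def feasible(x, y, r_inner=1.0, r_outer=2.0):
    """True when (x, y) lies inside the box AND between the two radii."""
    if not (0.0 <= x <= 2.0 and 0.0 <= y <= 2.0):
        return False
    r2 = x * x + y * y
    return r_inner**2 <= r2 <= r_outer**2

print(feasible(1.5, 0.5))  # True: inside the ring
print(feasible(0.2, 0.2))  # False: inside the inner radius
print(feasible(2.0, 2.0))  # False: outside the outer radius
```

A constrained search algorithm would call a check like this to reject, repair, or penalize infeasible candidates.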
Multi-Solution Problems
Multi-solution problems are functions with no unique global extremum; instead there are tied global solutions. In these instances we want to return all of the tied global solutions (hence the name multi-solution) to give us a range of options to choose from. In a real application, we might want to minimize some cost function and find that multiple optimal solutions exist. Our goal is to return all of them, so that we can choose whichever point, or vector of variable values, best fits our interests. Here we have an exact definition:

Here below we have an example of a multi-solution problem: a variation of the sine wave function. As we can see, it has tied global minima and maxima. Note, however, that not all multi-solution problems are periodic like a sine wave, where it is easy to predict the next extremum; in general, the tied global extrema are scattered sporadically throughout the domain without pattern.

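A minimal sketch of collecting all tied global minima on a sampled grid, using the plain sine function as a stand-in problem (the grid and tolerance are illustrative choices):

```python
import math

def tied_global_minima(f, xs, tol=1e-6):
    """Return every grid point whose value is within tol of the best value."""
    ys = [f(x) for x in xs]
    best = min(ys)
    return [x for x, y in zip(xs, ys) if y - best <= tol]

# Grid 0, pi/2, pi, ..., 7pi/2; sin(x) = -1 at 3pi/2 and 7pi/2
xs = [i * math.pi / 2 for i in range(0, 8)]
mins = tied_global_minima(math.sin, xs)
print(mins)  # the two tied minimizers: 3pi/2 and 7pi/2
```

The key difference from a single-solution search is that we keep every point tied (within tolerance) with the best, rather than just one winner.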
Multi-Objective Problems
Multi-Objective Problems (MOP) are problems where we have many different objectives, or optimization problems, that we need to solve simultaneously. There are two main methods for solving these types of problems:
- Weighted Aggregation
- Pareto Optimality
Weighted aggregation is simply an aggregate of all the objective functions: we sum each objective function multiplied by an associated weight, and try to minimize or maximize that sum. It is usually assumed that the weights sum to one. Here we have an example aggregate function where we want to simultaneously minimize F1 and F2. Note that each function has an associated weight, where a larger weight gives that function more influence.

Here is an exact definition of weighted aggregation:

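A minimal weighted-aggregation sketch, assuming two hypothetical objectives f1 and f2 with equal weights (the objectives, weights, and search grid are illustrative):

```python
# Weighted aggregation: collapse two objectives into one scalar objective.

def f1(x):
    return (x - 1.0) ** 2      # minimized at x = 1

def f2(x):
    return (x + 1.0) ** 2      # minimized at x = -1

def aggregate(x, w1=0.5, w2=0.5):
    """Weighted sum of the objectives; the weights sum to one."""
    return w1 * f1(x) + w2 * f2(x)

# With equal weights, the aggregate 0.5*f1 + 0.5*f2 = x^2 + 1 is
# minimized halfway between the two individual minima, at x = 0.
xs = [i / 100 for i in range(-200, 201)]
best = min(xs, key=aggregate)
print(best)  # 0.0
```

Changing the weights slides the compromise point towards whichever objective matters more, which is both the appeal and the limitation of this method.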
Pareto optimality is another way to solve MOPs, by classifying possible solutions in a dominance structure. Domination is a way to distinguish good solutions from bad ones. There are two main types of domination: strong and weak. Strong domination occurs when one possible solution is equal to or better than another in all objectives, and strictly better in at least one objective. Here is an exact definition from the textbook:

Weak domination, on the other hand, only requires that a possible solution be no worse than another solution in all objectives. Here is an exact definition:

For an example of both types of domination, here we have a figure from the textbook with some labels that I have overlaid:

On the x axis we have the F1 value and on the y axis the F2 value. In this scenario we want to minimize both F1 and F2, so the absolute best solution occurs at the origin, where F1 == F2 == 0. Point F is denoted in the lower middle left by the red circle. First, all of the points in section B are strongly dominated by F, as they have worse values in both F1 and F2. On the other hand, sections A and D are only weakly dominated by F, as they are better in at least one objective. Section A is worse than F in terms of F2, as its function values are larger, but better in terms of F1, as its function values are smaller. Section D is the opposite: worse than F in terms of F1 but better in terms of F2. Lastly, we have section C, which dominates sections A, B, D and point F, as it is better in at least one objective while being equal or better in the rest.
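The two dominance relations can be written directly from the definitions above; this sketch assumes minimization and represents each solution as a tuple of objective values:

```python
# Strong and weak dominance checks for minimization problems.

def strongly_dominates(a, b):
    """a is no worse than b in every objective and strictly better
    in at least one objective (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def weakly_dominates(a, b):
    """a is not worse than b in any objective (minimization)."""
    return all(x <= y for x, y in zip(a, b))

f = (2.0, 2.0)
print(strongly_dominates(f, (3.0, 3.0)))  # True: better in both objectives
print(strongly_dominates(f, (1.0, 3.0)))  # False: worse in F1
print(weakly_dominates(f, (2.0, 2.0)))    # True: equal in all objectives
```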
The whole purpose of introducing this notion of domination is to find the Pareto front. The Pareto front is a set of decision vectors that do not strongly dominate one another, but strongly dominate all other vectors in the input space. Here we have an example of a Pareto front for two objectives:

As we can see, the ideal point occurs when F1 == F2 == 0; however, the minimum point for F1 is generally not the same as the minimum for F2. Therefore, we get a range of values to choose from. As stated before, the Pareto front weakly dominates itself while strongly dominating the rest of the domain space; in this way, returning the front gives the user the ability to choose the solution that best fits their needs. For example, if F1 is more important to minimize than F2, then you could choose the solution on the front that favors F1. Here is another example of a Pareto front, for a discrete problem:

The reason I'm including the picture above is to show that the Pareto front is not always a nice convex curve as before; it can be sporadic and hard to predict.
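Given a finite set of objective vectors, the Pareto front can be extracted by filtering out every point that is strongly dominated by another; the sample points below are illustrative:

```python
# Extracting the Pareto front from a finite set of objective vectors,
# minimizing both objectives. The sample points are illustrative.

def strongly_dominates(a, b):
    """a is no worse in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep every point that no other point strongly dominates."""
    return [p for p in points
            if not any(strongly_dominates(q, p) for q in points)]

points = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
print(pareto_front(points))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

Note that (3.0, 3.0) and (4.0, 4.0) are dropped because (2.0, 2.0) strongly dominates them, while the three surviving points are mutually incomparable, which is exactly the trade-off structure a Pareto front represents.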
Benchmarking Test Functions
Benchmarking test functions are mathematical optimization problems designed for testing algorithms. They are artificial problems with no direct application, constructed so that they are as difficult as, or more difficult than, real-world problems. In the literature, instead of comparing algorithms on specific applications, which may require extensive domain knowledge, researchers often compare algorithms by how well they find the global extrema of these benchmark functions. There are many different types of benchmark functions; here are three that are commonly found in the literature:



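As a sketch, here are implementations of three benchmark functions that appear throughout the literature (they may or may not be the ones pictured above); each has a known global minimum of 0, which makes comparing algorithms straightforward:

```python
import math

def sphere(x):
    """Unimodal bowl; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def rastrigin(x):
    """Highly multimodal; global minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v)
                             for v in x)

def rosenbrock(x):
    """Narrow curved valley; global minimum 0 at (1, ..., 1)."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

print(sphere([0.0, 0.0]))      # 0.0
print(rastrigin([0.0, 0.0]))   # 0.0
print(rosenbrock([1.0, 1.0]))  # 0.0
```

Knowing the true optimum lets us measure exactly how close an algorithm gets, something rarely possible with real-world problems.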
Later on in the course we will be testing our algorithms on test functions such as these.
Conclusion
In summary, optimization problems can be classified into four categories: unconstrained problems, where the bounds are fixed and do not change; constrained problems, where the feasible domain changes depending on the variable values; multi-solution problems, where there is no unique global minimum/maximum and we want to return all tied values; and lastly multi-objective problems, where we want to find the Pareto front.
Optimization problems are extremely abundant in numerous fields of research, from traffic scheduling, maximizing production across different variables, and efficient electricity use to neural network training, game modeling, and optimizing data science models. The goal in these circumstances is to minimize or maximize the objective function, where each optimum represents a possible solution. However, we do not want just any solution; we want the best solution, the global minimum or maximum. So how do we find these global extrema? We could use standard numerical methods such as Newton's method or conjugate gradient, or perhaps other optimization methods such as leapfrog, hill climbing, or simulated annealing. In this series we will cover evolutionary computation as a means to find these global extrema.
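To give a taste of the classical methods mentioned above, here is a minimal hill-climber sketch; the objective, step size, and iteration count are my own illustrative choices:

```python
import random

def hill_climb(f, x0, step=0.1, iters=5000, seed=0):
    """Greedy local search: propose a nearby point, keep it only if it
    improves the objective (minimization)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        if fc < fx:          # accept improvements only
            x, fx = cand, fc
    return x, fx

# Converges to the nearest local minimum, which is why such methods can
# miss the global extremum on multimodal functions.
x, fx = hill_climb(lambda v: (v - 3.0) ** 2, x0=0.0)
print(x)  # a value very close to 3.0
```

Because it only ever accepts improvements, a hill climber started in the wrong valley stays there, which is the weakness evolutionary methods aim to address.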

Why would we want to use evolutionary computation in the first place over classical optimization techniques? The problem with classical optimization algorithms is that they are usually deterministic: feed one the same initial point and it will always give you the same output. They are also sequential, meaning the algorithm works with only one point at a time, and they may require differential information, particularly first- and second-order derivatives. Evolutionary computation, on the other hand, is a random search method: feed it the same initial point and the output is not guaranteed to be the same, for better or for worse. It also performs a parallel search, working with a population of points rather than one, and it works extremely well when the problem is not differentiable. For these reasons, evolutionary computation is commonly used on NP-hard problems, such as traveling salesman or scheduling, or when the objective function is not differentiable.
This concludes Unit 1 on optimization theory. Stay tuned for Unit 2, an introduction to evolutionary computation, a methodology for solving optimization problems: