Constrained Optimization demystified, with implementation in Python.

Designing a new robust constrained optimization algorithm.

Aakash Agrawal
Towards Data Science


Nonlinear constrained optimization problems are an important class of problems with a broad range of engineering and scientific applications. In this article, we will see how refashioning simple unconstrained optimization techniques leads to a hybrid algorithm for constrained optimization problems. Later, we will observe the robustness of the algorithm through a detailed analysis of a problem set and assess the quality of the optima by comparing the results with some of the built-in functions in Python.

Keywords — constrained optimization, multi-variable optimization, single-variable optimization.

Many engineering design and decision-making problems have the objective of optimizing a function while simultaneously satisfying some constraints arising from space, strength, or stability considerations. Constrained optimization, then, refers to the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables.

A constrained optimization problem with N variables is given by:

Minimize f(x), x = (x₁, x₂, …, x_N)ᵀ
subject to gⱼ(x) ≥ 0, j = 1, 2, …, J,
and hₖ(x) = 0, k = 1, 2, …, K,

where gⱼ(x) are the J inequality constraints, hₖ(x) are the K equality constraints, and f(x) is the objective function to be optimized. Let us understand some of the frequently used terminology in optimization.

THEORY

In the parlance of mathematical optimization, there are two routes by which one can find the optimum numerically:

1. Using direct search methods: Here, we only use the function values at given points to find the optimum. The method works by comparing the function values in a neighborhood of a point and subsequently moving in the direction that leads to a decrease in the function value (for minimization problems). Direct search methods are typically used when the function is discontinuous, and hence the derivative is not available at some points.

2. Using gradient-based methods: Here, we use the first and second-order derivatives to locate the optima. These methods take the gradient information into account and thus have the advantage of converging faster to the optima.

How do we find the derivative at a particular point numerically? We use the central difference method, which for a small step h is given by (the approximation becomes exact in the limit h → 0):

f’(x) ≈ [f(x + h) − f(x − h)] / (2h)
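
The central difference approximation is a one-liner in Python; a minimal sketch:

```python
def f_dash(f, x, h=1e-4):
    """First derivative of f at x via the central difference method."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: d/dx of x^2 at x = 3 is 6.
print(f_dash(lambda x: x**2, 3.0))  # ~6.0
```

Smaller h reduces the truncation error (which shrinks like h²) but eventually amplifies floating-point round-off, so h around 1e-4 to 1e-6 is a common compromise.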

Our proposed algorithm for constrained optimization employs two single-variable optimization methods and one multi-variable optimization method. Our main intention is to convert the multi-variable constrained optimization problem into an unconstrained multi-variable optimization problem; this unconstrained problem is then solved with the multi-variable method, which in turn uses the single-variable methods for its line searches.

Single Variable Optimization

Again, there are two routes to the optimum of a linear or nonlinear function of a single variable: direct search methods and gradient-based techniques. One can find the optimum using either approach alone; our algorithm for constrained optimization uses both. With a direct search method, we bracket the optimum, and once we have bounds for it, we find the exact optimum using a gradient-based method (for a single-variable function).

There are many direct search and gradient-based methods for obtaining the optimum of a single variable function. Our method uses the Bounding Phase Method and the Secant method.

Note: all the optimization methods described are iterative. We converge to the optimum value gradually after a series of iterations.

Bounding Phase Method: A direct search method to find lower and upper bounds on the minimum in single-variable unconstrained optimization. The algorithm is given as:

  1. Choose an initial guess x⁽⁰⁾ and an increment Δ (~0, i.e., small in magnitude); set the iteration counter k=0.
  2. If f(x⁽⁰⁾ −Δ) > f(x⁽⁰⁾ +Δ), then Δ is positive. Else if f(x⁽⁰⁾ −Δ) < f(x⁽⁰⁾ +Δ), then Δ is negative. Else, go to step_1.
  3. Set x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + 2ᵏΔ (exponential perturbation).
  4. If f(x⁽ᵏ⁺¹⁾) < f(x⁽ᵏ⁾), set k = k+1 and go to step_3. Else, the minimum lies in (x⁽ᵏ⁻¹⁾, x⁽ᵏ⁺¹⁾); terminate.
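
A minimal Python sketch of these steps (the k = 0 bracket edge case and the equal-values branch of step 2 are simplified here):

```python
def bounding_phase(f, x0, delta=0.5, max_iter=50):
    """Bracket a minimum of the single-variable function f, starting from x0."""
    # Step 2: orient delta so that moving by +delta decreases f.
    if f(x0 - abs(delta)) >= f(x0 + abs(delta)):
        delta = abs(delta)
    else:
        delta = -abs(delta)
    # Steps 3-4: take exponentially growing steps until f starts to increase.
    x_prev, x_curr = x0, x0
    for k in range(max_iter):
        x_next = x_curr + 2**k * delta
        if f(x_next) >= f(x_curr):
            return tuple(sorted((x_prev, x_next)))
        x_prev, x_curr = x_curr, x_next
    return tuple(sorted((x_prev, x_curr)))

lo, hi = bounding_phase(lambda x: (x - 3)**2, 0.0)
print(lo, hi)  # the minimum x = 3 lies inside (lo, hi)
```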

Secant Method: A very popular gradient-based method for single-variable optimization (f’ refers to the 1ˢᵗ order derivative at a point). It terminates when the gradient of the function at a point is very small (~0). The method is as follows:

  1. Choose two points a, b such that f’(a) and f’(b) have opposite signs; in other words, f’(a)·f’(b) < 0. Choose a small number ε (epsilon), the termination factor. Set x₁ = a and x₂ = b.
  2. Compute a new point z = x₂- (f’(x₂)*(x₂-x₁))/(f’(x₂)-f’(x₁)) and find f’(z).
  3. If |f’(z)| ≤ ε, Terminate.
    Else if f’(z) < 0, Set x₁= z and go to step_2,
    Else if f’(z) ≥ 0, Set x₂ = z and go to step_2.
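
A sketch of this loop, taking the derivative function f’ directly as input:

```python
def secant_minimum(f_dash, a, b, eps=1e-6, max_iter=100):
    """Drive f' to ~0 inside (a, b); assumes f'(a) and f'(b) have opposite signs."""
    x1, x2 = a, b
    z = (a + b) / 2  # fallback if max_iter is exhausted
    for _ in range(max_iter):
        # Step 2: secant step on the derivative.
        z = x2 - f_dash(x2) * (x2 - x1) / (f_dash(x2) - f_dash(x1))
        if abs(f_dash(z)) <= eps:   # Step 3: gradient ~0, terminate.
            break
        if f_dash(z) < 0:
            x1 = z
        else:
            x2 = z
    return z

# f(x) = (x - 3)^2 has f'(x) = 2(x - 3); the minimum is at x = 3.
print(secant_minimum(lambda x: 2 * (x - 3), 1.5, 7.5))  # -> 3.0
```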

Other popular gradient-based methods for single-variable optimization are the bisection method, the Newton–Raphson method, etc.

Unidirectional Search: Here, the goal is to find where the function value is minimum along a particular direction. Mathematically, we need to find a scalar α (alpha) such that f(α) = f(x⁽ᵗ⁾ + α·s⁽ᵗ⁾) is minimized, which is achieved using the single-variable optimization methods. {s⁽ᵗ⁾ = search direction}.

Note: Many multi-variable optimization techniques are essentially successive unidirectional searches, each finding the minimum point along a particular direction.
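
A unidirectional search is just a single-variable problem in α. In the sketch below, SciPy's scalar minimizer stands in for the bounding-phase + secant combination described above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def unidirectional_search(f, x, s):
    """Find alpha* minimising f(x + alpha * s) along the direction s."""
    phi = lambda alpha: f(np.asarray(x) + alpha * np.asarray(s))
    return minimize_scalar(phi).x

# Minimising ||v||^2 from (2, 0) along direction (-1, 0) gives alpha* = 2.
alpha = unidirectional_search(lambda v: v @ v, [2.0, 0.0], [-1.0, 0.0])
print(alpha)  # ~2.0
```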

Variable Metric Method (Davidon–Fletcher–Powell Method):

The DFP method is a gradient-based multi-variable optimization algorithm. It converges faster to the optimum by building up an approximation A of the inverse Hessian from gradient information alone, rather than computing the Hessian directly, thereby overcoming the limitations of several other multi-variable optimization algorithms. The search direction is given by:

s⁽ᵏ⁾ = −A⁽ᵏ⁾ e(x⁽ᵏ⁾) … (1)

where the matrix A is updated as:

A⁽ᵏ⁾ = A⁽ᵏ⁻¹⁾ + Δx⁽ᵏ⁾Δx⁽ᵏ⁾ᵀ / (Δx⁽ᵏ⁾ᵀ Δe⁽ᵏ⁾) − A⁽ᵏ⁻¹⁾Δe⁽ᵏ⁾Δe⁽ᵏ⁾ᵀA⁽ᵏ⁻¹⁾ / (Δe⁽ᵏ⁾ᵀ A⁽ᵏ⁻¹⁾ Δe⁽ᵏ⁾) … (2)

with Δx⁽ᵏ⁾ = x⁽ᵏ⁾ − x⁽ᵏ⁻¹⁾, Δe⁽ᵏ⁾ = e(x⁽ᵏ⁾) − e(x⁽ᵏ⁻¹⁾), and A⁽⁰⁾ = I (the identity matrix). Here e(x⁽ᵏ⁾) represents the gradient of the function at the point x⁽ᵏ⁾. The unidirectional search involves the secant method and the bounding phase method to find the value of α in the search space. The new point obtained after the unidirectional search is:

x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + α⁽ᵏ⁾s⁽ᵏ⁾

  1. Choose initial guess x⁽⁰⁾ and termination parameters ε₁, ε₂. (Note, here x⁽⁰⁾ is a vector).
  2. Find ∇f(x⁽⁰⁾) {gradient of f at x⁽⁰⁾} and set s⁽⁰⁾ = −∇f(x⁽⁰⁾) {initial search direction}.
  3. Find α(alpha) such that f(x⁽⁰⁾ + α.s⁽⁰⁾) is minimum with the termination parameter ε₁. Denote it by α*. Set x⁽¹⁾ = x⁽⁰⁾ + α*s⁽⁰⁾ and k=1. Calculate ∇f(x⁽¹⁾). {f(x⁽⁰⁾ + α.s⁽⁰⁾) is a function of α, and we will find this α* by single variable optimization methods}.
  4. Find s⁽ᵏ⁾ using the formula in eq(1) and eq(2).
  5. Find λ⁽ᵏ⁾ such that f(x⁽ᵏ⁾ +λ⁽ᵏ⁾.s⁽ᵏ⁾) is minimum with terminating factor ε₁. Set, x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ +λ⁽ᵏ⁾.s⁽ᵏ⁾.
  6. Check for the termination conditions. Is ||x⁽ᵏ⁺¹⁾ -x⁽ᵏ⁾||/||x⁽ᵏ⁾|| ≤ ε₂?
    {|| . || denotes the norm of a vector}.
  7. If Yes, TERMINATE; Else set k = k+1 and go to step_4.
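
The steps above can be sketched compactly in NumPy. For brevity, a crude backtracking line search stands in for the bounding-phase + secant unidirectional search, and the termination test is simplified:

```python
import numpy as np

def grad(f, x, h=1e-5):
    """Gradient of f at x via central differences, one coordinate at a time."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x)); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def dfp(f, x0, eps=1e-6, max_iter=200):
    """DFP with A(0) = I: eq. (1) gives the direction, eq. (2) updates A."""
    x = np.asarray(x0, dtype=float)
    A = np.eye(len(x))
    g = grad(f, x)
    for _ in range(max_iter):
        s = -A @ g                        # eq. (1): search direction
        alpha = 1.0                       # crude backtracking line search
        while f(x + alpha * s) >= f(x) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * s
        g_new = grad(f, x_new)
        if np.linalg.norm(g_new) <= eps or np.linalg.norm(x_new - x) <= eps:
            return x_new
        dx, de = x_new - x, g_new - g
        A = (A + np.outer(dx, dx) / (dx @ de)
               - A @ np.outer(de, de) @ A / (de @ A @ de))   # eq. (2)
        x, g = x_new, g_new
    return x

x_opt = dfp(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [0.0, 0.0])
print(x_opt)  # ~[1, -2]
```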

Penalty Function Method:

A penalty is an additional term added to the objective function, much like a regularizer, that discourages unwanted solutions. By adding these penalty terms, we transform our constrained problem into an unconstrained problem structured such that minimization favors satisfaction of the constraints, as shown in the figure below.

Simply put, the technique is to add a term to the objective function that produces a high cost for violation of the constraints. This is known as the penalty function method. Mathematically,

P(x, R) = f(x) + Ω(R, g(x), h(x))

where R is a penalty parameter, P(x, R) is the penalty function, and Ω is the penalty term. There are various types of penalty terms, depending on the feasibility of the starting point and the type of constraint. A common choice is the bracket operator penalty term:

Ω = R·{ Σⱼ ⟨gⱼ(x)⟩² + Σₖ [hₖ(x)]² }

where the bracket operator ⟨g⟩ equals g when g < 0 and 0 otherwise, so that only violated inequality constraints (recall gⱼ(x) ≥ 0) are penalized.

The method is given as:

  1. Choose initial solution x⁽⁰⁾ and termination parameters ε₁, ε₂, penalty parameter R⁽⁰⁾, and an update factor c.
  2. Form the penalty function P(x⁽ᵗ⁾,R⁽ᵗ⁾) = f(x⁽ᵗ⁾) + Ω(R⁽ᵗ⁾, g(x⁽ᵗ⁾), h(x⁽ᵗ⁾)).
  3. Use the DFP method to find the minimum of P(x⁽ᵗ⁾, R⁽ᵗ⁾). Let the solution be x⁽ᵗ⁺¹⁾. {This particular step is an unconstraint optimization problem}. For DFP, our initial guess will be x⁽ᵗ⁾.
  4. Check the termination condition:
    Is |P(x⁽ᵗ⁺¹⁾, R⁽ᵗ⁾) − P(x⁽ᵗ⁾, R⁽ᵗ⁻¹⁾)| ≤ ε₂? {|.| is the mod function}.
    If yes, Set xᵀ= x⁽ᵗ⁺¹⁾ and Terminate,
    Else go to step_5.
  5. Update R⁽ᵗ⁺¹⁾= c*R⁽ᵗ⁾. Set, t=t+1 and go to step_2.
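
A minimal sketch of this sequence of penalized solves. SciPy's general-purpose minimizer stands in for the article's DFP routine, and inequality constraints are supplied as functions with the convention g(x) ≥ 0:

```python
import numpy as np
from scipy.optimize import minimize

def penalty_method(f, g_list, x0, R0=0.1, c=1.55, eps=1e-4, max_seq=40):
    """Sequential penalty method: repeatedly solve the unconstrained penalized
    problem, growing R until the penalized optima stop changing."""
    def P(x, R):
        # Bracket-operator penalty: only violated constraints (g < 0) contribute.
        return f(x) + R * sum(min(g(x), 0.0) ** 2 for g in g_list)

    x, R, p_prev = np.asarray(x0, dtype=float), R0, None
    for _ in range(max_seq):
        x = minimize(lambda v: P(v, R), x).x            # step 3: unconstrained solve
        p_curr = P(x, R)
        if p_prev is not None and abs(p_curr - p_prev) <= eps:  # step 4
            break
        p_prev, R = p_curr, c * R                       # step 5: update R
    return x

# Minimise (x - 2)^2 subject to x >= 3: the constrained optimum is x = 3.
x_star = penalty_method(lambda v: (v[0] - 2)**2, [lambda v: v[0] - 3], [0.0])
print(x_star)  # ~[3.]
```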

Note: the penalty parameter R is updated at the end of every sequence, so each sequence solves a slightly different unconstrained problem.

There are several limitations to the penalty function method. Firstly, it distorts the contours of the objective function, so the algorithm takes longer to converge. This distortion can also introduce artificial local optima.

For the purpose of implementation, we will stick to the penalty function method. There is another method, known as the Method of Multipliers, that overcomes the distortion limitation. It is basically a slight modification of the penalty function method: instead of distorting the contours, it shifts them towards the constrained optimum point, so no artificial local optima are introduced.

Both the method of multipliers and the penalty function method convert a constrained optimization problem into an unconstrained one, which can then be solved by any multi-variable optimization method.

Well, that’s it!!! If you have come this far, great! Now, let us have a look at the flow chart of our method and then go for the implementation.

The Himmelblau Function:

For the purpose of illustrating our method, we will work with the famous Himmelblau function (see fig.), given as f(x, y) = (x² + y − 11)² + (x + y² − 7)².
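
In Python, the function and a quick check at one of its four known (unconstrained) minima:

```python
def himmelblau(v):
    """Himmelblau's function: f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2."""
    x, y = v
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

print(himmelblau([3.0, 2.0]))  # -> 0.0, one of the four minima
```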


Let’s solve the following constrained optimization problem using our proposed algorithm.

IMPLEMENTATION

The DFP method is used for the multi-variable optimization, and a combination of the bounding phase and the secant method is used for the unidirectional search. The code is implemented in Python and hosted in my GitHub repo. We now briefly describe each of the functions used:

multi_f: This function takes an input vector x (a point in the search space) and returns the penalized function value at that point.

grad_multi_f: This function uses the central difference method to calculate the gradient vector at a particular point in search space.

bracketing_: This function implements the bounding phase method, used to bracket α* (the minimizer found by the unidirectional search). It takes a vector x and a vector s (the search direction) and outputs an interval from which α* can be evaluated.

f_dash: This function returns the first-order derivative of a single-variable function using the central difference method (it represents f’).

secant_minima: This function takes the bounds found by the bounding phase method, a point x, and a search direction s as input, and evaluates α*.

compute_z: This function is used to compute the formula used in the secant method:

DFP: It takes only the input vector x as an argument and returns the solution vector. This function is called inside the main function until the termination conditions for the penalty function method are met.

It starts by finding a search direction s from the input vector x, then performs a unidirectional search by calling the bounding phase and the secant method to find the optimum α. It then finds new search directions by evaluating the equations (1) and (2). The process continues until the termination condition for DFP is met, and we have found the optimum solution of this unconstrained problem for this particular sequence.

RESULTS

The full code is on my GitHub repo. Note that our method is general and applicable to any number of dimensions one wants to work in.
The parameter setting for our algorithm is:
* M=2 {specifies the total dimensions we are working with},
* R=0.1 {penalty parameter},
* c=1.55 {factor for updating R},
* x_ip (initial guess)=(0.11, 0.1)ᵀ.

I suggest the reader try different initial guesses and play with these parameter values. Our algorithm converged after 14 sequences, and we obtain the optimum solution to the constrained optimization problem.

Let’s compare our results with those found using the optimize module of the SciPy library in Python {working with the same initial guess}:
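
For reference, a comparable SciPy call looks like the sketch below. The article's exact constraint set is not reproduced here, so the non-negativity bounds are only an illustrative assumption:

```python
from scipy.optimize import minimize

def himmelblau(v):
    x, y = v
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

x0 = [0.11, 0.1]  # same initial guess as our algorithm
# Hypothetical constraints for illustration: x >= 0 and y >= 0.
res = minimize(himmelblau, x0, method="SLSQP", bounds=[(0, None), (0, None)])
print(res.x, res.fun)
```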

The results are quite close; to get even closer results, we can try smaller termination factors. I also tried a range of different initial guesses. And yes, they all converged!

“Premature optimization is the root of all evil” — Donald Knuth.

I hope you enjoyed the ride through constrained optimization with me. I would love to hear feedback from anyone reading this article (a clap 👏🏼 is also good feedback 😇). I would be happy to answer doubts/questions on any of the concepts mentioned above. You can reach me via LinkedIn.

Finally, a special thanks to Prof. Deepak Sharma, IIT Guwahati, who taught me the optimization course as a part of my curriculum.

THANK YOU!!
