The world’s leading publication for data science, AI, and ML professionals.

Optimization, Newton’s Method, & Profit Maximization: Part 2- Constrained Optimization Theory

Learn how to extend Newton's Method to and solve constrained optimization problems

All Images by Author
All Images by Author

This article is the 2nd in a 3 part series. In the 1st part, we studied basic Optimization theory. Now, in pt. 2, we will extend this theory to constrained optimization problems. Lastly, in pt. 3, we will apply the optimization theory covered, as well as econometric and economic theory, to solve a profit maximization problem.

Consider the following problem: You want to determine how much money to invest in specific financial instruments to maximize your return on investment. However, the problem of simply maximizing your return on investment is too broad and simple of an optimization question to ask. By virtue of the simplicity, the solution is to just invest all of your money in the financial instrument that shows promise for highest return. Clearly this is not a good investment strategy; so, how can we improve this? By putting constraints on the investment decisions, our choice variables. For example, we can specify constraints that, to name a couple, 1) limit the amount of financial risk we are willing to entertain (see modern portfolio theory) or 2) specify the amount of our portfolio to be allocated towards each category of financial instruments (equity, bonds, derivatives, etc.) – the possibilities are endless. Notice how this problem becomes significantly more tractable as we add constraints. Despite this simple example, it helps to capture a fundamental motivation of constrained optimization:

The essence of constrained optimization is to provide unconstrained optimization problems a sense of tractability and applicability to complex real world problems.

Constrained optimization is defined as "the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables."[1] The process of adding constraints on the variables transforms an unconstrained and, perhaps, intractable optimization problem into one which can help model and solve a real world problem. However, the addition of constraints can turn a simple optimization problem into a problem that is no longer trivial. In this post, we will dive into some of the techniques that we can add to our toolbox to extend the unconstrained optimization theory, learned in part 1 of this series, to now solve constrained optimization problems.

In [part 1](https://medium.com/towards-data-science/optimization-newtons-method-profit-maximization-part-1-basic-optimization-theory-ff7c5f966565), we covered basic optimization theory – including 1) setting up and solving a simple single variable optimization problem analytically, 2) iterative optimization schemes – namely, gradient descent & Newton’s Method, and 3) implementing Newton’s method by hand and in python for a multi-dimensional optimization problem. This article is designed to be accessible for those who are already familiar with the content covered in part 1.


Constrained Optimization Basics (& Part 1 Recap)

A mathematical optimization problem can be formulated abstractly as follows:

(1)
(1)

where we choose real values of the vector x that minimize the objective function f(x) (or maximize –f(x)) subject to the inequality constraints g(x) and equality constraints h(x). In part 1, we discussed how to solve these problems in the absence of g(x) and h(x) and now we will introduce these back into our optimization problem. First, let’s succinctly recap how to implement Newton’s method for unconstrained problems.

Recall that we can approximate the first order necessary condition of a minimum using a Taylor Series expansion:

(2)
(2)

where H(x) and _f(x) denote the Hessian and gradient *o_f f(x), respectively. Each iterative addition of delta, Δ, is an expected better approximation of the optimal values x. Thus, each iterative step using the NM can be represented as follows:

(3) Newton Method Iterative Scheme
(3) Newton Method Iterative Scheme

We do this scheme until we reach convergence across one or more of the following criteria:

(4) Convergence Criteria for Iterative Optimization Schemes
(4) Convergence Criteria for Iterative Optimization Schemes

Putting this into python code, we make use of SymPy – a python library for symbolic mathematics – and create generalizable functions to compute the gradient, compute the Hessian, and implement Newton’s method for an n-dimensional function:

import sympy as sm
import numpy as np

def get_gradient(
    function: sm.core.expr.Expr,
    symbols: list[sm.core.symbol.Symbol],
    x0: dict[sm.core.symbol.Symbol, float],
) -> np.ndarray:
    """
    Calculate the gradient of a function at a given point.

    Args:
        function (sm.core.expr.Expr): The function to calculate the gradient of.
        symbols (list[sm.core.symbol.Symbol]): The symbols representing the variables in the function.
        x0 (dict[sm.core.symbol.Symbol, float]): The point at which to calculate the gradient.

    Returns:
        numpy.ndarray: The gradient of the function at the given point.
    """
    d1 = {}
    gradient = np.array([])

    for i in symbols:
        d1[i] = sm.diff(function, i, 1).evalf(subs=x0)
        gradient = np.append(gradient, d1[i])

    return gradient.astype(np.float64)

def get_hessian(
    function: sm.core.expr.Expr,
    symbols: list[sm.core.symbol.Symbol],
    x0: dict[sm.core.symbol.Symbol, float],
) -> np.ndarray:
    """
    Calculate the Hessian matrix of a function at a given point.

    Args:
    function (sm.core.expr.Expr): The function for which the Hessian matrix is calculated.
    symbols (list[sm.core.symbol.Symbol]): The list of symbols used in the function.
    x0 (dict[sm.core.symbol.Symbol, float]): The point at which the Hessian matrix is evaluated.

    Returns:
    numpy.ndarray: The Hessian matrix of the function at the given point.
    """
    d2 = {}
    hessian = np.array([])

    for i in symbols:
        for j in symbols:
            d2[f"{i}{j}"] = sm.diff(function, i, j).evalf(subs=x0)
            hessian = np.append(hessian, d2[f"{i}{j}"])

    hessian = np.array(np.array_split(hessian, len(symbols)))

    return hessian.astype(np.float64)

def newton_method(
    function: sm.core.expr.Expr,
    symbols: list[sm.core.symbol.Symbol],
    x0: dict[sm.core.symbol.Symbol, float],
    iterations: int = 100,
) -> dict[sm.core.symbol.Symbol, float] or None:
    """
    Perform Newton's method to find the solution to the optimization problem.

    Args:
        function (sm.core.expr.Expr): The objective function to be optimized.
        symbols (list[sm.core.symbol.Symbol]): The symbols used in the objective function.
        x0 (dict[sm.core.symbol.Symbol, float]): The initial values for the symbols.
        iterations (int, optional): The maximum number of iterations. Defaults to 100.

    Returns:
        dict[sm.core.symbol.Symbol, float] or None: The solution to the optimization problem, or None if no solution is found.
    """

    x_star = {}
    x_star[0] = np.array(list(x0.values()))

    # x = [] ## Return x for visual!

    print(f"Starting Values: {x_star[0]}")

    for i in range(iterations):
        # x.append(dict(zip(x0.keys(),x_star[i]))) ## Return x for visual!

        gradient = get_gradient(function, symbols, dict(zip(x0.keys(), x_star[i])))
        hessian = get_hessian(function, symbols, dict(zip(x0.keys(), x_star[i])))

        x_star[i + 1] = x_star[i].T - np.linalg.inv(hessian) @ gradient.T

        if np.linalg.norm(x_star[i + 1] - x_star[i]) < 10e-5:
            solution = dict(zip(x0.keys(), x_star[i + 1]))
            print(f"nConvergence Achieved ({i+1} iterations): Solution = {solution}")
            break
        else:
            solution = None

        print(f"Step {i+1}: {x_star[i+1]}")

    return solution

And to solve an unconstrained optimization problem, we can run the following code:

import sympy as sm

# Define Symbols
x, y = sm.symbols('x y') 
Gamma = [x,y] 

# Define Objective Function (Rosenbrock's Parabolic Valley)
objective = 100*(y-x**2)**2 + (1-x)**2

# Specify starting values
Gamma0 = {x:-1.2,y:1}

# Call function
newton_method(objective, Gamma, Gamma0)

With the corresponding output:

If all of the material reviewed above feels extremely foreign, then I recommend taking a look at part 1 which will dive into everything above in more depth and bring you up to speed! Without further ado, let’s dive into implementing constraints in our optimization problems.

Note: All of the following constrained optimization techniques can and should be incorporated w/ gradient descent algorithms when applicable!


Solving Constrained Optimization Problems

As we discussed above there are two possible constraints on an objective function – equality and inequality constraints. Note that there are varying methodologies out there for dealing with each type of constraint with varying pros and cons. See [2] for a further discussion of different methodologies. Nevertheless, we will home our focus in on two methodologies, one for equality and one for inequality constraints, that I believe are robust in their performance, easy to grasp for newcomers, and easily integrated together into one cohesive problem.

Equality Constraints – The Lagrangian

First, we will address optimization problems with equality constraints in our optimization problem. That is, optimization problems that take the form:

(5)
(5)

Suppose we are working with the Rosenbrock’s Parabolic Valley, as in part 1, but now with the equality constraint that x² – y = 2:

(6) Rosenbrock's Parabolic Valley w/ Equality Constraint (Problem 1)
(6) Rosenbrock’s Parabolic Valley w/ Equality Constraint (Problem 1)

Note that, for simplicity and consistency, the equality constraints should be written such that they are equal to zero. Now our optimization problem looks like:

Rosenbrock's Parabolic Valley (purple-yellow color map) & Equality Constraint Curve (Black)
Rosenbrock’s Parabolic Valley (purple-yellow color map) & Equality Constraint Curve (Black)

where the feasible region of the optimal values lie somewhere along the intersection of the equality constraint curve and our objective function above.

Joseph-Louis Lagrange developed a method for incorporating an equality constraint directly into the objective function – creating _ the Lagrangian function_ – so that traditional approaches using first and second derivates can still be applied.[2][3] Formally, the Lagrangian function takes the following form:

(7) Formal Definition of Lagrangian Function
(7) Formal Definition of Lagrangian Function

where f(x) and h(x) are the objective function and equality constraints, respectively. _ Λ are the Lagrange multipliers_ that correspond to each equality constraint j. The Lagrange multipliers are treated as new choice variables in the Lagrangian function. It just so happens that the necessary conditions for x to be a minimum of the equality constrained problem is that x corresponds to the stationarity points of the Lagrangian (x, Λ). That is,

(8) Lagrangian First Order Conditions
(8) Lagrangian First Order Conditions

For our example above – the equality constrained Rosenbrock’s Parabolic Valley (Eq. 1)— we can write our Lagrangian function as follows:

(9)
(9)

We can then solve this Lagrangian using Newton’s method, but now including the Lagrange multipliers as additional choice variables.

import sympy as sm

x, y, λ  = sm.symbols('x y λ')

Langrangian_objective = 100*(y-x**2)**2 + (1-x)**2 + λ*(x**2-y-2)
Gamma = [x,y,λ]
Gamma0 = {x:-1.2,y:1,λ:1}

newton_method(Langrangian_objective,Gamma,Gamma0)

With the corresponding output:

One can easily verify that the solution satisfies our equality constraint. And there you have it! That wasn’t too bad, right? This method can be extended to add any number of equality constraints – just add another Lagrange multiplier. Let’s move on now to the incorporation of inequality constraints.

Inequality Constraints – The Logarithmic Barrier Function

Now we will address optimization problems with inequality constraints in our optimization problem. That is, optimization problems that take the form:

(10)
(10)

Suppose, again, we are working with Rosenbrock’s Parabolic Valley but now with the inequality constraints x ≤ 0 and y ≥ 3:

(11) Rosenbrock's Parabolic Valley w/ Inequality Constraint (Problem 2)
(11) Rosenbrock’s Parabolic Valley w/ Inequality Constraint (Problem 2)

Now our optimization problem looks like:

Rosenbrock's Parabolic Valley (purple-yellow color map) & Inequality Constraint Planes (Black)
Rosenbrock’s Parabolic Valley (purple-yellow color map) & Inequality Constraint Planes (Black)

where the feasible region of the optimal values lies in the quadrant bounded by the constraints that is marked by the red star.

Because these constraints do not have a strict equality, our ability to directly include them into the objective function is not as straightforward. However, we can get creative – what we can do is augment our objective function to include a "barrier" in the objective function that penalizes values of the solution that approach the bounds of the inequality constraints. These class of methods are known as "interior-point methods" or "barrier methods."[4][5] Like the Lagrangian function, we can transform our original constrained optimization problem into an unconstrained optimization problem by incorporating barrier functions (the logarithmic barrier function in our case) that can be solved using traditional methods— thereby creating the barrier function. Formally, the logarithmic barrier function is characterized by:

(12) Formal Definition of Logarithmic Barrier Function
(12) Formal Definition of Logarithmic Barrier Function

where ρ is a small positive scalar – known as the barrier parameter. As ρ → 0, the solution of the barrier function B(x,ρ) should converge to the solution of our original constrained optimization function. Note, the c(x) states that depending on how we formulate our inequality constraints (greater than or less than zero) will dictate whether we use the negative or positive of that constraint. We know that y=log(x) is undefined for x ≤ 0, thus we need to formulate our constraint to always be ≥ 0.

How exactly does the barrier method work, you may ask? To begin with, when using the barrier method, we must choose starting values that are in the feasible region. As the optimal values approach the "barrier" outlined by the constraint, this method relies on the fact that the logarithmic function approaches negative infinity as the value approaches zero, thereby penalizing the objective function value. As ρ → 0, the penalization decreases (see figure directly below) and we converge to the solution. However, it is necessary to start with a sufficiently large ρ so that the penalization is large enough to prevent "jumping" out of the barriers. Therefore, the algorithm has one extra loop than Newton’s method alone – namely, we choose a starting value ρ, optimize the barrier function using Newton’s method, then update ρ by slowly decreasing it (ρ → 0), and repeat until convergence.

Logarithmic Barrier for Different Values of ρ
Logarithmic Barrier for Different Values of ρ

Revisiting our example above – the inequality constrained Rosenbrock’s Parabolic Valley (Eq. 2)— we can write our barrier function as follows:

(13)
(13)

Recall that log(a) + log(b) = log(ab) and our one constraint x ≤ 0 → -x ≥ 0. We must then update our code to accommodate the barrier method algorithm:

import sympy as sm
import numpy as np

def constrained_newton_method(
    function: sm.core.expr.Expr,
    symbols: list[sm.core.symbol.Symbol],
    x0: dict[sm.core.symbol.Symbol, float],
    iterations: int = 100,
) -> dict[sm.core.symbol.Symbol, float] or None:
    """
    Performs constrained Newton's method to find the optimal solution of a function subject to constraints.

    Parameters:
        function (sm.core.expr.Expr): The function to optimize.
        symbols (list[sm.core.symbol.Symbol]): The symbols used in the function.
        x0 (dict[sm.core.symbol.Symbol, float]): The initial values for the symbols.
        iterations (int, optional): The maximum number of iterations. Defaults to 100.

    Returns:
        dict[sm.core.symbol.Symbol, float] or None: The optimal solution if convergence is achieved, otherwise None.
    """
    x_star = {}
    x_star[0] = np.array(list(x0.values())[:-1])

    optimal_solutions = []
    optimal_solutions.append(dict(zip(list(x0.keys())[:-1], x_star[0])))

    for step in range(iterations):
        # Evaluate function at rho value
        if step == 0:  # starting rho
            rho_sub = list(x0.values())[-1]

        rho_sub_values = {list(x0.keys())[-1]: rho_sub}
        function_eval = function.evalf(subs=rho_sub_values)

        print(f"Step {step} w/ {rho_sub_values}")  # Barrier method step
        print(f"Starting Values: {x_star[0]}")

        # Newton's Method
        for i in range(iterations):
            gradient = get_gradient(
                function_eval, symbols[:-1], dict(zip(list(x0.keys())[:-1], x_star[i]))
            )
            hessian = get_hessian(
                function_eval, symbols[:-1], dict(zip(list(x0.keys())[:-1], x_star[i]))
            )

            x_star[i + 1] = x_star[i].T - np.linalg.inv(hessian) @ gradient.T

            if np.linalg.norm(x_star[i + 1] - x_star[i]) < 10e-5:
                solution = dict(zip(list(x0.keys())[:-1], x_star[i + 1]))
                print(
                    f"Convergence Achieved ({i+1} iterations): Solution = {solution}n"
                )
                break

        # Record optimal solution &amp; previous optimal solution for each barrier method iteration
        optimal_solution = x_star[i + 1]
        previous_optimal_solution = list(optimal_solutions[step - 1].values())
        optimal_solutions.append(dict(zip(list(x0.keys())[:-1], optimal_solution)))

        # Check for overall convergence
        if np.linalg.norm(optimal_solution - previous_optimal_solution) < 10e-5:
            print(
                f"n Overall Convergence Achieved ({step} steps): Solution = {optimal_solutions[step]}n"
            )
            overall_solution = optimal_solutions[step]
            break
        else:
            overall_solution = None

        # Set new starting point
        x_star = {}
        x_star[0] = optimal_solution

        # Update rho
        rho_sub = 0.9 * rho_sub

    return overall_solution

We can now solve the Barrier function with the code above (Note: Make sure starting values are in the feasible range of inequality constraints & you may have to increase the starting value of rho if you jump out of inequality constraints):

import sympy as sm

x, y, ρ = sm.symbols('x y ρ')

Barrier_objective = 100*(y-x**2)**2 + (1-x)**2 - ρ*sm.log((-x)*(y-3))
Gamma = [x,y,ρ] # Function requires last symbol to be ρ!
Gamma0 = {x:-15,y:15,ρ:10}

constrained_newton_method(Barrier_objective,Gamma,Gamma0)

With the corresponding output:

Clearly the solution satisfies the inequality constraints specified. And there you have it. We have now tackled inequality constraints in our optimization problems. To wrap up, let’s put everything together and move on to tackling constrained optimization problems with mixed constraints – which is simply the combination of what we have done above.

Putting it All Together

Let’s now solve our optimization problem by combining both the equality and inequality constraints from above. That is, we want to solve an optimization of the form:

(14)
(14)

All we have to do is combine the Lagrangian and the Barrier functions into one function. Thus, we can create a generalizable function, call it O, for dealing with optimization problems that have both equality and inequality constraints:

(15) Generalizable function for constrained optimization problems
(15) Generalizable function for constrained optimization problems

where, as before, Λ is the vector of Lagrange multipliers and **** __ ρ is the barrier parameter. Thus, combining our constrained (Eq. 6) and unconstrained problems from above (Eq. 11), we can formulate our mixed constrained optimization problem as follows:

(16)
(16)

In python,

import sympy as sm

x, y, λ, ρ = sm.symbols('x y λ ρ')

combined_objective = 100*(y-x**2)**2 + (1-x)**2 + λ*(x**2-y-2) - ρ*sm.log((-x)*(y-3))
Gamma = [x,y,λ,ρ] # Function requires last symbol to be ρ!
Gamma0 = {x:-15,y:15,λ:0,ρ:10}

constrained_newton_method(combined_objective,Gamma,Gamma0)

With the corresponding output:

We can verify that this solution does in fact obey the constraints. Specifically, x ≤ 0, y ≥ 3, & x² – y = 2. Satisfying, no?


Conclusion

Phew. Take a deep breath – you earned it. Hopefully at this point you should have a much better understanding of the techniques to incorporate constraints into your optimization problems. We are still just brushing the surface of the different tools and techniques utilized in mathematical optimization.

Stay tuned for part 3 of this series, the final part, where we will apply the optimization material learned thus far alongside econometric & economic theory to solve a Profit Maximization problem. It is my goal that part 3 will bring home everything we have covered and show a practical use case. As usual, I hope you have enjoyed reading this much as much I have enjoyed writing it!


Resources

[1] https://en.wikipedia.org/wiki/Constrained_optimization

[2] Snyman, J. A., & Wilke, D. N. (2019). Practical mathematical optimization: Basic optimization theory and gradient-based algorithms (2nd ed.). Springer.

[3] https://en.wikipedia.org/wiki/Lagrange_multiplier

[4] https://en.wikipedia.org/wiki/Interior-point_method

[5] https://en.wikipedia.org/wiki/Barrier_function


Access all the code via this GitHub Repo: https://github.com/jakepenzak/Blog-Posts

I appreciate you reading my post! My posts on Medium seek to explore real-world and theoretical applications utilizing econometric and statistical/machine learning techniques. Additionally, I seek to provide posts on the theoretical underpinnings of various methodologies via theory and simulations. Most importantly, I write to learn! I hope to make complex topics slightly more accessible to all. If you enjoyed this post, please consider following me on Medium!


Related Articles