Quadratic Optimization with Constraints in Python using CVXOPT

Perceval Desforges
Towards Data Science
9 min read · May 10, 2022



Quadratic optimization is a problem encountered in many fields, from least squares regression [1] to portfolio optimization [2] and model predictive control [3]. In all of these problems, one must optimize the allocation of resources to different assets or agents (which usually corresponds to the linear term), knowing that there can be helpful or unhelpful interactions between these assets or agents (this corresponds to the quadratic term), all the while satisfying some particular constraints (not allocating all the resources to the same agent or asset, making sure the sum of all allocated resources does not surpass the total available resources, etc.). Difficulties may arise when the constraints cannot be formulated linearly. In this article, we will see how to tackle these optimization problems using a very powerful Python library called CVXOPT [4, 5], which relies on LAPACK and BLAS routines (highly efficient linear algebra libraries written in Fortran 90) [6].

Formulated mathematically, the goal is to find the arguments that minimize a multivariate quadratic function while fulfilling some equality and inequality constraints. The function to be optimized has the following general form:

$$f(x) = \frac{1}{2} x^\top Q x - r^\top x$$

where x is the unknown vector of size n, r is a vector of the same size as x, and Q is a square symmetric matrix of dimension n by n. The constraints can be formulated as a set of equalities and inequalities, such that:

$$A x = b, \qquad G x \preceq h$$

where A is an m by n matrix (with m the number of equality constraints), b is a vector of size m, G is an m′ by n matrix (with m′ the number of inequality constraints), and h is a vector of size m′. The curly inequality symbol means that the inequality holds for every element of the vector.
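This is exactly the standard form that CVXOPT's quadratic programming solver, cvxopt.solvers.qp, expects:

```latex
\min_{x}\;\; \frac{1}{2}\, x^\top P\, x + q^\top x
\qquad \text{subject to} \qquad G x \preceq h, \quad A x = b
```

Mapping our problem onto this standard form gives P = Q and q = −r, which is why the code that follows passes -r as the linear term.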

How do we write this in the CVXOPT formalism? Consider the code below:

# Import libraries
import numpy as np
import cvxopt as opt
from cvxopt import matrix, spmatrix, sparse
from cvxopt.solvers import qp, options
from cvxopt import blas

# Generate random vector r and symmetric definite positive matrix Q
n = 50
r = matrix(np.random.sample(n))
Q = np.random.randn(n,n)
Q = 0.5 * (Q + Q.T)
Q = Q + n * np.eye(n)
Q = matrix(Q)
# Solve
sol = qp(Q, -r)

The solution ‘sol’ is a dictionary containing, among other things, the vector that minimizes the loss function under the key ‘x’, as well as information on whether an optimal solution was found under the key ‘status’.

How does one implement constraints in this formalism? All that needs to be done is supply the matrices A and G as well as the vectors b and h defined earlier. Let’s say we want the sum of the elements of x to be equal to one, as well as all elements of x to be positive. Mathematically, these conditions are:

$$\sum_{i=1}^{n} x_i = 1, \qquad x_i \geq 0 \quad \forall i$$

and can be written in matrix format as:

$$\mathbf{1}^\top x = 1, \qquad -I\, x \preceq 0$$

We can thus define A as the 1 by n row of ones, b = 1, G as the negative n by n identity matrix, and h as the zero vector of size n. In the CVXOPT formalism, these become:

# Add constraint matrices and vectors
A = matrix(np.ones(n)).T
b = matrix(1.0)
G = matrix(- np.eye(n))
h = matrix(np.zeros(n))
# Solve and retrieve solution
sol = qp(Q, -r, G, h, A, b)['x']

The solution now found follows the imposed constraints.

Friction effects

Now let us add a different type of constraint that is not linear. Suppose an optimal solution has been found at a certain time. At a later time, the matrix Q and the vector r have been updated with new values. However, changing the allocation of resources or assets has a cost. Changing a value in the old vector x must therefore be worth it in order to justify this cost. The problem can now be formulated as:

$$f(x) = \frac{1}{2} x^\top Q x - r^\top x + c^\top \left| x - x_{\text{old}} \right|$$

with c a vector representing the friction effects of going from one solution to another, i.e. the cost of allocating and unallocating resources. This new loss is no longer quadratic, as there is a term containing an absolute value, which is problematic because it is not differentiable. How does one get around this problem? The solution is to add extra variables that correspond to the change from one state to the next, and then to linearize the loss function. The linear part of the preceding equation becomes:

$$- r^\top x + c_+^\top \Delta x^+ + c_-^\top \Delta x^-$$

In the above equation we have considered that the friction effects or costs may be different for allocating and unallocating resources to the different agents/assets. We must then add extra constraints to ensure these extra variables correspond well to the change from one solution to the next:

$$x + \Delta x^+ - \Delta x^- = x_{\text{old}}, \qquad \Delta x^+ \succeq 0, \qquad \Delta x^- \succeq 0$$

We obtain the new unknown vector X by concatenating x with the variations of x. We do the same for the new Q and r matrix and vector:

$$X = \begin{pmatrix} x \\ \Delta x^+ \\ \Delta x^- \end{pmatrix}, \qquad
Q' = \begin{pmatrix} Q & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
r' = \begin{pmatrix} r \\ -c_+ \\ -c_- \end{pmatrix}$$

and we implement the constraints:

$$A = \begin{pmatrix} \mathbf{1}^\top & c_+^\top & -c_-^\top \\ I & I & -I \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ x_{\text{old}} \end{pmatrix}, \qquad
G = -I_{3n}, \qquad h = 0$$

The code is then modified in the following way:

# Modify loss function
n = 200
c = 0.001
max_weight = 0.05
r = matrix(np.block([np.random.sample(n), -c * np.ones(2*n)]))
Q = np.random.randn(n,n)
Q = 0.5 * (Q + Q.T)
Q = Q + n * np.eye(n)
Q = matrix(np.block([
    [Q, np.zeros((n, n)), np.zeros((n, n))],
    [np.zeros((n, n)), np.zeros((n, n)), np.zeros((n, n))],
    [np.zeros((n, n)), np.zeros((n, n)), np.zeros((n, n))]
]))
# Modify constraint matrices and vectors
A = matrix(np.block([[np.ones(n), c * np.ones(n), -c * np.ones(n)],
                     [np.eye(n), np.eye(n), -np.eye(n)]]))
# Build a random initial allocation with 20 active positions
n_assets_r = 20
old_x = np.zeros(n)
old_x[np.random.choice(n, n_assets_r, replace=False)] = max_weight
b = matrix(np.block([1.0, old_x]))
G = matrix(- np.eye(3*n))
h = matrix(np.zeros(3*n))
# Solve and retrieve solution
sol = qp(Q, -r, G, h, A, b)['x']

We have therefore seen how to take into account the friction effects for transitioning from one solution to another.
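As a quick structural check of the concatenated system (a pure-numpy sketch with a small, hypothetical n and initial allocation), note that if nothing changes (both variation vectors are zero) and the old allocation sums to one, then X = [x_old, 0, 0] satisfies A X = b:

```python
import numpy as np

n = 4
c = 0.001
old_x = np.full(n, 1.0 / n)   # hypothetical previous allocation, sums to 1

# Equality constraints on X = [x, dx_plus, dx_minus] (size 3n):
#   row 1      : sum(x) + c*sum(dx_plus) - c*sum(dx_minus) = 1
#   rows 2..n+1: x + dx_plus - dx_minus = old_x
A = np.block([[np.ones(n), c * np.ones(n), -c * np.ones(n)],
              [np.eye(n), np.eye(n), -np.eye(n)]])
b = np.concatenate(([1.0], old_x))

X = np.concatenate([old_x, np.zeros(2 * n)])  # no change: dx+ = dx- = 0
print(A.shape == (n + 1, 3 * n))  # True
print(np.allclose(A @ X, b))      # True
```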

Practical Example: Portfolio Optimization

Let us consider a practical example to fully understand the use of this technique: portfolio optimization. In Markowitz’s portfolio optimization theory [2], the r vector corresponds to a prediction of the returns of different assets. This prediction is given by any predictive model which we will not consider here. The Q matrix corresponds to the covariance matrix of the returns of these same assets. One may take the historical covariance matrix in this case. We will change the notation here a bit and use ω as the unknown vector. The values of ω correspond to the weights of the different assets in the portfolio. The loss function can now be written as:

$$f(\omega) = -\omega^\top r + \frac{\lambda}{2}\, \omega^\top Q\, \omega + c^\top \left| \omega - \omega_{\text{old}} \right|$$

where we have also introduced λ which represents the user’s risk aversion. The first term of the equation represents the expected returns of this portfolio. The second term represents the risk of the portfolio. Low values of λ mean that more risk is tolerated. The last term represents the transaction costs to go from one portfolio to another.

We would like to add a few more constraints which are common in portfolio optimization. We would like our portfolio to be somewhat diversified, which we can ensure by adding an upper bound to the weights. We might also want to further limit the movement from one portfolio to another, which translates into a turnover constraint. Mathematically, these can be written as:

$$0 \leq \omega_i \leq w_{\max} \;\; \forall i, \qquad
\sum_{i=1}^{n} \left( \Delta\omega_i^+ + \Delta\omega_i^- \right) \leq T, \qquad
\sum_{i=1}^{n} \omega_i + c \sum_{i=1}^{n} \left( \Delta\omega_i^+ - \Delta\omega_i^- \right) = 1$$

where T corresponds to the maximum turnover allowed, and can take on values between 0 (no modifications allowed) and 2 (no turnover constraint). The last term in the constraints listed above is a modification of the previous constraint where the sum of weights should be equal to one. This modification reflects the fact that when assets are sold and bought, transaction fees are paid and therefore the capital of the portfolio decreases [7]. In matrix form, these constraints become:

$$A = \begin{pmatrix} \mathbf{1}^\top & c\,\mathbf{1}^\top & -c\,\mathbf{1}^\top \\ I & I & -I \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ \omega_{\text{old}} \end{pmatrix}, \quad
G = \begin{pmatrix} -I_{3n} \\ I_{3n} \\ 0^\top \;\; \mathbf{1}^\top \;\; \mathbf{1}^\top \end{pmatrix}, \quad
h = \begin{pmatrix} 0 \\ w_{\max}\,\mathbf{1} \\ T \end{pmatrix}$$

and the code is modified in the following way:

max_weight = 0.05
turnover = 2.0
# Modify the Q matrix so that it resembles
# the covariance matrix of returns
returns_history = np.random.randn(n, 100)
Q = np.cov(returns_history)
Q = matrix(np.block([[Q, np.zeros((n,n)), np.zeros((n,n))], [np.zeros((n,n)), np.zeros((n,n)), np.zeros((n,n))], [np.zeros((n,n)), np.zeros((n,n)), np.zeros((n,n))]]))
# Create constraint matrices
G = matrix(0.0, (6 * n + 1, 3 * n))
h = matrix(0.0, (6 * n + 1, 1))
for k in range(3 * n):
    # w_k >= 0 constraint
    G[k, k] = -1
    # w_k <= max_weight constraint
    G[k + 3 * n, k] = 1
    h[k + 3 * n] = max_weight
for k in range(2 * n):
    # sum of dw+ and dw- <= turnover constraint
    G[6 * n, k + n] = 1
h[6 * n] = turnover

We then compute the efficient frontier, which is the collection of the best portfolios for a given risk aversion:

# Compute random portfolios in order to have a baseline
n_random = 10000
random_returns = np.zeros(n_random)
random_risks = np.zeros(n_random)
n_assets_r = 20
for i in range(n_random):
    w0 = np.zeros(n)
    w0[np.random.choice(n, n_assets_r, replace=False)] = 1 / n_assets_r
    random_returns[i] = np.dot(w0, r[:n])
    random_risks[i] = np.dot(w0, np.dot(Q[:n, :n], w0))
# Compute the optimal portfolio for different values
# of lambda, the risk aversion
N = 100  # number of points on the frontier (an assumed value)
lmbdas = [10 ** (5.0 * t / N - 1.0) for t in range(N)]
sol = [qp(lmbda / 2 * Q, -r, G, h, A, b)['x'] for lmbda in lmbdas]
optimal_returns = np.array([blas.dot(x, r) for x in sol])
optimal_risks = np.array([blas.dot(x, Q * x) for x in sol])
Efficient Frontier (image from author)

In this figure, we have plotted the risks and returns of a collection of random portfolios to have a baseline. We see that the best computed portfolios always have far greater returns than any random portfolio for a given risk. The risk and return of the initial portfolio is also portrayed.

In order to visualize the importance of the maximum turnover, we can repeat the calculations of the efficient frontier while varying its value (25%, 50%, 100%, and 200%). Completely changing the portfolio implies selling all the assets (turning over 100% of assets) and then buying a completely new set of assets (turning over 100% again), which amounts to 200% turnover. The maximum turnover of a portfolio is therefore 200%. We expect the efficient frontier to contract with smaller maximum turnovers, as the algorithm has fewer options for changing the weights of the initial portfolio.

Efficient Frontier (image from author)

This assumption is verified to a certain extent: it would seem that increasing the maximum turnover from 100% to 200% with this particular initial portfolio does not hinder the optimization process too much. This is likely due to the nature of the predictions, which in our case do not change much from one time step to another. Therefore, a somewhat optimized portfolio does not require too many changes in order to be fully optimized.

Closing remarks

In this article we have seen how to use CVXOPT, a powerful and fast solver, to solve quadratic optimization problems with constraints. We have seen how to adapt some types of constraints and losses which are neither linear nor quadratic (such as the transaction cost loss and the turnover constraint) so that the solver can handle them. However, while the solver is very efficient and quite flexible, it cannot handle all types of constraints. Indeed, if we wish to add a sparsity constraint (we want to have at most N non-zero weights), this cannot be reformulated in a linear or quadratic way. In this case, it may be worthwhile to investigate other methods that are more flexible and that can handle any type of loss function, such as simulated annealing for example.

References

[1] https://mathworld.wolfram.com/LeastSquaresFitting.html

[2] https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1952.tb01525.x

[3] https://arxiv.org/abs/2002.06835

[4] https://cvxopt.org/

[5] Optimization for Machine Learning, Suvrit Sra, Sebastian Nowozin and Stephen J. Wright

[6] https://www.netlib.org/lapack/

[7] Introduction to Risk Parity and Budgeting, Thierry Roncalli

About us

Advestis is a European Contract Research Organization (CRO) with a deep understanding and practice of statistics and interpretable machine learning techniques. The expertise of Advestis covers the modeling of complex systems and predictive analysis for temporal phenomena.
LinkedIn: https://www.linkedin.com/company/advestis/
