
The link to Chapter-4: Important Convex Functions and Convex Properties is here. Chapter 5 covers some final topics that we need to look at before diving into Convex Optimization. As the chapter name suggests, you could consider these prerequisites for Convex Optimization. Mind you, these concepts will be very important when we solve optimization problems later on. Hence, the focus on mathematical concepts is considerably high in this chapter. The topics that will be covered are:
- Scalar and vector valued functions
- Graphical interpretation of gradient
- Hessian matrix
- Positive semi-definite matrices
- Differentiable convex functions
- First and second order Convexity conditions
- Principal Minor Test
- Role of eigenvalues in determining convexity
- Composition of functions
- Some problem solving
So, let’s start with scalar and vector valued functions. These concepts are fairly simple. A function that takes in one or more values, but returns a single value is called a scalar valued function. An example of a scalar valued function is shown below:
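For instance, one such three-input function (chosen here purely for illustration) could be

$$ f(x, y, z) = x^{2} + y^{2} + z^{2}. $$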

As you can see in the equation above, the function takes 3 inputs (x, y and z) and produces a single output. Thus, it is a scalar valued function. An n-variable scalar valued function maps from n-dimensional space to the real line:
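$$ f : \mathbb{R}^{n} \to \mathbb{R}. $$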

An example to better understand this would be that of a paraboloid function, which is given below.
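A standard such paraboloid, and the one assumed in the gradient discussion later in this chapter, is

$$ f(x, y) = x^{2} + y^{2}, $$

which maps \( \mathbb{R}^{2} \to \mathbb{R} \).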


Now, we will look into what vector valued functions are. Vector valued functions are those functions that take in one or more variables but produce a vector of outputs. An example of this in 2-D is given by:
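For illustration (the particular components are chosen arbitrarily), a vector valued function on the plane could be

$$ F(x, y) = \begin{pmatrix} x + y \\ x - y \end{pmatrix}, \qquad F : \mathbb{R}^{2} \to \mathbb{R}^{2}. $$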


In this section, we will look into the graphical interpretation of the gradient. We saw in the earlier section that the gradient of a function whose graph is a 3-D surface is a field of 2-D vectors. So, if the 3-D plot has its center at the origin, the gradient vectors of the 3-D plot point away from the center. Refer to the diagram below that shows the gradient vector field of a paraboloid in the x-y plane.

The gradient vector field of the paraboloid in the x-y plane has been shown above. As you can see, the projections of the points P and Q on the paraboloid are shown as the points P′ and Q′. The vectors at P′ and Q′ represent the gradients of the paraboloid at P and Q. Similarly, the whole vector field (the collection of all these arrows) represents the gradients at all the points of the paraboloid. Imagine that you are climbing up the paraboloid. Say you are at point P and you want to know which direction to walk in order to climb the paraboloid the fastest. Looking at the paraboloid from below, we can see that in order to climb it the fastest, we need to move directly away from the origin, because that is the direction in which the paraboloid is steepest. Thus, the gradient at a point gives the direction of steepest ascent, which for the paraboloid points straight away from the origin in the x-y plane. Refer to the following diagram to understand this better.

In the diagram above, the vectors in blue represent the steepest direction of ascent and hence are the gradient vectors. The vectors shown in black represent other directions of movement that are not the steepest. Also, note that the length of the gradient vector at a point tells us how steep the surface is at that point.
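To make this concrete with the paraboloid assumed above, \( f(x, y) = x^{2} + y^{2} \):

$$ \nabla f(x, y) = \begin{pmatrix} \partial f / \partial x \\ \partial f / \partial y \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \end{pmatrix} = 2 \begin{pmatrix} x \\ y \end{pmatrix}, $$

so at every point the gradient is a positive multiple of the position vector: it points directly away from the origin, and its length \( 2\sqrt{x^{2} + y^{2}} \) grows as we move outward, matching the increasing steepness of the bowl.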

In this section we will look at what a Hessian matrix is. We will not get into the applications of the Hessian matrix right away, but will look into these as the series progresses. For now, we will only try and understand what this matrix is. A Hessian matrix is a square matrix of second order partial derivatives of a function. A Hessian matrix is used to identify the local minima and the local maxima of a function at a particular point. The general form of a Hessian matrix is given as:
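For a function \( f(x_1, \dots, x_n) \), the general form is the \( n \times n \) matrix of all second order partial derivatives:

$$ H(f) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. $$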

Something to note here is that if the Hessian at a critical point (a point where the gradient is zero) is negative definite, then the point is a local maximum. On the other hand, if the Hessian there is positive definite, then the point is a local minimum. We will see more of this concept in a little while.
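As a quick sketch of this, using the paraboloid from earlier and the sympy library purely for illustration:

```python
from sympy import symbols, hessian

x, y = symbols('x y')
f = x**2 + y**2              # the paraboloid from earlier in the chapter

H = hessian(f, (x, y))       # Matrix([[2, 0], [0, 2]])
print(H)

# The gradient (2x, 2y) vanishes only at the origin. The Hessian is constant,
# with positive entries on the diagonal and zeros elsewhere, i.e. positive
# definite, so the origin is a local (in fact global) minimum.
```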
Now, we will look into what positive semi-definite matrices are. One way to understand positive semi-definite matrices is this: say we have a vector Z pointing in a particular direction. If multiplying Z by a matrix A changes its direction by no more than 90°, then we call the matrix A positive semi-definite. This can be shown using the following diagram.

The mathematical definition of a positive semi-definite matrix is:
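For a symmetric matrix \( A \in \mathbb{R}^{n \times n} \),

$$ A \succeq 0 \;\iff\; z^{T} A z \ge 0 \quad \text{for all } z \in \mathbb{R}^{n}. $$

Since \( z^{T}(Az) = \lVert z \rVert \, \lVert Az \rVert \cos\theta \), where \( \theta \) is the angle between \( z \) and \( Az \), the condition \( z^{T} A z \ge 0 \) is exactly the "direction changes by no more than 90°" picture described above.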

Note that the concept of positive semi-definite matrices will be used in determining whether a function is convex or not.
In this section, we will look into what differentiable convex functions are. We won't really go into depth here, but will only note the property that holds when a convex function is differentiable. This property of differentiable convex functions will be used in the next section. So, the property is given below:
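In essence (stated here in the standard form), if \( f \) is differentiable, its gradient \( \nabla f(x) \) exists at every point \( x \) of its domain, and the first order (tangent) approximation of \( f \) around \( x \) is

$$ f(y) \approx f(x) + \nabla f(x)^{T}(y - x). $$

The first order convexity condition in the next section compares \( f \) against exactly this approximation.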

In this section, we will look into the first and second order convexity conditions. These conditions decide whether or not a function is convex. Let's look at the first order convexity condition.
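For a differentiable function \( f \) with a convex domain, the first order condition states:

$$ f \text{ is convex} \iff f(y) \ge f(x) + \nabla f(x)^{T}(y - x) \quad \text{for all } x, y \in \operatorname{dom} f. $$

Graphically, the right-hand side is the tangent (first order) approximation of \( f \) at \( x \); the condition says the graph of a convex function always lies on or above every one of its tangents.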


Now that we have understood the graphical interpretation of the first order convexity condition, let's have a look at a mathematical proof of the above property.
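A standard argument for the forward direction (convexity implies the inequality) runs as follows. Since \( f \) is convex, for any \( \theta \in (0, 1] \),

$$ f\big(x + \theta(y - x)\big) = f\big((1-\theta)x + \theta y\big) \le (1-\theta) f(x) + \theta f(y). $$

Rearranging gives

$$ f(y) \ge f(x) + \frac{f\big(x + \theta(y - x)\big) - f(x)}{\theta}, $$

and letting \( \theta \to 0^{+} \), the difference quotient tends to the directional derivative \( \nabla f(x)^{T}(y - x) \), which yields \( f(y) \ge f(x) + \nabla f(x)^{T}(y - x) \).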

Now, let's prove the other side of the story:
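Conversely, suppose the first order inequality holds for every pair of points in the domain. Fix \( x, y \) and \( \theta \in [0, 1] \), and let \( z = \theta x + (1-\theta) y \). Applying the inequality at \( z \) twice,

$$ f(x) \ge f(z) + \nabla f(z)^{T}(x - z), \qquad f(y) \ge f(z) + \nabla f(z)^{T}(y - z). $$

Multiplying by \( \theta \) and \( (1-\theta) \) respectively and adding,

$$ \theta f(x) + (1-\theta) f(y) \ge f(z) + \nabla f(z)^{T}\big(\theta x + (1-\theta) y - z\big) = f(z) = f\big(\theta x + (1-\theta) y\big), $$

which is exactly the definition of convexity.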

Let us see an example of a function here and prove that it is a convex function using the above property.
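As a simple worked example (the particular function is chosen here just for illustration), take \( f(x) = x^{2} \), so \( f'(x) = 2x \). Then for any \( x, y \),

$$ f(y) - \big( f(x) + f'(x)(y - x) \big) = y^{2} - x^{2} - 2x(y - x) = (y - x)^{2} \ge 0, $$

so \( f(y) \ge f(x) + f'(x)(y - x) \) for all \( x, y \), and by the first order condition \( f(x) = x^{2} \) is convex.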

Now, we will look into the second order convexity condition. This is given by:
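For a twice differentiable function \( f \) with a convex domain,

$$ f \text{ is convex} \iff \nabla^{2} f(x) \succeq 0 \quad \text{for all } x \in \operatorname{dom} f, $$

that is, the Hessian must be positive semi-definite everywhere on the domain.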

Now, let us see what the Principal Minor Test is. We won't get into too much detail or the logic behind this test; we will simply look at how it is done. Note that the Principal Minor Test will be used to determine whether the Hessian matrix of a function is positive semi-definite or not. In order to use this test, we need to find out the Hessian matrix of the function and follow the steps mentioned below:
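Sketching the idea of these steps (with a hypothetical helper function and an illustrative matrix): look at the nested top-left square sub-matrices of the Hessian, the k x k "boxes" in its upper-left corner, and compute their determinants, the leading principal minors. If all of these determinants are strictly positive, the matrix is positive definite; for positive semi-definiteness, all principal minors (not just the leading ones) must be greater than or equal to zero.

```python
import numpy as np

def leading_principal_minors(H):
    """Determinants of the top-left k x k 'boxes' of H, for k = 1..n."""
    n = H.shape[0]
    return [np.linalg.det(H[:k, :k]) for k in range(1, n + 1)]

# Hessian of f(x, y) = x^2 + x*y + y^2 (an illustrative choice).
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

minors = leading_principal_minors(H)
print(minors)                       # approx [2.0, 3.0]
print(all(m > 0 for m in minors))   # True -> H is positive definite
```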

A point that needs to be kept in mind is that the Principal Minor Test works only when all the elements of the Hessian matrix of the function are constants. Of course, the number of these boxes (the nested top-left sub-matrices) will increase as the dimension of the matrix increases.
Now, we will see how eigenvalues can help us determine whether a matrix is positive semi-definite or not. Again, we won't really get into the details, but we will look at how eigenvalues are used. A symmetric matrix is positive semi-definite if and only if all of its eigenvalues are greater than or equal to 0. So, how do we calculate eigenvalues?
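The eigenvalues \( \lambda \) of a matrix \( A \) are the solutions of the characteristic equation \( \det(A - \lambda I) = 0 \). Here is a small sketch; the matrix below is just an illustrative choice, and numpy's eigvalsh routine is used for the computation:

```python
import numpy as np

# An illustrative symmetric matrix (one eigenvalue will turn out to be 0).
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# eigvalsh computes the eigenvalues of a symmetric matrix, in ascending order.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)  # approx [0.0, 2.0]

# All eigenvalues are >= 0 (up to floating point noise), so A is positive
# semi-definite, but not positive definite, because one eigenvalue is 0
# rather than strictly positive.
print(np.all(eigenvalues >= -1e-12))  # True
```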

This tells us that the matrix is positive semi-definite. Note that if all the eigenvalues had been strictly greater than 0, the matrix would have been positive definite. Another point to note is that the identity matrix is always positive definite (all of its eigenvalues are 1), and hence also positive semi-definite.
Now, we will look at a very interesting property of convex functions. This property is characterized using the composition of functions. This means that, given a function g(x) = f(h(x)),
- g is convex if –
- h is convex and f is convex and non-decreasing. In order to understand what a non-decreasing function is, have a look at the graph below. It is pretty much self-explanatory. You will notice that at no point in the graph is the slope less than 0.

- h is concave and f is convex and non-increasing. An example of the non-increasing function is shown below:

A couple of examples of g for these cases would be as follows:
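For instance (these particular functions are just illustrative choices):

$$ g(x) = e^{h(x)} \quad \text{with } h \text{ convex, since } f(u) = e^{u} \text{ is convex and non-decreasing;} $$

$$ g(x) = \frac{1}{h(x)} \quad \text{with } h \text{ concave and positive, since } f(u) = \tfrac{1}{u} \text{ is convex and non-increasing for } u > 0. $$

Both of these compositions are therefore convex.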

- g is concave if –
- h is concave and f is concave and non-decreasing
- h is convex and f is concave and non-increasing
With this, we come to the end of the theory part of this chapter. In the next section, we will solve some problems to better understand the concepts that we have seen so far.
This is the problem solving section. Hopefully, those who love solving mathematical problems will enjoy this section.
Question 1:

Question 2:


Question 3:

The difference between a convex function and a strictly convex function is pretty simple actually. If the line segment joining any two points on the curve lies on or above the curve, then the function is a convex function. If this line segment lies 'strictly' above the curve, touching it only at the two endpoints, then the function is a strictly convex function. Below are the diagrams to help you visualize this.
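In symbols (the standard definitions), for all \( x_1, x_2 \) in the domain and all \( \theta \in (0, 1) \):

$$ \text{convex:}\quad f\big(\theta x_1 + (1 - \theta) x_2\big) \le \theta f(x_1) + (1 - \theta) f(x_2), $$

$$ \text{strictly convex:}\quad f\big(\theta x_1 + (1 - \theta) x_2\big) < \theta f(x_1) + (1 - \theta) f(x_2) \quad \text{whenever } x_1 \ne x_2. $$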

As you can see from the graph on the right, the line segment joining the points x3 and x4 lies on the graph. Therefore, the function is convex but not strictly convex. Note that there is another concept called "strong convexity". We will look into it in a later chapter of the series.
So, the proposition that we can form here is that if a function f is twice differentiable, then
- f is convex if and only if the Hessian of f is positive semi-definite (greater than or equal to 0) for all x belonging to the domain of f.
- if the Hessian of f is positive definite (strictly greater than 0) for all x belonging to the domain of f, then f is strictly convex.
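Note that the second statement is only a one-way implication: for example, \( f(x) = x^{4} \) is strictly convex even though its second derivative \( 12x^{2} \) equals 0 at \( x = 0 \), so a Hessian that vanishes at a point does not rule out strict convexity.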
Question 4:

With this, we come to the end of this chapter. The best part is that from the next chapter, we will finally dive into proper optimization concepts. Up till now, we have been only dealing with auxiliary topics that would help us understand and solve optimization problems.
In the previous chapter, Chapter-4: Important Convex Functions and Convex Properties here, I had asked "If the linear combination of n number of vectors lies in a n-dimensional space, what dimensional space do you think does the affine combination of these vectors lie in?". The answer is simple. The affine combinations would lie in an (n-1)-dimensional space. We can get an idea of this from the fact that when the linear combinations of two vectors fill a 2-D plane, the affine combinations of the same vectors form a line.
The next question that I asked was "What do you think a convex cone or a conic hull would look like in 3-D?". This one is even simpler. These would look like a cone, as in an ice-cream cone, and that's it!
Click here for the next chapter, Chapter-6: Optimization Problems.