
Vector Norms in Machine Learning

A guide to p-norms.

Photo by Markus Winkler on Unsplash

If you are reading this post, it is likely that you already know what vectors are and their indispensable place in Machine Learning. To recap, a vector is a one-dimensional array of numbers of a particular length. This is shown below:

A vector containing n elements. (Image by author)

Elements of a vector are arranged in a particular order, and the location of an element usually holds an inherent meaning. We can access the individual elements using their position (or index).

Accessing an element of a vector using the index. (Image by author)
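The article illustrates indexing with an image; as a quick sketch I'm adding (the original contains no code), the same idea in Python looks like this, keeping in mind that Python indexing is 0-based:

```python
# A vector with n = 4 elements, stored as a plain Python list
v = [4, 2, 7, 1]

# Access individual elements by their position (index).
# Python uses 0-based indexing, so v[0] is the first element.
first = v[0]  # 4
third = v[2]  # 7
```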

We can also think of vectors as representing a point in space. If the length of the vector is n, the point is said to be in an n-dimensional space. For instance, if the size of the vector is 2, this can denote a point in 2-dimensional space with respect to the origin, as shown below:

A 2D plot showing a vector for the point (4,2). (Image by author)

Vector Norms

Vector norms are a family of functions that take a vector as input and output a single non-negative value, called the magnitude of the vector. Different norm functions can assign different magnitudes to the same vector.

A diagram showing the family of vector norm functions and their output. (Image by author)

Norms, although often overlooked, sit at the core of training Machine Learning models. Right before each iteration of backpropagation, you compute a scalar (non-negative) loss value, for example the mean of the squared differences between the predicted values and the ground-truth values. This scalar loss is essentially the output of a norm function applied to the error vector. The way we compute the loss is shown below:

A diagram showing the calculation of loss using predicted values and true values. (Image by Author)
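As a minimal Python sketch of that loss calculation (my addition; the example values are made up for illustration), the mean of the squared differences between predictions and ground truth is:

```python
y_true = [3.0, -0.5, 2.0, 7.0]  # ground-truth values (illustrative)
y_pred = [2.5,  0.0, 2.0, 8.0]  # predicted values (illustrative)

# Error vector: predicted minus true, element by element
diff = [p - t for p, t in zip(y_pred, y_true)]

# Mean of the squared differences — a scalar, non-negative loss
mse = sum(d * d for d in diff) / len(diff)
```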

The Standard Norm Equation – P-norm

All norm functions originate from a standard equation of Norm, known as the p-norm. For different values of the parameter p (p should be a real number greater than or equal to 1), we obtain a different norm function. The generalized equation, however, is shown below:

The p-norm equation. (Image by author)

This takes an n-dimensional vector x, raises the absolute value of each element to the p-th power, sums the results, and takes the p-th root to obtain the p-norm of the vector, also known as its magnitude. Different values of the parameter p yield different norm functions. Let’s discuss them one by one below.
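The general equation can be sketched directly in Python (my addition, not from the article):

```python
def p_norm(x, p):
    """Return the p-norm of vector x for a real p >= 1:
    (sum of |x_i|^p) raised to the power 1/p."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

v = [3.0, -4.0]
p_norm(v, 1)  # 7.0  (|3| + |-4|)
p_norm(v, 2)  # 5.0  (sqrt(9 + 16))
```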

L0 Norm:

Although p=0 lies outside the domain of the p-norm function, substituting p=0 into the above equation raises each element to the power 0, which is 1 for any non-zero element; the 1/p exponent in the root is also undefined at p=0. Instead, the L0 "norm" is defined by convention as the number of non-zero elements in the given vector. (Strictly speaking, it is not a true norm, since it does not scale with the vector.) The image below shows the output of the L0 norm function for the given vector:
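In code, that convention reduces to a simple count (a sketch of my own, matching the counting definition above):

```python
def l0_norm(x):
    """Count the non-zero elements of x.
    Not a true norm, but the standard meaning of 'L0 norm'."""
    return sum(1 for xi in x if xi != 0)

l0_norm([0, 3, 0, -2, 1])  # 3
```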

Image showing the value of L0 norm. (Image by author)

L1 Norm:

Substituting p=1 in the standard equation of p-norm, we get the following:

The equation for L1 Norm. (Image by author)
  • When used to compute the loss, the L1 norm of the error vector, divided by the number of elements, gives the Mean Absolute Error.
  • The L1 norm grows linearly everywhere: its rate of change is the same near the origin and far from it.

The image below shows the output of the L1 norm function for the given vector:

Image showing the value of L1 norm. (Image by author)
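A minimal Python sketch of the L1 norm (my addition), which is just the sum of absolute values:

```python
def l1_norm(x):
    """L1 norm: sum of the absolute values of the elements."""
    return sum(abs(xi) for xi in x)

l1_norm([3, -4, 1])  # 8
```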

L2 Norm:

Of all norm functions, the most common and important is the L2 Norm. Substituting p=2 in the standard equation of p-norm, which we discussed above, we get the following equation for the L2 Norm:

The equation for L2 Norm. (Image by author)
  • When used to compute the error, the L2 norm of the error vector is closely related to the Root Mean Squared Error, which additionally divides by the number of elements before taking the square root.
  • The L2 norm measures a point’s straight-line distance from the origin, also known as the Euclidean distance.

The image below shows the output of the L2 norm function for the given vector:

Image showing the value of L2 norm. (Image by author)
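In Python (again a sketch I'm adding), the L2 norm is the square root of the sum of squares:

```python
import math

def l2_norm(x):
    """L2 (Euclidean) norm: sqrt of the sum of squared elements."""
    return math.sqrt(sum(xi * xi for xi in x))

l2_norm([3, -4])  # 5.0
```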

Squared L2 Norm:

As the name indicates, the squared L2 Norm is the same as the L2 Norm but squared.

The equation for Squared L2 Norm. (Image by author)
  • When used to compute the error in machine learning, the squared L2 norm of the error vector, divided by the number of elements, gives the Mean Squared Error.

The squared L2 Norm is computationally cheaper than the L2 Norm for two reasons:

  1. It avoids the square root.
  2. Within Machine Learning applications, its derivative is easier to compute and store. The derivative of the Squared L2 Norm with respect to an element depends only on that element (it is 2x_i). For the L2 Norm, the derivative with respect to an element involves the entire vector (it is x_i divided by the L2 norm of x).
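The second point can be made concrete with a short sketch (my addition), comparing the two gradients element by element:

```python
import math

def grad_squared_l2(x):
    # d/dx_i of sum(x_j^2) is 2 * x_i — needs only the element itself
    return [2 * xi for xi in x]

def grad_l2(x):
    # d/dx_i of sqrt(sum(x_j^2)) is x_i / ||x||_2 — needs the whole vector
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]

grad_squared_l2([3.0, -4.0])  # [6.0, -8.0]
grad_l2([3.0, -4.0])          # [0.6, -0.8]
```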

Max Norm (or L-∞ Norm):

Since infinity is not a number we can substitute directly, we can’t just set p=∞ in the standard p-norm equation. However, we can study the function’s behavior as p approaches infinity using limits. A simple derivation for the equation of Max-norm can be found here.

The equation for Max Norm. (Image by author)

The Max norm returns the largest absolute value among the vector’s elements. The image below shows the output of the Max norm function for the given vector:

Image showing the value of Max norm. (Image by author)
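In code, the limit collapses to a simple maximum (my sketch, consistent with the definition above):

```python
def max_norm(x):
    """L-infinity (max) norm: the largest absolute value in x."""
    return max(abs(xi) for xi in x)

max_norm([3, -7, 2])  # 7
```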

Concluding notes:

  1. A vector norm is a function that takes a vector as an input and outputs a non-negative value, its magnitude.
  2. All norm functions can be derived from a single equation. The family of norm functions is known as p-norm.
  3. The L1 norm of the error vector underlies the Mean Absolute Error.
  4. The L2 Norm of the error vector underlies the Root Mean Squared Error.
  5. The Squared L2 Norm of the error vector underlies the Mean Squared Error.
