Linear Algebra 4: Matrix Equations

Solving the matrix equation Ax = b

tenzin migmar (t9nz)
7 min read · Nov 10, 2023


Preface

Welcome back to the fourth edition of my ongoing series on the basics of Linear Algebra, the foundational math behind machine learning. In my previous article, I introduced vectors, linear combinations, and vector spans. This article takes a look at the matrix equation Ax = b, and we’ll see how solving a system of linear equations is linked to the matrix equation.

This article would best serve readers if read in accompaniment with Linear Algebra and Its Applications by David C. Lay, Steven R. Lay, and Judi J. McDonald. Consider this series as a companion resource.

Feel free to share thoughts, questions, and critique.

The Intuition

We last left off on learning about linear combinations, which I promised would have important implications. Recall that given vectors v₁, v₂, …, vₚ in ℝⁿ and scalars (also known as weights) c₁, c₂, …, cₚ, the linear combination is the vector defined by the sum of the scalar multiples, c₁v₁ + c₂v₂ + … + cₚvₚ.¹

We say that a vector b is a linear combination of a set of vectors v₁, v₂, …, vₚ in ℝⁿ if there exists a set of weights c₁, c₂, …, cₚ (a solution) such that c₁v₁ + c₂v₂ + … + cₚvₚ = b.

To determine if b is a linear combination of some given vectors v₁, v₂, …, vₚ, we arranged our vectors into a system of linear equations, created an augmented matrix of our equations, and used row reduction operations to reduce the matrix to reduced echelon form. If the reduced echelon form had an inconsistency, that is, a row that looked like this: [0, 0, … | m] where m ≠ 0, that meant our vector b is not a linear combination of the vectors, because no set of weights exists for the equation c₁v₁ + c₂v₂ + … + cₚvₚ = b to hold true.

If there was no such inconsistency, that meant we could write the vector b as a linear combination of the set of vectors, as in the example above. Do you remember how we verified our answer at the end? We’d multiply each vector by its respective scalar and then find the vector sum. If the vector sum equalled b, we knew that we had done our calculations correctly and that b was indeed a linear combination.
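To make this concrete, here is a minimal sketch in Python (using NumPy) of the same procedure; the specific vectors v1, v2, and b are made up for illustration.

```python
import numpy as np

# Hypothetical example vectors: is b a linear combination of v1 and v2?
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([0.0, 1.0, 2.0])
b = np.array([2.0, 5.0, 8.0])

# Stack the vectors as columns of A and solve the least-squares problem A c = b.
A = np.column_stack([v1, v2])
c, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

# b is a linear combination of v1 and v2 exactly when the system is consistent,
# i.e. when the computed weights reproduce b (up to floating-point error).
if np.allclose(A @ c, b):
    print(f"b = {c[0]:.2f} * v1 + {c[1]:.2f} * v2")
else:
    print("b is not a linear combination of v1 and v2")

# Verification step from the article: multiply each vector by its weight and sum.
print(np.allclose(c[0] * v1 + c[1] * v2, b))  # True when b is a linear combination
```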

This verification process is the matrix equation Ax = b in disguise!

Ax = b

If A is an m x n matrix and x is in ℝⁿ (you’ll see why it’s important that x is in ℝⁿ in the next section), then the product Ax is the linear combination of the vectors (columns) in A, using the corresponding scalars in x.
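A short NumPy sketch of this definition (the matrix and vector are arbitrary examples): computing Ax directly and computing it as a weighted sum of the columns of A give the same result.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # a 3 x 2 matrix
x = np.array([2.0, -1.0])    # x must be in R^2: one entry per column of A

# Ax as a linear combination of the columns of A, weighted by the entries of x
as_combination = x[0] * A[:, 0] + x[1] * A[:, 1]

print(A @ x)             # [0. 2. 4.]
print(as_combination)    # identical result
```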

Notice that none of this is new material; we’ve already unknowingly computed Ax when verifying our linear combinations in my previous article. The Ax = b matrix equation is still fundamental, though, because it formalizes all of this into a compact notation and will resurface later on in new ways.

Now we know that if we are given an m x n matrix A and a vector x, and the matrix product Ax equals b, then b can be written as a linear combination of the vectors (columns) in A with the scalars/entries in x as weights. So in summary: the equation Ax = b has a solution x if and only if b can be written as a linear combination of the columns of A.
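One way to check this condition numerically (a sketch with made-up values): Ax = b is consistent exactly when appending b as an extra column does not increase the rank of A.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 2.0]])
b_solvable = np.array([2.0, 5.0, 8.0])    # lies in the span of A's columns
b_unsolvable = np.array([1.0, 0.0, 0.0])  # does not

def has_solution(A, b):
    """Ax = b has a solution iff rank(A) == rank([A | b])."""
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

print(has_solution(A, b_solvable))    # True
print(has_solution(A, b_unsolvable))  # False
```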

Matrix Multiplication

I’ve introduced Ax = b as a matrix product, but I haven’t yet explained matrix multiplication (which is what Ax is)!

Matrix multiplication is the operation of multiplying two matrices to produce a third, their product. We’ve already seen matrix addition, where two matrices are added to produce their sum. For matrix addition to be defined, the two matrices being added, matrix A and matrix B, must be of the same size. Similarly, matrix multiplication has a requirement: to multiply matrix A by matrix B and produce AB, the number of columns in matrix A must equal the number of rows in matrix B. The size of the product, which we’ll call matrix C, depends on the number of rows in matrix A and the number of columns in matrix B: matrix C will have m rows (the number of rows in matrix A) and p columns (the number of columns in matrix B).
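As a quick sanity check of the size rule (the matrices below are arbitrary): a 2 x 3 matrix times a 3 x 4 matrix gives a 2 x 4 product, while reversing the order is undefined.

```python
import numpy as np

A = np.random.rand(2, 3)  # 2 rows, 3 columns
B = np.random.rand(3, 4)  # 3 rows, 4 columns

print((A @ B).shape)  # (2, 4): columns of A match rows of B

# B @ A is undefined: B has 4 columns but A has only 2 rows
try:
    B @ A
except ValueError as error:
    print("undefined:", error)
```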

So, how does matrix multiplication work? If we were to multiply matrices A and B, the entry in the i-th row and j-th column of the product is the dot product of the i-th row of matrix A and the j-th column of matrix B.

For now, all you need to know is that the dot product is the sum of the products of corresponding entries between two vectors, and that it is only defined when the two vectors have the same number of entries. This explanation is far from doing the dot product justice, but I’ll save the full geometric intuition for later.

For brevity, I’ve computed the matrix product of two 2 x 2 matrices, but the same procedure generalizes to matrices of any size as long as they meet the criteria for matrix multiplication; otherwise, their product is undefined.
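Here is a small sketch that carries out the same entry-by-entry procedure for two (arbitrary) 2 x 2 matrices and checks it against NumPy’s built-in product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

m, n = A.shape
n2, p = B.shape
assert n == n2, "columns of A must equal rows of B"

# Entry (i, j) of the product is the dot product of row i of A and column j of B.
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        C[i, j] = np.dot(A[i, :], B[:, j])

print(C)      # [[19. 22.] [43. 50.]]
print(A @ B)  # same result using NumPy's matrix multiplication
```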

Properties of Matrix Multiplication

If A, B and C are n x n matrices and c and d are scalars, then the following properties are true.³

  1. AB ≠ BA (not commutative in general)
  2. (AB)C = A(BC) (associative)
  3. A(B+C) = AB + AC and (B+C)A = BA + CA (distributive)
  4. 0A = 0 (multiplicative property of zero)

Take care in noting that matrix multiplication is not commutative; this property might take a while to stick, given that we are intuitively used to commutativity with real numbers.
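A quick numerical illustration of these properties, using arbitrary random matrices (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.random((3, 3)) for _ in range(3))

print(np.allclose(A @ B, B @ A))                # False here: not commutative in general
print(np.allclose((A @ B) @ C, A @ (B @ C)))    # True: associative
print(np.allclose(A @ (B + C), A @ B + A @ C))  # True: distributive
print(np.allclose(np.zeros((3, 3)) @ A, 0))     # True: multiplicative property of zero
```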

These properties are useful for computing matrix products, which will be a recurring subject throughout Linear Algebra.

Conclusion

Matrix multiplication is a fundamental mathematical operation that underpins the core functionality of neural networks, particularly in their feedforward and backpropagation phases.

In the feedforward phase of a neural network, data is processed through its layers, and matrix multiplication is at the heart of this operation. Each layer in a neural network is composed of neurons, and each neuron’s output is a weighted sum of its inputs followed by an activation function. These weighted sums are calculated using matrix multiplication.
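As an illustration (the layer sizes, random values, and choice of ReLU activation are assumptions for the sketch): a single dense layer’s forward pass is just a matrix multiplication followed by an elementwise activation.

```python
import numpy as np

rng = np.random.default_rng(42)

x = rng.random((4, 1))   # input vector: 4 features
W = rng.random((3, 4))   # weight matrix: 3 neurons, each with 4 weights
b = rng.random((3, 1))   # bias for each neuron

# Forward pass: weighted sums via matrix multiplication, then an activation (ReLU here)
z = W @ x + b            # shape (3, 1)
a = np.maximum(z, 0.0)   # elementwise ReLU
print(a)
```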

During the backpropagation pass, the neural network learns from its mistakes. It adjusts the weights of neurons to minimize the error between predicted and actual outputs. Matrix multiplication is again a key component of this process, specifically in calculating gradients, which indicate how much each weight should be adjusted to minimize the error.
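A sketch of where matrix multiplication appears in the backward pass, under the simplifying assumptions of a single linear layer (no activation) and a squared-error loss; all values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((4, 1))       # input
W = rng.random((3, 4))       # weights
y_true = rng.random((3, 1))  # target output

# Forward pass (no activation, to keep the gradient simple)
y_pred = W @ x

# Squared-error loss L = 0.5 * ||y_pred - y_true||^2.
# Its gradient with respect to W is an outer product -- again a matrix multiplication.
delta = y_pred - y_true      # dL/dy_pred, shape (3, 1)
grad_W = delta @ x.T         # dL/dW, shape (3, 4)

# Gradient descent step: adjust the weights to reduce the error
learning_rate = 0.1
W = W - learning_rate * grad_W
```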

Learning math is an exciting venture purely on its own merit, but learning about the applications of Linear Algebra alongside the theory can make the journey up a steep learning curve even more inspiring.

Summary

In this chapter, we learned about:

  • The intuition behind linear combinations and the matrix product Ax = b: how the matrix product isn’t necessarily a new concept, but one that formalizes a procedure we’d already been using!
  • Ax = b: the matrix equation has a solution x if and only if b is a linear combination of the set of vectors (columns) in A.
  • Matrix multiplication: the operation behind Ax = b that is widely used in machine learning applications, with specific examples in neural networks.
  • Properties of matrix multiplication: non-commutativity, associativity, distributivity, and the multiplicative property of zero.

Notes

*All images created by the author unless otherwise noted.
*I apologize for taking a while to continue where we last left off. I am currently in the midst of taking midterm exams (including one for Linear Algebra haha!)
¹Definition for linear combinations referenced from Linear Algebra and Its Applications 6th Edition by David C. Lay, Steven R. Lay, and Judi J. McDonald
²Definition for matrix product properties referenced from Linear Algebra and Its Applications 6th Edition by David C. Lay, Steven R. Lay, and Judi J. McDonald
³Matrix properties referenced from src.
