Introduction
Linear algebra, via the use of matrices and vectors, along with linear algebra libraries (such as NumPy in Python), allows us to perform a large number of calculations in a more computationally efficient way while using simpler code. Knowing at least the numeric operations of linear algebra is crucial to further understanding what happens in our Machine Learning models. Although having the geometric intuition behind linear algebra can be incredibly useful in visualizing the operations we will discuss below, it is not required to understand most machine learning algorithms.
In this tutorial, we will discuss scalars, vectors, matrices, matrix-matrix addition and subtraction, scalar multiplication and division, matrix-vector multiplication, matrix-matrix multiplication, identity matrices, matrix inverses, and matrix transposes. In addition, we will very briefly discuss some of the geometric intuition behind some of these numeric operations.
Matrices
A matrix is a rectangular array of numbers, written within square brackets. In other words, a matrix is a 2-dimensional array, made up of rows and columns. The numbers contained in the matrix, or the matrix elements, can be data from a machine learning problem, such as feature values.

Since matrices can have any number of rows and columns, we must specify the dimension of a matrix, meaning the number of rows × the number of columns.
For example, if matrix A has 4 rows and 2 columns, then it is a 4×2 matrix. Another way of saying this would be: matrix A is an element of the set R^(4×2), which is the set of all matrices that are of the dimension 4×2.
Indexing a Matrix
A matrix gives us a way to quickly organize, index, and access a large amount of data. Thus, to access the data, we can add a subscript, or index, to a matrix that points to a specific element, or entry, inside the matrix. For example, for matrix A, its entries can be indexed as follows:
Aij = "i,j entry" in the ith row, jth column.
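Since the article mentions NumPy, here is a minimal sketch of creating and indexing a matrix; the values are arbitrary examples:

```python
import numpy as np

# A 4x2 matrix: 4 rows, 2 columns
A = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])

print(A.shape)    # (4, 2) -- rows x columns

# NumPy is 0-indexed: A[0, 1] is the entry in the 1st row, 2nd column
print(A[0, 1])    # 2
```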

Vectors
A vector is a special case of a matrix: a matrix with only 1 column, i.e., an nx1 matrix, otherwise known as a column vector. Thus we can think of a matrix as a group of column vectors (or row vectors). We will see later on that if we take the transpose of a column vector, we get a row vector, a vector with only 1 row (a 1xn matrix).

If we have a vector with four elements/entries, then we can either say that it is a 4×1 matrix, or a 4-dimensional vector. Thus, it is an element of the set R^(4×1), or R⁴, since it is a vector.
Indexing a Vector
yi = ith element
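As a sketch in NumPy, a 4-dimensional vector can be represented as a 4x1 array (the values here are arbitrary):

```python
import numpy as np

# A 4-dimensional vector represented as a 4x1 column vector
y = np.array([[1], [2], [3], [4]])

print(y.shape)    # (4, 1)

# 0-indexed: the third element is y[2, 0]
print(y[2, 0])    # 3
```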

A Few Notes:
Vectors can be 1-indexed or 0-indexed. In other words, the first entry in a vector can be either the 0th element or the 1st element. In most programming languages, such as Python, vectors are 0-indexed.
By convention, we denote matrices with a capital letter, such as the matrix A above, and vectors with a lowercase letter, such as the vector y above. If you are familiar with using scikit-learn in Python, remember that we usually name our features X and our labels y, since the features are stored in a matrix and the labels in a column vector.
Matrix-Matrix Addition and Subtraction
To add two matrices, we take the elements that are at the same index from each matrix and add them up one at a time. That means that matrices must be of the same dimension to be added. The resultant matrix will also be of the same dimension, with each of its elements being the sum of the corresponding elements from the added matrices.

If we have matrix A, plus matrix B, equals matrix C: Cij = Aij + Bij
For matrix-matrix subtraction, the process is the same. We subtract the elements of the second matrix from the corresponding elements in the first matrix. Or we can think of it as matrix-matrix addition, with the second matrix elements all being multiplied by -1 before adding the matrices (see scalar multiplication below).
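The element-wise behavior described above can be sketched in NumPy, where `+` and `-` operate element by element on arrays of the same dimension (example values are arbitrary):

```python
import numpy as np

A = np.array([[1, 0],
              [2, 5]])
B = np.array([[3, 1],
              [0, 2]])

C = A + B   # element-wise sum: [[4, 1], [2, 7]]
D = A - B   # element-wise difference: [[-2, -1], [2, 3]]
```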
Scalar Multiplication and Division
A scalar is just a real number. We can multiply matrices by a scalar number. Doing so would "scale" the matrix (or vector), hence the name scalar, since we would multiply the scalar by each number in the matrix.
Note: It does not matter what order we write our scalar and matrix in. The result will be the same.

In scalar multiplication, we just take the scalar, or real number, and multiply it by each element in our matrix. Thus, our outcome is a matrix with the same dimension as the matrix that we multiplied by the scalar.

When dividing a matrix by a scalar, we can think of it as multiplying that matrix by the reciprocal of the scalar.
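As a quick NumPy sketch, multiplying or dividing an array by a scalar applies the operation to every element, and the order of scalar and matrix does not matter:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

B = 3 * A     # same as A * 3: [[3, 6], [9, 12]]
C = A / 2     # same as A * (1/2): [[0.5, 1.0], [1.5, 2.0]]
```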
Matrix-Vector Multiplication
When multiplying a matrix and a vector, we multiply the vector by each row in the matrix. The result will be a vector with the same number of rows as the matrix.

We first start by multiplying the numbers in the vector with the corresponding numbers in the first row of the matrix and add up those products. That sum will be the first element of our resultant vector. We then multiply the numbers in the vector with the numbers in the second row of the matrix, add up those products, and the sum will be the second element in the resultant vector. And so on… Thus, there will be as many elements in our resultant vector as there are rows in our matrix.

If we take matrix A, with a dimension of mxn, multiply it by vector x, which is an nx1 matrix (or n-dimensional vector), the outcome will be vector y, that is an m-dimensional vector (or mx1 dimensional matrix). To get yi, multiply A’s ith row with elements of vector x, and add them up.
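The row-by-row procedure above can be sketched in NumPy using the `@` operator (example values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])      # 3x2 matrix
x = np.array([[7],
              [8]])         # 2x1 column vector

y = A @ x                   # 3x1 result
# first entry:  1*7 + 2*8 = 23
# second entry: 3*7 + 4*8 = 53
# third entry:  5*7 + 6*8 = 83
```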
Geometric intuition of matrix-vector multiplication:
Let’s interpret a matrix-vector multiplication geometrically. Imagine we have a vector, or a line in space, which we can visualize in a coordinate system. Then imagine that the matrix changes, or transforms, this space, based on the information inside of the matrix. In linear algebra, these transformations are linear transformations since they follow a few rules. A matrix-vector multiplication basically takes in the original vector, and spits out the new vector, based on this new space decided by the matrix.
Thus, a matrix can be thought of as a function (transforming space), the vector we multiply by this matrix can be thought of as the initial vector (or input), and the resultant vector (the changed vector that is a result of this linear transformation of space), is the output.
Matrix-Matrix Multiplication
To multiply two matrices, we can think of it as separate matrix-vector multiplications. In other words, we take the column vectors of the second matrix and perform matrix-vector multiplications of those column vectors with the first matrix. Then we put the resultant column vectors together into a matrix, which is the outcome. Thus, in order to multiply two matrices together, the number of columns of the first matrix must equal the number of rows in the second matrix.

If we have matrix A, of dimension mxn, and multiply it by matrix B, of dimension nxo, then the resultant matrix will be of dimension mxo:

The ith column of the matrix C is obtained by multiplying A with the ith column of B (for i=1,2,…,o). In matrix-vector multiplication, the o value was 1, since a vector only has 1 column.
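As a sketch in NumPy, we can check that each column of the product is the first matrix times the corresponding column of the second (example values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])              # 2x2
B = np.array([[5, 6, 7],
              [8, 9, 10]])          # 2x3

C = A @ B                           # 2x3 result
# Column i of C equals A times column i of B:
col0 = A @ B[:, [0]]                # same as C[:, [0]]
```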
Geometric intuition of matrix-matrix multiplication:
We can think of matrix-matrix multiplication as two (or more) linear transformations of space, applied one after another. These linear transformations are applied in a specific order: from right-to-left. In other words, if we have the matrix-matrix multiplication of AxBxC, the linear transformation encoded in matrix C occurs first, then matrix B, and then matrix A, similar to the composition of functions: h(g(f(x))). The resultant matrix of this matrix-matrix multiplication is known as a composite matrix, since applying the linear transformation encoded in this composite matrix will result in the same net linear transformation as applying the linear transformations of C, then B, then A.
Matrix-Multiplication Properties
To explain each of these properties, we first will relate it back to real numbers (scalars).
Commutative
For real numbers, multiplication is commutative. Meaning the order does not matter. In other words, 2 x 5 = 5 x 2.
In contrast, matrix multiplication, in general, is not commutative. Thus, in general, for matrices A and B: A x B is not equal to B x A.
Note: One important exception is multiplication by the identity matrix (discussed later), where AI = IA = A; a matrix and its inverse also commute.
If you read the geometric intuition of matrix-matrix multiplication above, then this should make sense. If we apply the linear transformation of B first, then A, the resulting net linear transformation of space will not be the same as applying A first, then B.
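A quick NumPy sketch makes the non-commutativity concrete (example values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)   # [[2 1], [4 3]]
print(B @ A)   # [[3 4], [1 2]]
# The two products differ, so AB != BA for these matrices
```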
Associative
For real numbers, multiplication is associative. In other words, we can multiply regardless of how the numbers are grouped. For example, if we have 2x5x4, we can multiply the 2 and 5 first, then multiply the result by the 4, or we can multiply the 5 and 4 first, then multiply the result by 2. Either way the answer will be the same.
(2×5)x4 = 2x(5×4)
Matrix multiplication is also associative. In other words, if we have the matrices A, B, and C, then:
Ax(BxC) = (AxB)xC
Again, using the geometric intuition of matrix-matrix multiplication, this should make sense. Since in both instances, we are applying the linear transformations from right to left.
Distributive
For both real numbers and matrices, multiplication is distributive. In other words, for real numbers: 2(5+4) = 25 + 24. And for matrices A, B, and C: A(B+C) = AB + AC.
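Both the associative and distributive properties can be sketched numerically in NumPy; the random integer matrices below are arbitrary test inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 10, (2, 3))
B = rng.integers(0, 10, (3, 3))
C = rng.integers(0, 10, (3, 2))
D = rng.integers(0, 10, (3, 3))

# Associative: grouping does not matter
assert np.array_equal(A @ (B @ C), (A @ B) @ C)

# Distributive: A(B + D) = AB + AD
assert np.array_equal(A @ (B + D), A @ B + A @ D)
```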
Identity Matrix
When dealing with real numbers, 1 is the identity for multiplication. In other words, for any real number z, the number 1 times z will equal z times 1, which will equal z. So the identity property of 1 means that any real number z multiplied by 1 is equal to z, thus allowing z to keep its identity.

We also have identity matrices. In other words, for any matrix A, there will be an identity matrix, I, that when multiplied with matrix A, equals to matrix A.
AI = IA = A
Identity matrices are square matrices, meaning that the number of rows equals the number of columns. Identity matrices are denoted by I, sometimes as Inxn, with nxn being the dimension of the identity matrix. For a matrix A of dimension mxn, the identity matrix in AI must be nxn, while the identity matrix in IA must be mxm, which should make sense, since for two matrices to be multiplied, the number of columns of the first matrix must equal the number of rows in the second matrix.
Examples of identity matrices:

As we can see above, identity matrices have 1's along the main diagonal and zeros everywhere else.
As discussed before, matrix multiplication is, in general, not commutative; multiplication by the identity matrix is one of the exceptions.

Note: Since A is an mxn matrix, the identity matrix in IA has the dimension mxm, but in AI it has the dimension nxn, since the matrix A comes first in that matrix-matrix multiplication.
Using geometric intuition, we can think of an identity matrix as not causing a linear transformation. Thus, the order does not matter, since in either scenario (AI or IA), the result will be the linear transformation encoded in matrix A.
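In NumPy, `np.eye(n)` builds an nxn identity matrix; a quick sketch with an arbitrary 2x3 matrix confirms AI = IA = A and the dimension rules above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2x3

I3 = np.eye(3, dtype=int)          # 3x3 identity, used on the right
I2 = np.eye(2, dtype=int)          # 2x2 identity, used on the left

assert np.array_equal(A @ I3, A)   # AI = A (I must be 3x3)
assert np.array_equal(I2 @ A, A)   # IA = A (I must be 2x2)
```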
Matrix Inverse and Matrix Transpose Operations
There are two special matrix operations that we should be familiar with: the matrix inverse and the matrix transpose.
Inverse Matrix
We now know that the number 1 is the identity in the space of real numbers, since 1 times any real number equals that number itself. Numbers can also have an inverse: a number times its inverse equals the identity, 1.

For example: 2 times 2^-1 equals 1. Remember that 2^-1 is just 1/2. So 2 times 1/2 is just 1. Thus, the inverse of 2 is 1/2, and its identity is the number 1.
But remember, not every number has an inverse. For example, the number 0 does not have an inverse, since 0^-1, or 1/0, is undefined (cannot divide by zero).

A matrix can also have an inverse. Only a square matrix (an mxm matrix, where #rows = #columns) can have an inverse, though not every square matrix does. Just like with real numbers, we raise a matrix to the -1 power to denote its inverse. A matrix times its inverse equals its identity matrix.

Note: Matrices that don’t have an inverse, or are non-invertible, are called singular or degenerate matrices.
Why are matrix inverses important?
Let’s say we have the matrix-matrix multiplication of: A*B=C, where we know the values of A and C, but not B. If these were real numbers, we would divide both sides of the equation by A to solve for B. However, we cannot divide matrices. So instead, we can multiply both sides of the equation by 1/A (A^-1), which is the inverse of matrix A. That way, we end up with the following:
AB = C
(A^-1)AB = (A^-1)C
IB = (A^-1)C
B = (A^-1)C
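This derivation can be sketched in NumPy with arbitrary example matrices; `np.linalg.inv` computes the inverse:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
C = np.array([[3.],
              [2.]])

B = np.linalg.inv(A) @ C          # B = (A^-1)C
assert np.allclose(A @ B, C)      # check: AB = C

# In practice, np.linalg.solve is preferred over forming the inverse explicitly
B2 = np.linalg.solve(A, C)
assert np.allclose(B, B2)
```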
Matrix Transpose
Taking the transpose of a matrix means that the rows of that matrix become the columns. Thus, the first row of matrix A becomes the first column of A^T (the transpose of matrix A), and the second row of matrix A becomes the second column of A^T. Thus, if matrix A is an mxn matrix, then its transpose, or A^T, is an nxm matrix.

We can imagine drawing a 45-degree axis (the main diagonal) on matrix A, and then flipping matrix A along this axis to get its transpose.
Taking the transpose of a column vector gives us a row vector:
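In NumPy, `.T` gives the transpose; a quick sketch with arbitrary values shows the dimensions swap and a column vector becomes a row vector:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2x3

print(A.T.shape)             # (3, 2) -- rows become columns

x = np.array([[1],
              [2],
              [3]])          # 3x1 column vector
print(x.T)                   # [[1 2 3]] -- a 1x3 row vector
```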

Conclusion
In this tutorial, we learned what scalars, vectors, and matrices are. We learned that matrices are 2-dimensional arrays made up of rows and columns, and that vectors are a special case of matrices with only one column. We learned how to perform the mechanics of certain numeric operations, such as matrix-matrix addition and subtraction, scalar multiplication and division, matrix-vector multiplication, and matrix-matrix multiplication. Furthermore, just as 1 times any real number is that number itself, a matrix times its identity matrix is itself, and a matrix times its inverse is its identity matrix. We then saw how to take the transpose of a matrix, and that the transpose of a column vector is a row vector. Lastly, we gained a very brief understanding of the geometric intuition behind some of these numeric operations.