
Einstein notation vs. matrix notation
As a linear algebra addict and fan of vectors and matrices, it was unclear to me for a long time why I should use Einstein notation at all. But when I got interested in backpropagation calculus, I reached a point where tensors got involved, and I realised that thinking in terms of matrices limits my thinking to two dimensions. In this article, I will nevertheless use many matrix and vector analogies, so that the topic becomes easier to grasp.
Free indices
Free indices are indices that occur on both sides of an equation. For example:

v_i = u_i

𝑣 could now represent a row or a column vector.

That’s exactly the point of index notation: you free yourself from any concrete representation of vectors.
We can also have two free indices:

A_{ij} = B_{ij}

We can imagine this equation as describing the rows and columns of a matrix 𝐴.

However, if we continue to increase the number of free indices, it becomes increasingly difficult to imagine a concrete representation.

T_{ijk} = S_{ijk}

With 3 free indices, we get a tensor, and we could try to imagine it as a vector of matrices.

Dummy indices
Dummy indices are indices that occur on only one side of an equation; within a single product, they occur an even number of times (typically twice).
For example:

y = \sum_i u_i v_i

This equation could also be written as the inner product of a row vector and a column vector:

y = u^T v
Einstein summation convention
When we apply this convention, we sum over the dummy indices even if there is no sum symbol. This convention is useful because summation over dummy indices happens very often in linear algebra.
Applying this convention, the last equation can be rewritten as follows:

y = u_i v_i
Upper and lower indices
Some people apply the following convention and some people don’t. I myself apply it if I have to convert between index notation and a vectorized form quickly.
Using this convention, we write both lower and upper indices. Please do not confuse the upper indices with "to the power of". Then only identical indices that are diagonal to each other (one upper, one lower) are summed over.
Example of a repeated index over which we sum:

y = u^i v_i
Another example:

w^i = A^i_j u^j
Example of a repeated index over which we don’t sum:

u_i v_i (both indices are lower, so no summation is implied)
Combining free and dummy indices
Most of the time, we find free and dummy indices in the same equation. For some readers this may sound terrifying at first, but after seeing it a few times, I am sure you will appreciate its abstractness.
Let’s look at the following equation:

y = Ax

We will rewrite it with index notation:

y^i = A^i_j x^j

You can see here that the index i is a free index and the index j is a dummy index that gets summed over.
If you want to use only subscripts, you would write it like this:

y_i = A_{ij} x_j
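As a small preview of the NumPy section below, this equation translates directly into an einsum call. The concrete arrays here are hypothetical, chosen only for illustration:

import numpy as np

A = np.array([[1, 2], [3, 4]])
x = np.array([1, 1])

# y_i = A_ij x_j : j is the dummy index and gets summed over
y = np.einsum("ij,j->i", A, x)  # equivalent to A @ x
print(y)  # [3 7]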
np.einsum
In NumPy, you have the possibility to use Einstein notation to multiply your arrays. This provides an alternative to the np.dot() function, which is NumPy’s implementation of the linear algebra dot product.
But np.einsum can do more than np.dot.
np.einsum can multiply arrays in any possible way and additionally:
- Sum along axes
- Transpose input and output array
And any possible combination of those operations in any order.
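As a minimal sketch of these extra capabilities (the array M is hypothetical, introduced only for illustration), a transpose and an axis sum can both be expressed through the signature string alone:

import numpy as np

M = np.array([[1, 2], [3, 4]])

# Transpose: swap the indices in the output.
print(np.einsum("ij->ji", M))  # same as M.T

# Sum along axis 0: drop the index i from the output.
print(np.einsum("ij->j", M))   # same as M.sum(axis=0)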
First example of np.einsum
Let’s look at an example first. We create two 2D arrays: A and B.
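Assuming, for instance, the following two arrays (these definitions are consistent with all the outputs shown below):

import numpy as np

A = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])

B = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])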
We now calculate the dot product of A with B:
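With the arrays defined above, this could look like:

print(np.dot(A, B))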
The output is:
[[ 6 12 18]
 [ 9 18 27]
 [12 24 36]]
We now want to do the same with np.einsum:
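The signature string "ik,kj->ij" expresses the matrix product:

print(np.einsum("ik,kj->ij", A, B))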
The result is the same:
[[ 6 12 18]
 [ 9 18 27]
 [12 24 36]]
But what is going on here? We have to understand the so-called signature string: "ik,kj->ij".
To the left of the arrow we have: ik,kj. This part specifies the indices of the input arrays.
To the right of the arrow we have: ij. This part specifies the indices of the resulting array.
The whole signature string then means: "The first input array has the indices ik and the second input array the indices kj. Those indices get transformed into the indices ij in the output array."

The corresponding math equation would look like this:

C_{ij} = A_{ik} B_{kj} (the dummy index k is summed over)
Second example of np.einsum
Let’s say we have the same two arrays A and B.
We want to multiply them elementwise. In NumPy, we can do this with:
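Using the standard elementwise operator:

print(A * B)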
The result would be:
[[ 1  4  9]
 [ 2  6 12]
 [ 3  8 15]]
We do the same with np.einsum:
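Here every index appears in the output, so nothing is summed:

print(np.einsum("ij,ij->ij", A, B))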
The resulting array is the same.
[[ 1  4  9]
 [ 2  6 12]
 [ 3  8 15]]
The array multiplication was performed in the following way:

C_{ij} = A_{ij} B_{ij} (no summation, since both i and j appear in the output)
Third example of np.einsum
We now want the dot product of A with the transpose of B.
We write the same code as in the dot product of A and B, but we switch the indices for B.
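Switching kj to jk indexes B as its transpose:

print(np.einsum("ik,jk->ij", A, B))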
The array multiplication was performed in the following way:

C_{ij} = A_{ik} B_{jk} (the dummy index k is summed over)
Fourth example of np.einsum
Okay, let’s say we want to perform the same elementwise multiplication, but we want to sum over the column axis (axis 1).
np.einsum can do it:
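We keep i in the output but drop j, so j gets summed over:

print(np.einsum("ij,ij->i", A, B))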
The result looks as follows:
[14 20 26]
Be aware of the fact that NumPy reduced a dimension in the output: the resulting vector is only 1D.
The array multiplication was performed like this:

c_i = \sum_j A_{ij} B_{ij}
Conclusion
By now, you should have a basic understanding of how Einstein notation and np.einsum work. If this is a new topic to you, then I highly recommend experimenting with np.einsum: change the dummy indices and the free indices and see how the result changes.
Related articles
https://towardsdatascience.com/backpropagation-in-neural-networks-6561e1268da8
Want to connect and support me?
LinkedIn: https://www.linkedin.com/in/vincent-m%C3%BCller-6b3542214/
Facebook: https://www.facebook.com/profile.php?id=100072095823739
Twitter: https://twitter.com/Vincent02770108
Medium: https://medium.com/@Vincent.Mueller
Become a Medium member and support me (part of your membership fees go directly to me): https://medium.com/@Vincent.Mueller/membership