
Einstein index notation

Einstein summation, index notation and NumPy's np.einsum

Photo by Hannes Richter on Unsplash

Einstein notation vs. matrix notation

As a linear algebra addict and fan of vectors and matrices, it was unclear to me for a long time why I should use Einstein notation at all. But when I got interested in the calculus behind backpropagation, I got to a point where tensors got involved, and I realised that thinking in terms of matrices limits my thinking to two dimensions. In this article, I will nevertheless use many matrix and vector analogies, so that the topic becomes easier to grasp.

Free indices

Free indices are indices that occur on both sides of an equation. For example:

(Image by author)

𝑣 could now represent a row or a column vector.

(Image by author)

That’s exactly the point of index notation. You free yourself from any concrete representation of vectors.
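For instance (a generic sketch, not necessarily the exact equation shown above), an equation with one free index such as $u_i = v_i$ states that the $i$-th component of $u$ equals the $i$-th component of $v$ for every value of $i$, regardless of whether we picture the vectors as rows or as columns.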

We can also have two free indices:

(Image by author)

We can imagine this equation describing the rows and columns of a matrix 𝐴.

(Image by author)

However, if we continue to increase the number of free indices, it becomes increasingly difficult to imagine a concrete representation.

With three free indices, we get a tensor, which we could try to imagine as a vector of matrices.

(Image by author)
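As a sketch of how such objects can be read: $A_{ij}$ picks out the entry in row $i$ and column $j$ of a matrix, while a three-index object such as $T_{ijk}$ (a hypothetical example of mine) can be pictured as a stack of matrices, where $i$ selects a matrix and $j, k$ index its rows and columns.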

Dummy indices

Dummy indices are indices that occur on only one side of an equation; within each product, they appear an even number of times (typically twice).

An example would be:

(Image by author)

This equation could also be written as an inner product of a row and a column vector.

(Image by author)
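As a sketch of such an equation (the exact expression in the image may differ): $y = \sum_j u_j v_j = u_1 v_1 + u_2 v_2 + \dots + u_n v_n$, which is precisely the inner product of a row vector with a column vector.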

Einstein summation convention

When we apply this convention, we sum over the dummy indices even if there is no sum symbol. This convention is useful because summation over dummy indices happens very often in linear algebra.

Applying this convention, the last equation can be rewritten as follows:

(Image by author)
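In other words (sketching the convention on the inner product from above), the sum symbol is simply dropped: $y = u_j v_j$ is understood to mean $\sum_j u_j v_j$, because $j$ is a dummy index.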

Upper and lower indices

Some people apply the following convention and some people don’t. I myself apply it if I have to convert between index notation and a vectorized form quickly.

Using this convention, we write both lower and upper indices. Please do not confuse the upper indices with exponents ("to the power of").

Then, only identical indices that sit diagonally to each other (one as a lower index, one as an upper index) are summed over.

Example of a repeated index, over which we sum:

(Image by author)

Another example:

(Image by author)

Example of a repeated index, over which we don’t sum:

(Image by author)
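To summarize the convention with generic sketches (the concrete expressions in the images may differ): in $y = u_i v^i$ the index $i$ appears once as a lower and once as an upper index, so it is summed over; in $y^i = A^i_{\;j} x^j$ the index $j$ is summed over while $i$ stays free; in an expression like $u_i v_i$, where both occurrences are lower indices, no summation is implied under this convention.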

Combining free and dummy indices

Most of the time, we find free and dummy indices in the same equation. For some readers, this may sound terrifying at first, but after seeing it a few times, I am sure you will appreciate its abstractness.

Let’s look at the following equation:

(Image by author)

We will rewrite it with index notation:

(Image by author)

You can see here that the index i is a free index, while the index j is a dummy index and gets summed over.

If you want to use only subscripts you would write it like this:

(Image by author)
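Assuming the subscript-only form is $y_i = \sum_j A_{ij} x_j$, writing it out for a 2×2 matrix makes the roles of the indices concrete: $y_1 = A_{11} x_1 + A_{12} x_2$ and $y_2 = A_{21} x_1 + A_{22} x_2$. The free index $i$ labels the equations, while the dummy index $j$ is summed away.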

np.einsum

In NumPy, you can use Einstein notation to multiply your arrays. This provides an alternative to the np.dot() function, which is NumPy's implementation of the linear algebra dot product.

But np.einsum can do more than np.dot.

np.einsum can multiply arrays in any possible way and additionally:

  • Sum along axes
  • Transpose input and output array

And any possible combination of those operations in any order.

First example of np.einsum

Let’s look at an example first. We create two 2D arrays: A and B.
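The concrete values of A and B are an assumption on my part (the original snippet is not reproduced here), but the following pair is consistent with all the outputs shown in this article:

import numpy as np

# example arrays (assumed; chosen to reproduce the outputs below)
A = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])

B = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])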

We now calculate the dot product of A with B:
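With the arrays defined above, this could look like:

# matrix product of A and B (equivalently: A @ B)
C = np.dot(A, B)
print(C)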

The output is:

[[ 6 12 18]
 [ 9 18 27]
 [12 24 36]]

We now want to do the same with np.einsum:
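Using the signature string that is explained below:

# Einstein-notation version of the matrix product
C = np.einsum("ik,kj->ij", A, B)
print(C)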

The result is the same:

[[ 6 12 18]
 [ 9 18 27]
 [12 24 36]]

But what is going on here? We have to understand the so-called signature string: "ik,kj->ij".

To the left of the arrow we have: ik,kj. This part specifies the indices of the input arrays.

To the right of the arrow we have: ij. This part specifies the indices of the resulting array.

The whole signature string then means: "The first input array has the indices ik and the second input array the indices kj. Those indices get transformed into the indices ij in the output array."

(Image by author)

The corresponding math equation would look like this:

(Image by author)
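Assuming the equation in the image matches the signature string, it reads $C_{ij} = \sum_k A_{ik} B_{kj}$, or, with the summation convention, simply $C_{ij} = A_{ik} B_{kj}$: the dummy index $k$ is summed over, while $i$ and $j$ remain free.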

Second example of np.einsum

Let’s say we have the same two arrays A and B.

We want to multiply them elementwise. In NumPy we can do this with:
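A sketch with the same A and B as above:

# elementwise (Hadamard) product; equivalently: np.multiply(A, B)
C = A * B
print(C)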

The result would be:

[[ 1  4  9]
 [ 2  6 12]
 [ 3  8 15]]

We do the same with np.einsum:
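A signature in which both inputs and the output carry the same indices performs the elementwise product, for example:

# every index also appears in the output, so nothing is summed over
C = np.einsum("ij,ij->ij", A, B)
print(C)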

The resulting array is the same.

[[ 1  4  9]
 [ 2  6 12]
 [ 3  8 15]]

The array multiplication was performed in the following way:

(Image by author)

Third example of np.einsum

We now want the dot product of A with the transpose of B.

We write the same code as in the dot product of A and B, but we switch the indices for B.
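Sketching this with the arrays from above (switching B's indices from kj to jk):

# C_ij = sum_k A_ik * B_jk, i.e. the same as np.dot(A, B.T)
C = np.einsum("ik,jk->ij", A, B)
print(C)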

The array multiplication was performed in the following way:

(Image by author)

Fourth example of np.einsum

Okay, let’s say we want to perform the same elementwise multiplication, but then sum over the column axis (axis 1).

np.einsum can do it:
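Dropping the dummy index j from the output part of the signature tells np.einsum to sum over it:

# elementwise product, then sum over j (the column axis, axis 1)
y = np.einsum("ij,ij->i", A, B)
print(y)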

The result looks as follows:

[14 20 26]

Be aware that NumPy reduced a dimension in the output, so the resulting vector is only 1D.

The array multiplication was performed like this:

(Image by author)

Conclusion

By now, you should have a basic understanding of how Einstein notation and np.einsum work. If this is a new topic for you, I highly recommend experimenting with np.einsum: change the dummy indices and the free indices and see how the result changes.

Related articles

https://towardsdatascience.com/backpropagation-in-neural-networks-6561e1268da8


Want to connect and support me?

  • LinkedIn: https://www.linkedin.com/in/vincent-m%C3%BCller-6b3542214/
  • Facebook: https://www.facebook.com/profile.php?id=100072095823739
  • Twitter: https://twitter.com/Vincent02770108
  • Medium: https://medium.com/@Vincent.Mueller
  • Become a Medium member and support me (part of your membership fees go directly to me): https://medium.com/@Vincent.Mueller/membership

