Introduction
Am I the only one who periodically gets confused when dealing with dimensions in NumPy? Today, while reading a page of Gradio’s documentation, I came across the following code snippet:
sepia_filter = np.array([
[0.393, 0.769, 0.189],
[0.349, 0.686, 0.168],
[0.272, 0.534, 0.131],
])
# input_img shape (H, W, 3)
# sepia_filter shape (3, 3)
sepia_img = input_img.dot(sepia_filter.T) # <- why is this legal??
sepia_img /= sepia_img.max()
Hey, hey, hey! Why is the dot product of an image (H, W, 3) with a filter (3, 3) legal? I asked ChatGPT to explain it to me, but it either gave me wrong answers (like saying this doesn’t work) or ignored my question and answered something else instead. So, there was no other solution than using my brain (plus reading the documentation, sigh).
If you are also a little confused by the code above, continue reading.
Dot Product: A Generic Example
From the NumPy dot product documentation (with minor modifications):
If a.shape = (I, J, C) and b.shape = (K, C, L), then dot(a, b)[i, j, k, l] = sum(a[i, j, :] * b[k, :, l]). Notice that the last dimension of "a" is equal to the second-to-last dimension of "b".
Or, in code:
import numpy as np
I, J, K, L, C = 10, 20, 30, 40, 50
a = np.random.random((I, J, C))
b = np.random.random((K, C, L))
c = a.dot(b)
i, j, k, l = 3, 2, 4, 5
print(c[i, j, k, l])
print(sum(a[i, j, :] * b[k, :, l]))
Output (the two values agree; the exact number changes from run to run because the arrays are random):
13.125012901284713
13.125012901284713
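Comparing one element is only a spot check. As a minimal sketch (the seeded generator and the smaller sizes are my own choices, not part of the NumPy documentation), the whole result can be compared against np.tensordot and np.einsum, which both spell out the contraction between the last axis of "a" and the second-to-last axis of "b":

```python
import numpy as np

# Smaller sizes than in the example above, chosen only to keep the check fast.
I, J, K, L, C = 3, 4, 5, 6, 7
rng = np.random.default_rng(0)  # seeded so the run is repeatable
a = rng.random((I, J, C))
b = rng.random((K, C, L))

c = a.dot(b)

# Contract the last axis of a with the second-to-last axis of b.
c_tensordot = np.tensordot(a, b, axes=([-1], [-2]))

# The same contraction written with explicit index labels.
c_einsum = np.einsum("ijc,kcl->ijkl", a, b)

shapes_match = c.shape == (I, J, K, L)
values_match = np.allclose(c, c_tensordot) and np.allclose(c, c_einsum)
print(c.shape, shapes_match, values_match)
```

np.allclose is used instead of == because the three routes may add the products in a different order, which can change the last few digits.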
Understanding NumPy Dot Product Shape
To determine the shape of a dot product beforehand, follow these steps:
Step 1: Consider two arrays, "a" and "b," with their respective shapes.
# Example shapes for arrays a and b
a_shape = (4, 3, 2)
b_shape = (3, 2, 5)
# Create random arrays with the specified shapes
a = np.random.random(a_shape)
b = np.random.random(b_shape)
In this example, array "a" has a shape of (4, 3, 2), and array "b" has a shape of (3, 2, 5). Notice, once again, that the last dimension of "a" and the second-to-last dimension of "b" must match.
Step 2: Take all the dimensions of "a" except the last and all the dimensions of "b" except the second-to-last.
For array "a," we exclude the last dimension (which is 2), resulting in a shape of (4, 3). For array "b," we exclude the second-to-last dimension (which is also 2), resulting in a shape of (3, 5).
Step 3: Concatenate the shapes obtained in Step 2.
By concatenating the shapes using our rule, we get (4, 3, 3, 5). Let’s verify if it is true:
c = a.dot(b)
print(c.shape)
Output:
(4, 3, 3, 5)
As we can see, the resulting shape of the dot product matches our calculated shape (4, 3, 3, 5). Thus, our understanding of the dot product shape is correct!
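The three steps can also be written as a small helper. This is only a sketch: predict_dot_shape is a hypothetical name of mine, not a NumPy function, and it only covers the case discussed here, where "b" has at least two dimensions.

```python
import numpy as np

def predict_dot_shape(a_shape, b_shape):
    """Predict the shape of a.dot(b) when b has at least 2 dimensions."""
    if len(a_shape) < 1 or len(b_shape) < 2:
        raise ValueError("this sketch only covers a.ndim >= 1 and b.ndim >= 2")
    if a_shape[-1] != b_shape[-2]:
        raise ValueError(
            f"last dim of a ({a_shape[-1]}) != second-to-last dim of b ({b_shape[-2]})"
        )
    # Step 2: drop the last axis of a and the second-to-last axis of b.
    # Step 3: concatenate what is left.
    return tuple(a_shape[:-1]) + tuple(b_shape[:-2]) + (b_shape[-1],)

a_shape, b_shape = (4, 3, 2), (3, 2, 5)
predicted = predict_dot_shape(a_shape, b_shape)
actual = np.zeros(a_shape).dot(np.zeros(b_shape)).shape
print(predicted, actual)
```

With two 2-D shapes such as (2, 3) and (3, 4), the same rule gives (2, 4), the familiar matrix product shape.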
Clarifying the Dot Product with Sepia Filter for RGB Pixels
Let’s return to the original example with an image (H, W, C) and a filter (O, C), in this case, (3, 3).
Remember that, in the original example, the dot product is with sepia_filter.T, which has shape (C, O). In this case C = O = 3, but if they were different, the transposition would matter: without it, the last dimension of the image (C) would not match the second-to-last dimension of the filter (O).
Following the rule, I take all the dimensions of the image except the last, in this case H and W, and all the dimensions of the transposed filter except the second-to-last, in this case O. So the resulting shape is (H, W, O) or, in our case, (H, W, 3): still "RGB-like".
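To see why the transposition matters, here is a small sketch with a made-up filter where O differs from C (the sizes and the random values are assumptions for illustration, not a real color filter):

```python
import numpy as np

H, W, C, O = 4, 5, 3, 2  # O != C on purpose; these sizes are made up
rng = np.random.default_rng(0)
img = rng.random((H, W, C))
filt = rng.random((O, C))  # one row of C weights per output channel

out = img.dot(filt.T)  # (H, W, C) dot (C, O) -> (H, W, O)
print(out.shape)

# Without the transpose, the last axis of img (C = 3) would have to match
# the second-to-last axis of filt (O = 2), so NumPy raises a ValueError.
try:
    img.dot(filt)
    untransposed_ok = True
except ValueError:
    untransposed_ok = False
print(untransposed_ok)
```

With the 3x3 sepia filter both calls run, which is exactly why a forgotten transposition would go unnoticed there and silently give different colors.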
Using the NumPy documentation notation:
sepia_filter_T = sepia_filter.T
dot(input_img, sepia_filter_T)[h, w, c] = sum(input_img[h, w, :] * sepia_filter_T[:, c])
Note that, because sepia_filter_T[:, c] is the same as sepia_filter[c, :], the right-hand side can be written without the transposed array:
dot(input_img, sepia_filter.T)[h, w, c] = sum(input_img[h, w, :] * sepia_filter[c, :])
But intuitively, how is every RGB pixel in the new image computed? Basically, every channel value of every new pixel (imagine R, "red", at position 4, 2) is a linear combination of the old RGB values of the pixel at the same position, where the weights of this linear combination are the values in the corresponding row of sepia_filter (row index 0 for R, 1 for G, and 2 for B).
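The linear combination can be checked by hand for a single pixel. In this sketch the image is a small random array that I made up; any float array of shape (H, W, 3) would behave the same way:

```python
import numpy as np

sepia_filter = np.array([
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131],
])

# A made-up 5x6 RGB image with values in [0, 1).
rng = np.random.default_rng(0)
input_img = rng.random((5, 6, 3))

sepia_img = input_img.dot(sepia_filter.T)

h, w = 4, 2
r, g, b = input_img[h, w]

# Row 0 of sepia_filter holds the weights for the new red value.
new_red = 0.393 * r + 0.769 * g + 0.189 * b

# The whole new pixel is the 3x3 matrix applied to the old RGB vector.
new_pixel = sepia_filter @ input_img[h, w]

red_matches = np.isclose(sepia_img[h, w, 0], new_red)
pixel_matches = np.allclose(sepia_img[h, w], new_pixel)
print(red_matches, pixel_matches)
```

So the 3-D dot product is just the same 3x3 matrix-vector product repeated independently for every pixel.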
Bonus: You can also use einsum for this! (more confusion haha, I know, NumPy is hard):
import matplotlib.pyplot as plt
sepia_img = np.einsum("HWC, OC -> HWO", input_img, sepia_filter)
sepia_img /= sepia_img.max()
plt.imshow(sepia_img)
plt.axis("off")
Output: [image: the sepia-toned version of the input image]
Try it and try to understand how it works as an exercise.
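As a starting point for the exercise, this sketch checks that the einsum call gives the same result as the dot product with the transposed filter. The random input image is an assumption of mine, and I use lowercase index labels, which work the same way as the uppercase ones above:

```python
import numpy as np

sepia_filter = np.array([
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131],
])

# A made-up 4x5 RGB image; any (H, W, 3) float array would do.
rng = np.random.default_rng(0)
input_img = rng.random((4, 5, 3))

via_dot = input_img.dot(sepia_filter.T)

# h and w are kept; c appears in both inputs but not in the output,
# so it is the index that gets summed over.
via_einsum = np.einsum("hwc,oc->hwo", input_img, sepia_filter)

# The same sum written as an explicit loop for one output entry.
h, w, o = 1, 2, 0
by_hand = sum(input_img[h, w, c] * sepia_filter[o, c] for c in range(3))

same_arrays = np.allclose(via_dot, via_einsum)
same_entry = np.isclose(via_einsum[h, w, o], by_hand)
print(same_arrays, same_entry)
```

Notice that einsum takes sepia_filter directly: the "oc" label order does the job of the transposition.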
Conclusion
Congratulations! You’ve successfully delved into the world of NumPy’s dot product and unraveled its mysteries. By following a simple rule of shape concatenation, you can now easily determine the resulting shape of the dot product for any pair of multidimensional arrays in which the last dimension of the first matches the second-to-last dimension of the second.
Understanding how dimensions interact empowers you to use the dot product effectively in various image manipulations. For instance, we explored the transformation of an image with the sepia filter, creating beautiful effects through linear combinations of RGB values.
Now armed with this knowledge, you can confidently explore the vast possibilities of NumPy’s dot product in your numerical computations and image processing tasks. So, dive in fearlessly, experiment, and let the dot product work its magic!
Thank you for taking the time to read this article, and please feel free to leave a comment or connect with me to share your thoughts or ask any questions. To stay updated on my latest articles, you can follow me on Medium, LinkedIn or Twitter.
Join Medium with my referral link – Mario Namtao Shianti Larcher