
You can find all the code at the end. All equation-images are made by the author.
I recently came across this post about solving a 2D partial differential equation using a finite-difference method. I found it to be a great introduction to the Finite-Difference Method (FDM): if you use numerical methods, make sure to check it out. It is well explained and uses a simple example, so it is easy to follow.
But when I read the code, I realized there were some improvements that could be made to speed up the computation, especially since FDM is supposed to be a "slow" or "computation-intensive" method.
I will show you how just a few lines of num-pythonic code reduced the computation time by a factor of 300!
Quickstart on the equation
The heat equation is basically a partial differential equation that mixes time and space – that upside-down-triangle-squared (the Laplacian) is just a fancy notation for "sum the second derivative in each direction":

$$\frac{\partial u}{\partial t} = \alpha \nabla^2 u$$
with alpha the diffusivity constant. So in 2D we can write it more explicitly:

$$\frac{\partial u}{\partial t} = \alpha \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)$$
Now, this kind of equation generally has no analytical solution, so we solve it numerically. To do so, we can use a finite-difference method, which simply consists of approximating the derivatives with "slope" expressions. For example, for the time derivative:

$$\frac{\partial u}{\partial t} \approx \frac{u_{i,j}^{k+1} - u_{i,j}^{k}}{\Delta t}$$
So with finite-difference notation, we can rewrite the 2D heat equation: we use k to describe time steps, i and j to describe x and y steps:

$$\frac{u_{i,j}^{k+1} - u_{i,j}^{k}}{\Delta t} = \alpha \left( \frac{u_{i+1,j}^{k} - 2u_{i,j}^{k} + u_{i-1,j}^{k}}{\Delta x^2} + \frac{u_{i,j+1}^{k} - 2u_{i,j}^{k} + u_{i,j-1}^{k}}{\Delta y^2} \right)$$
Notice that the second-order derivative (for x or y in the above) uses samples "before" (i-1 or j-1) and "after" (i+1 or j+1) the position (i,j) of interest.
To simplify, let's assume we use the same sampling step in each direction: Δx = Δy. The equation simplifies, and we can express the temperature at time k+1 using only temperature values at time k, at several space positions:

$$u_{i,j}^{k+1} = u_{i,j}^{k} + \frac{\alpha \Delta t}{\Delta x^2} \left( u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} - 4u_{i,j}^{k} \right)$$
Now that we have the equation that gives the temperature at time "k+1" from samples at time "k", we can compute each "frame" recursively.
First method: Think of image convolution!
Notice that the equation to compute the temperature at time k+1 is a linear combination of other temperature points at time k. So this relation can be seen as a linear operation, and it can be written as a convolution.
If that doesn't make sense, here is another formulation: picture the heat map at time k as a 2D image, and the heat map at time k+1 as another image, which is the result of convolving the first image with a specific kernel.
This concept is also well described in the original post using the stencil representation.
Rewriting the equation with:

$$\gamma = \frac{\alpha \Delta t}{\Delta x^2}$$
we get the expression of the temperature at position (i,j) at time k+1:

$$u_{i,j}^{k+1} = \gamma \left( u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} \right) + (1 - 4\gamma)\, u_{i,j}^{k}$$
This expression can be written as a convolution between a kernel (that does not depend on u), and the temperature map at time k. The kernel can be written as:

$$K = \begin{pmatrix} 0 & \gamma & 0 \\ \gamma & 1 - 4\gamma & \gamma \\ 0 & \gamma & 0 \end{pmatrix}$$
with the local (centered at position (i,j)) and current (at time k) heatmap:

$$U_{i,j}^{k} = \begin{pmatrix} u_{i-1,j-1}^{k} & u_{i-1,j}^{k} & u_{i-1,j+1}^{k} \\ u_{i,j-1}^{k} & u_{i,j}^{k} & u_{i,j+1}^{k} \\ u_{i+1,j-1}^{k} & u_{i+1,j}^{k} & u_{i+1,j+1}^{k} \end{pmatrix}$$
Notice that this 3×3 matrix is just the local 9 samples surrounding the main sample of interest – it’s a "crop" of the temperature image at time k.
And so the expression of the temperature at the next time step k+1 is just the "sum-product" of these 2 matrices, also called a dot product:

$$u_{i,j}^{k+1} = \sum_{m,n} K_{m,n} \left( U_{i,j}^{k} \right)_{m,n}$$
Code
Based on the code of the original post, we initialize a 3D temperature map, where the first dimension is time, and second and third dimensions are spatial dimensions. We set the number of samples in each direction as parameters to easily improve the simulation later.
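Here is a minimal sketch of that setup (the exact parameter values and the hot-top-edge boundary condition are illustrative, adapted from the original post's example rather than copied from it):

```python
import numpy as np

# Illustrative parameter values (not necessarily those of the original post)
Nt = 1000          # number of time samples
Nx, Ny = 100, 100  # number of space samples in each direction

alpha = 2.0                           # diffusivity constant
delta_x = 1.0                         # spatial step (Delta x = Delta y)
delta_t = delta_x**2 / (4 * alpha)    # time step, chosen for numerical stability
gamma = alpha * delta_t / delta_x**2

# 3D temperature map: first axis is time, the other two are space
u = np.zeros((Nt, Nx, Ny))
u[:, 0, :] = 100.0  # example boundary condition: a hot top edge
```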
Now let's see the code used to propagate the temperature along each dimension:
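It is a direct transcription of the update equation (the function name propagate_loops is mine; gamma is defined above):

```python
def propagate_loops(u):
    """Nested-loop version: update every interior sample, one at a time."""
    for k in range(u.shape[0] - 1):
        for i in range(1, u.shape[1] - 1):
            for j in range(1, u.shape[2] - 1):
                u[k + 1, i, j] = (
                    gamma * (u[k, i + 1, j] + u[k, i - 1, j]
                             + u[k, i, j + 1] + u[k, i, j - 1])
                    + (1 - 4 * gamma) * u[k, i, j]
                )
    return u
```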
It is no secret that Python "for loops" should, most of the time, be avoided, especially when nested. So we are going to rewrite the above function using faster pythonic notations.
Back to the equation:

$$u_{i,j}^{k+1} = \gamma \left( u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} \right) + (1 - 4\gamma)\, u_{i,j}^{k}$$
Notice how everything on the right side of the equation only depends on the heatmap at time k. We have a k-loop to propagate along the time direction, and i- and j-loops to set the temperature at each space position. Now we want to rewrite this function using NumPy and a "kernel-convolution" approach.
By nature, the finite-difference method propagates the solution from time k to time k+1, so we have to keep the outermost loop: the k-loop. But the 2 inner loops can be simplified a lot.
Remember the above dot product: with a sum-product operation, we can compute the temperature at time k+1 for a position (i,j). But NumPy allows doing this for all space positions at once! So instead of doing Nx*Ny operations, we only do one!
To do this, we need a sliding window – a common tool when dealing with convolutions – provided by the function np.lib.stride_tricks.sliding_window_view(arr, window_shape): this function uses strides on the input array to return a view of each "local" window of the given shape.
Just to make the sliding window view more explicit:
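For instance, sliding a 3×3 window over a small 4×6 array:

```python
import numpy as np

a = np.arange(24).reshape(4, 6)  # a small 4x6 input array
windows = np.lib.stride_tricks.sliding_window_view(a, (3, 3))
print(windows.shape)  # (2, 4, 3, 3)
```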
The resulting array has shape (2, 4, 3, 3):
- 2 elements vertically because we can only fit two 3×3 windows vertically
- 4 elements horizontally because we can only fit four 3×3 windows horizontally
- and each window is of shape 3×3
Now that we have each local window, we also need the kernel: since the gamma coefficient is already defined, we can immediately define the kernel:
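For example, mirroring the matrix K above (reusing the gamma defined earlier):

```python
kernel = np.array([
    [0.0,   gamma,         0.0],
    [gamma, 1 - 4 * gamma, gamma],
    [0.0,   gamma,         0.0],
])
```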
Finally, we wrap everything in a function to replace the original one:
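Here is a sketch of such a function (the name propagate_convolution is mine; it reuses u, gamma and kernel from above):

```python
def propagate_convolution(u):
    """Sliding-window version: all interior samples of a frame at once."""
    for k in range(u.shape[0] - 1):
        # every 3x3 neighbourhood of the k-th frame, as a view (no copy)
        windows = np.lib.stride_tricks.sliding_window_view(u[k], (3, 3))
        # multiply each window by the kernel and sum: the "sum-product"
        u[k + 1, 1:-1, 1:-1] = np.sum(windows * kernel, axis=(2, 3))
    return u
```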
For all samples within the bounds, the sliding-window function returns a 3×3 array, which is multiplied element-wise by the kernel and then summed to compute the temperature at time k+1.
Now that we have rewritten the function, let's make sure that they output the same result:
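For instance, with np.allclose to allow for floating-point rounding:

```python
u_loops = propagate_loops(u.copy())
u_conv = propagate_convolution(u.copy())
print(np.allclose(u_loops, u_conv))
```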
which outputs True! Finally, we can estimate the speed improvement using the IPython magic command %timeit:
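Something like this, in an IPython session (the timings in the comments are the ones reported in this post, not universal values):

```python
%timeit propagate_loops(u.copy())        # about 10 s
%timeit propagate_convolution(u.copy())  # about 0.2 s
```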
So we went from 10 seconds to only 0.2 seconds! The computation time was greatly improved. Finite-difference is slow, but it is not that slow.
What if…
What if I told you we can get another factor of 10 in computation speed, with even simpler code? Well, we can, using simple indexing.
Again, remember the base equation for the sample located at (i,j):

$$u_{i,j}^{k+1} = \gamma \left( u_{i+1,j}^{k} + u_{i-1,j}^{k} + u_{i,j+1}^{k} + u_{i,j-1}^{k} \right) + (1 - 4\gamma)\, u_{i,j}^{k}$$
This equation is true for all samples within the bounds (so not the ones on the first or last row, or the first or last column – because they don't have a full neighbourhood).
Also notice that there are 5 "u-variables", namely:

$$u_{i+1,j}^{k}, \quad u_{i-1,j}^{k}, \quad u_{i,j+1}^{k}, \quad u_{i,j-1}^{k}, \quad u_{i,j}^{k}$$
Let's write:
- `A = u[k, 2:, 1:-1]` to represent u_{i+1,j}^k
- `B = u[k, :-2, 1:-1]` to represent u_{i-1,j}^k
- `C = u[k, 1:-1, 2:]` to represent u_{i,j+1}^k
- `D = u[k, 1:-1, :-2]` to represent u_{i,j-1}^k
- `E = u[k, 1:-1, 1:-1]` to represent u_{i,j}^k

where the `1:-1` slices are here to leave the bounds untouched, and k is there because we only use samples at time k to compute samples at time k+1. This way we can write:

$$u^{k+1} = \gamma \left( A + B + C + D \right) + (1 - 4\gamma)\, E$$
And now, the operation of weight-averaging each sample with its neighbourhood is done in one operation, for all samples:
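Here is a sketch of the resulting function (again, the name propagate_indexing is mine; gamma is defined above):

```python
def propagate_indexing(u):
    """Indexing version: shifted views of the k-th frame, no explicit windows."""
    for k in range(u.shape[0] - 1):
        A = u[k, 2:, 1:-1]    # u_{i+1, j}
        B = u[k, :-2, 1:-1]   # u_{i-1, j}
        C = u[k, 1:-1, 2:]    # u_{i, j+1}
        D = u[k, 1:-1, :-2]   # u_{i, j-1}
        E = u[k, 1:-1, 1:-1]  # u_{i, j}
        u[k + 1, 1:-1, 1:-1] = gamma * (A + B + C + D) + (1 - 4 * gamma) * E
    return u
```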
Let's make sure it outputs the same results as before:
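For instance:

```python
print(np.allclose(u_conv, propagate_indexing(u.copy())))
```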
which outputs True. And more interestingly, let's benchmark this method:
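Again with the IPython magic:

```python
%timeit propagate_indexing(u.copy())  # about 30 ms in this post's setup
```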
So I get about 30 ms, where we got about 10 s with the nested loops: that is a gain by a factor of 300! (and about 7 times faster than the convolution method)
Wrap up
Here are the important things to remember from this post:
- Check out the original post.
- Convolution: if the equation for the "new" value is a linear combination of older values, as with the stencil pattern, the new values can be expressed using a convolution: think of image convolution. This technique allowed for a great improvement compared to the nested loops, but it is still less efficient than indexing.
- Use numpy indexing anytime you can: it is WAY more efficient and WAY more pythonic.
Since we got a great speed improvement, we can increase the number of samples to refine the simulation:

[Heatmaps of the refined simulation]
Here's the full code:
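Below is a self-contained sketch assembling the snippets above (parameter values, the boundary condition and the function names remain illustrative):

```python
import numpy as np

# ----- Parameters (illustrative values) --------------------------------------
Nt = 1000          # number of time samples
Nx, Ny = 100, 100  # number of space samples in each direction

alpha = 2.0                           # diffusivity constant
delta_x = 1.0                         # spatial step (Delta x = Delta y)
delta_t = delta_x**2 / (4 * alpha)    # time step, chosen for stability
gamma = alpha * delta_t / delta_x**2

def make_initial_map():
    """3D temperature map (time, x, y) with an example hot top edge."""
    u = np.zeros((Nt, Nx, Ny))
    u[:, 0, :] = 100.0
    return u

# ----- Method 0: nested Python loops -----------------------------------------
def propagate_loops(u):
    for k in range(u.shape[0] - 1):
        for i in range(1, u.shape[1] - 1):
            for j in range(1, u.shape[2] - 1):
                u[k + 1, i, j] = (
                    gamma * (u[k, i + 1, j] + u[k, i - 1, j]
                             + u[k, i, j + 1] + u[k, i, j - 1])
                    + (1 - 4 * gamma) * u[k, i, j]
                )
    return u

# ----- Method 1: kernel + sliding windows ------------------------------------
kernel = np.array([
    [0.0,   gamma,         0.0],
    [gamma, 1 - 4 * gamma, gamma],
    [0.0,   gamma,         0.0],
])

def propagate_convolution(u):
    for k in range(u.shape[0] - 1):
        windows = np.lib.stride_tricks.sliding_window_view(u[k], (3, 3))
        u[k + 1, 1:-1, 1:-1] = np.sum(windows * kernel, axis=(2, 3))
    return u

# ----- Method 2: shifted-view indexing ---------------------------------------
def propagate_indexing(u):
    for k in range(u.shape[0] - 1):
        A = u[k, 2:, 1:-1]    # u_{i+1, j}
        B = u[k, :-2, 1:-1]   # u_{i-1, j}
        C = u[k, 1:-1, 2:]    # u_{i, j+1}
        D = u[k, 1:-1, :-2]   # u_{i, j-1}
        E = u[k, 1:-1, 1:-1]  # u_{i, j}
        u[k + 1, 1:-1, 1:-1] = gamma * (A + B + C + D) + (1 - 4 * gamma) * E
    return u

if __name__ == "__main__":
    u0 = make_initial_map()
    u_loops = propagate_loops(u0.copy())
    u_conv = propagate_convolution(u0.copy())
    u_idx = propagate_indexing(u0.copy())
    print(np.allclose(u_loops, u_conv))  # True
    print(np.allclose(u_conv, u_idx))    # True
```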