
Rolling or sliding calculations are crucial in time series analysis. From financial to epidemic analysis, the odds are you will need to perform moving window computations, so it is paramount to learn how to do them and do them well.
This story explores the basics of rolling computations in NumPy and its limitations; it also includes some recipes for everyday use cases.
In Pandas, it is very straightforward to perform rolling computations, perhaps more so than in NumPy. In NumPy, the utility to perform this task is somewhat hidden; maybe this is why it is not very common to see NumPy code in time series rolling.
A natural question arises: why use NumPy if we can easily do it with Pandas? In this story, we will compare Pandas vs. NumPy regarding rolling windows. I love Pandas, but sometimes we need to go a level lower into NumPy to get more granularity in how we handle data.
Doing rolling calculations on vectors (1D arrays) is very straightforward, both in Pandas and in NumPy, but performing rolling calculations on matrices is more challenging; this is why we need NumPy’s granularity.
Story structure
- Sliding window -> add extra dimension
- For loop vs. NumPy
- Rolled array memory profile
- Function: 1D array -> scalar
- Function: multiple 1D arrays -> 1D array
- Pandas vs. NumPy
- Rolling functions in 2D arrays
- Rolling least squares coefficients for multiple regressors
- Rolling least squares R-squared for multiple regressors
- Rolling linear combination of 2D arrays
- Moral of the story
Sliding window -> add extra dimension
![Rolling mechanism [Image by author].](https://towardsdatascience.com/wp-content/uploads/2022/07/1bESjSnmsi8xEf-U_qvuqFA.png)
NumPy’s rolling window solution is to create another array with an extra dimension. Such an array contains the rolled original array, at the specified sliding window, on each index of the additional axis. The utility is somewhat hidden, as you may tell by the number of dots in the import:
np.lib.stride_tricks.sliding_window_view
For example, let’s create a 1D array with 1,000 elements:
(1000,)
then we roll it with a window of 100:
(901, 100)
as you can see, the shape’s dimension increased by one, and the size of this new dimension is precisely the window.
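The steps above can be sketched as follows (the array contents are arbitrary; only the shapes matter):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(1_000)  # a 1D array with 1,000 elements
print(a.shape)        # (1000,)

rolled = sliding_window_view(a, 100)  # window of 100 along the only axis
print(rolled.shape)                   # (901, 100)

# Each index of the new first axis holds one window of the original array.
print(np.array_equal(rolled[0], a[:100]))    # True
print(np.array_equal(rolled[-1], a[-100:]))  # True
```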
It works the same for arrays with higher dimensions, for example, for a 2D array with 1,000 rows and 10 columns:
(1000, 10)
if we roll it along the rows:
(901, 10, 100)
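And the 2D case, rolling along the rows (`axis=0`); the window axis is appended last:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.random.standard_normal((1_000, 10))  # 1,000 rows, 10 columns
print(a.shape)  # (1000, 10)

# Roll along the rows; the rolled axis shrinks to 901 = 1000 - 100 + 1
# and a new last axis of size 100 (the window) appears.
rolled = sliding_window_view(a, 100, axis=0)
print(rolled.shape)  # (901, 10, 100)
```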
For loop vs. NumPy
Since sliding (rolling) computations are most useful in processes indexed by time, the first programming control flow statement that comes to mind is the `for` loop.
NumPy is known for its performance relative to pure Python. So let’s compare the time it takes to traverse the array with a rolling lookback window using a `for` loop vs. NumPy’s `sliding_window_view`:
37.2 µs ± 4.06 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
297 µs ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
NumPy is known for being at least an order of magnitude faster than Python when used properly. This test is no exception.
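A sketch of this kind of comparison (the exact function timed in the story is not shown; here a rolling mean stands in):

```python
import timeit

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.random.standard_normal(1_000)
window = 100

def roll_loop(a: np.ndarray, window: int) -> np.ndarray:
    # Pure-Python traversal: one call per lookback window.
    return np.array([a[i - window:i].mean() for i in range(window, len(a) + 1)])

def roll_numpy(a: np.ndarray, window: int) -> np.ndarray:
    # Vectorized: a single reduction along the rolled view's last axis.
    return sliding_window_view(a, window).mean(axis=-1)

# Both traversals produce the same rolling means.
print(np.allclose(roll_loop(a, window), roll_numpy(a, window)))

print(timeit.timeit(lambda: roll_loop(a, window), number=100))
print(timeit.timeit(lambda: roll_numpy(a, window), number=100))
```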
Rolled array memory profile
A natural question arises:
If an extra dimension is added to the array with the size of the rolling window, wouldn’t this be highly memory inefficient?
The answer is NO because the new array is actually a view of the original array; hence there are no duplicates in memory.
Let’s create the two arrays and check if they share memory:
True
of course, they do.
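A minimal check:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.random.standard_normal(1_000)
rolled = sliding_window_view(a, 100)

# The rolled array is a view: no data is duplicated.
print(np.shares_memory(a, rolled))  # True
```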
However, the new rolled array has a slightly higher memory usage, as some new data is created for the view itself. But how much higher is the memory usage?
Let’s code a small test using the Pympler memory profiler. We will change the size of the original array through the arguments `size_lb` (array size lower bound), `size_hb` (array size higher bound), and `size_steps` to control the number of trials. The size of the window is specified through the `window` argument:
The returned `dataclass` has the array memory usage and the rolled array usage for each size. The results are summarized in the following plot for the memory usage ratio:
![Memory ratio plot: rolled array / original array [Image by author].](https://towardsdatascience.com/wp-content/uploads/2022/07/1V4ha52V3xNiNnJw114S17A.png)
We can see that for smaller arrays, the overhead of the rolled array view is substantial, but it converges to unity as the arrays grow larger.
Function: 1D array -> scalar
![A function:1D array -> float, when rolled produces another 1D array [Image by author].](https://towardsdatascience.com/wp-content/uploads/2022/07/19Q24PPe6XYWmsRCmxjrbcQ.png)
Performing rolling computations in a 1D array is the simplest case. Intuitively we want to map a function that takes the lookback array and returns a scalar; that way, a new 1D array is produced. The length of the new array is:
the length of the original array − the window size + 1
Once we have the rolled array (with the extra dimension), we need to apply a NumPy function along the last axis (the rolled axis); to do that, we have two options:
- use NumPy built-in functions
- use a NumPy `ufunc` with the `reduce` method on the rolling axis
Check out this story if you want to know more about `ufunc`s and how to create your own truly vectorized `ufunc`s.
The function used during the computation has to be optimized for arrays; that is the only way to take advantage of NumPy’s performance. Using NumPy utilities like `apply_along_axis` will not result in a performance boost.
A simple example is to compute the rolling standard deviation. Let’s first create an array with samples from a standard normal distribution and then roll the array. We can leave the resulting array as is or back-fill the missing indices:
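A sketch of this computation (the back-fill here repeats the first computed value so the output matches the original length; padding with `np.nan` is another option):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window = 100
a = np.random.standard_normal(1_000)  # samples from a standard normal

# Rolling standard deviation: one reduction along the window (last) axis.
rolling_std = sliding_window_view(a, window).std(axis=-1)
print(rolling_std.shape)  # (901,)

# Back-fill the missing first window - 1 indices to recover the length.
filled = np.concatenate([np.full(window - 1, rolling_std[0]), rolling_std])
print(filled.shape)  # (1000,)
```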

Function: multiple 1D arrays -> 1D array
![A function: 2D array (multiple 1D arrays) -> 1D array (multiple floats), when rolled produces another 2D array [Image by author].](https://towardsdatascience.com/wp-content/uploads/2022/07/1_DgkTllZg2kekTxtho4IPg.png)
We can also do a rolling computation on multiple vectors simultaneously. These vectors may be arranged in a 2D array, but the rolling calculation is not on a rolling matrix *per se*. All columns (rows) are treated the same way; there is no distinction between column (row) indices as you would expect in a proper matrix roll. It is how vectorization works.
For example, we can create a 2D array (each column a process) with standard normal samples and then compute the rolling standard deviation on all the columns at once:
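A sketch, with each column an independent process:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window = 100
a = np.random.standard_normal((1_000, 10))  # each column a process

# Roll along the rows; the std is taken over the window (last) axis
# for all 10 columns at once.
rolling_std = sliding_window_view(a, window, axis=0).std(axis=-1)
print(rolling_std.shape)  # (901, 10)
```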

It turns out that proper matrix rolling, where we can combine columns (rows) in a specific way, is way more tricky, as we will see in subsequent sections.
Pandas vs. NumPy
Pandas has a very convenient way of rolling series and data frames. I find it more intuitive and easier to use than NumPy’s. But it has its shortcomings, as we will see.
How does Pandas’ rolling compare to NumPy’s in terms of speed? Let’s calculate the rolling standard deviation and find out:
258 µs ± 20.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
262 µs ± 8.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
As you can see, the speed performance is about the same. So it is a tie for built-in calculations.
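A sketch of the comparison (note that Pandas’ rolling `std` uses `ddof=1` by default, so we pass `ddof=1` to NumPy’s `std` to make the results match):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

window = 100
a = np.random.standard_normal(1_000)
s = pd.Series(a)

std_pd = s.rolling(window).std()  # first window - 1 values are NaN
std_np = sliding_window_view(a, window).std(axis=-1, ddof=1)

# Same numbers once the leading NaNs are sliced off.
print(np.allclose(std_pd.to_numpy()[window - 1:], std_np))  # True
```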
What if we want to apply a custom function? We can use the `apply` method in Pandas and `apply_along_axis` in NumPy to use our function that takes a 1D array (series) and returns a float:
5.31 ms ± 386 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.64 ms ± 693 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Again, they are comparable in speed.
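A sketch of the two approaches, using the mean absolute deviation as an assumed stand-in for the story’s custom function (which is not shown):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

window = 100
a = np.random.standard_normal(1_000)

def mad(x: np.ndarray) -> float:
    # Mean absolute deviation: a stand-in 1D array -> float custom function.
    return float(np.mean(np.abs(x - x.mean())))

# raw=True hands each window to the function as a plain ndarray.
mad_pd = pd.Series(a).rolling(window).apply(mad, raw=True)
mad_np = np.apply_along_axis(mad, -1, sliding_window_view(a, window))

print(np.allclose(mad_pd.to_numpy()[window - 1:], mad_np))  # True
```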
But there is one caveat with the last test: while using NumPy, if we can, we should avoid using `apply_along_axis` because it is not optimized for speed. To unleash NumPy’s performance, we need compiled, truly vectorized `ufunc`s under the hood of the functions we use on the rolled array. Check this story to learn all about `ufunc`s.
For example, if we do the same calculation in a vectorized way:
79.8 µs ± 3.77 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Impressive boost.
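As a sketch of such a boost (again using the mean absolute deviation as an assumed stand-in for the story’s custom function), the same metric written with only compiled NumPy calls on the rolled view:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window = 100
a = np.random.standard_normal(1_000)
rolled = sliding_window_view(a, window)

# Vectorized mean absolute deviation: every reduction runs along the
# window (last) axis in compiled code, no Python-level loop.
mad_vec = np.mean(np.abs(rolled - rolled.mean(axis=-1, keepdims=True)), axis=-1)
print(mad_vec.shape)  # (901,)
```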
We could argue that we can apply compiled vectorized functions and `ufunc`s in Pandas. However, it is easier to get lost within Pandas rolling workflows; it often becomes confusing what is vectorized (fast) and what is not (slow), while NumPy makes it very clear.
What NumPy offers is simply a rolled array view with an extra dimension, and it is up to us to do with such an array what we require. If you are doing non-vectorized operations or using for loops on NumPy arrays, you know that you are doing so at your own peril.
Rolling functions in 2D arrays
![Rolling 2D arrays into scalars [Image by author].](https://towardsdatascience.com/wp-content/uploads/2022/07/1-lAmqGa15xt944QXP0gxaA.png)
As we pointed out earlier, rolling matrices (2D arrays) is tricky. There is no simple solution for applying a user-defined function on rolling matrices in NumPy or Pandas. As discussed in a previous section, rolling multiple 1D arrays simultaneously (technically a 2D array) is not the same as rolling a matrix.
An example of rolling a matrix would be performing a rolling least squares regression with multiple regressors (a matrix with multiple columns) and, at each row index (think of this axis as time), getting the R-squared metric or all the regressor coefficients; problems I face often. In the former case, the 2D array produces a 1D array when rolled; in the latter case, the 2D array produces another 2D array when rolled.
There are ways to do this, but there is no one-size-fits-all solution. There is no way to apply an arbitrary, possibly pure Python function and expect it to work and be fast. Instead, we need to be able to produce an algorithm that can leverage one or multiple compiled and vectorized operations to manipulate the rolled array. More often than not, it requires some math besides NumPy’s tools.
In the following sections, we will review some recipes of rolled 2D arrays I use often.
Rolling least squares coefficients for multiple regressors
To do a rolling least squares regression, we first need to know the math in matrix form. A linear model of the form:
y = xᵀβ + ε

where ε is i.i.d., can be fitted through many samples of y and the x‘s. Taking all the samples into account produces a matrix equation for the model specification:

y = Xβ + ε

where y and ε are column vectors of length equal to the sample size, β is a column vector with one entry per regressor, and the T superscript denotes transpose. X is a matrix with n columns, and its number of rows equals the sample size. If we want to include an intercept different from zero, the matrix X has ones in its first column.
Then the OLS (ordinary least squares) estimation for β is:

β̂ = (XᵀX)⁻¹Xᵀy
Using this equation, we can do our properly vectorized rolling. But first, let us do it the naive way, using a `for` loop to do the rolling. Then we will implement the vectorized version and compare the performance.
Our toy data will be an array of 1,000 x 10 of standard normal samples:
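The toy data can be sketched as follows (the seed and the separate y vector are assumptions; the story only specifies standard normal samples):

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed for reproducibility

n = 10          # number of regressors
t_size = 1_000  # sample size

X = rng.standard_normal((t_size, n))  # the 1,000 x 10 regressor matrix
y = rng.standard_normal(t_size)       # the regressand
print(X.shape, y.shape)  # (1000, 10) (1000,)
```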

The function we need to apply to get β at each index of the sliding window is:
Then we code a generic rolling using a `for` loop that takes a static function to be rolled, of type `Func4Loop`:
If we pass our function, we can create a matrix where each row is a β vector:
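A sketch of this loop version (`Func4Loop` and `get_rolling_func_loop` are named in the story, but their exact signatures are not shown, so the ones below are assumptions):

```python
from typing import Callable

import numpy as np

# Assumed type: a static function (X lookback, y lookback) -> 1D vector.
Func4Loop = Callable[[np.ndarray, np.ndarray], np.ndarray]

def get_beta(X_lb: np.ndarray, y_lb: np.ndarray) -> np.ndarray:
    # OLS on one lookback window: solve (X'X) beta = X'y.
    return np.linalg.solve(X_lb.T @ X_lb, X_lb.T @ y_lb)

def get_rolling_func_loop(X: np.ndarray, y: np.ndarray, window: int,
                          func: Func4Loop) -> np.ndarray:
    # Generic rolling via a for loop: apply func to each lookback window.
    return np.array([
        func(X[t - window:t], y[t - window:t])
        for t in range(window, len(y) + 1)
    ])

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 10))
y = rng.standard_normal(1_000)

# Each row of the result is one beta vector.
beta_mat_loop = get_rolling_func_loop(X, y, 100, get_beta)
print(beta_mat_loop.shape)  # (901, 10)
```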
Now let us implement the vectorized version. It is a bit more complex than the `for` loop, but it is worth the effort.
- The first function, `_get_X_n_y_rolled`, adds a constant to X if required and generates the rolled array views.
- The second function, `_get_beta_vec_from_Xy`, performs the rolled matrix multiplications via `np.einsum` to have granular control over the product; the t axis (the first axis) is preserved in all products.
- The last function merely merges the logic of the previous two.
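A sketch reconstructing these three functions (the names and the `np.einsum` approach come from the story; the exact signatures are assumptions):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _get_X_n_y_rolled(X, y, window, add_const=False):
    # Optionally prepend a column of ones, then build the rolled views.
    if add_const:
        X = np.hstack([np.ones((len(X), 1)), X])
    X_rolled = sliding_window_view(X, window, axis=0)  # (t, n, window)
    y_rolled = sliding_window_view(y, window)          # (t, window)
    return X_rolled, y_rolled

def _get_beta_vec_from_Xy(X_rolled, y_rolled):
    # Rolled X'X and X'y; np.einsum keeps the t (first) axis intact.
    XtX = np.einsum('tiw,tjw->tij', X_rolled, X_rolled)  # (t, n, n)
    Xty = np.einsum('tiw,tw->ti', X_rolled, y_rolled)    # (t, n)
    # Batched solve over the t axis; the trailing singleton axis keeps
    # the matrix-stack interpretation across NumPy versions.
    return np.linalg.solve(XtX, Xty[..., None])[..., 0]

def get_rolling_beta(X, y, window, add_const=False):
    X_rolled, y_rolled = _get_X_n_y_rolled(X, y, window, add_const)
    return _get_beta_vec_from_Xy(X_rolled, y_rolled)

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 10))
y = rng.standard_normal(1_000)

beta_mat = get_rolling_beta(X, y, 100)
print(beta_mat.shape)  # (901, 10)
```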
Calling our function, we create our rolling β matrix, vectorized:
Both methods produce the same result:
True
albeit with some minimal numeric differences.
Now the good part, let’s test the speed:
314 ms ± 141 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
56.4 ms ± 2.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The vectorized version was worth the effort, indeed.
Rolling least squares R-squared for multiple regressors
Getting the rolling R-squared requires a bit more math. Remember that this metric can be expressed as:
R² = 1 − (eᵀe) / (y_cᵀy_c)

where y_c is the centered (demeaned) y vector and e is the residual vector:

e = y − Xβ̂
As in the previous section, we start by doing the roll with a `for` loop. To wit, we define the static function to be applied at each iteration of the loop:
and, borrowing `get_rolling_func_loop` from the last section, we compute the rolling R-squared:
For the vectorized version, we also borrow functions from the previous section, `_get_X_n_y_rolled` and `_get_beta_vec_from_Xy`. Like the last section, we use `np.einsum` to do the rolling matrix multiplications, preserving the first axis (the t axis).
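A sketch of the vectorized rolling R-squared under the same assumptions, with the borrowed steps inlined so the block is self-contained:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def get_rolling_r2(X, y, window):
    # Rolled views: (t, n, window) for X and (t, window) for y.
    X_rolled = sliding_window_view(X, window, axis=0)
    y_rolled = sliding_window_view(y, window)
    # Rolling OLS betas, as in the previous section.
    XtX = np.einsum('tiw,tjw->tij', X_rolled, X_rolled)
    Xty = np.einsum('tiw,tw->ti', X_rolled, y_rolled)
    beta = np.linalg.solve(XtX, Xty[..., None])[..., 0]
    # Residuals e = y - X beta, window by window, keeping the t axis.
    resid = y_rolled - np.einsum('tiw,ti->tw', X_rolled, beta)
    # Demeaned y per window, then R^2 = 1 - e'e / y_c'y_c.
    y_c = y_rolled - y_rolled.mean(axis=-1, keepdims=True)
    ss_res = np.einsum('tw,tw->t', resid, resid)
    ss_tot = np.einsum('tw,tw->t', y_c, y_c)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 10))
y = rng.standard_normal(1_000)

r2 = get_rolling_r2(X, y, 100)
print(r2.shape)  # (901,)
```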
As you can see, both calculations are equivalent:
True

And the speed:
958 ms ± 397 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
73.7 ms ± 2.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
As always, the vectorized version does not disappoint.
Rolling linear combination of 2D arrays
![Rolled linear combination (column-wise) of 2D arrays [Image by author].](https://towardsdatascience.com/wp-content/uploads/2022/07/1IJbu_eD4rzjwwhsPfc3PuA.png)
The last recipe I want to share with you is a rolling linear combination. It is extremely useful in the context of cointegrating vectors in Time Series Analysis.
At each point in time, we have a vector of weights (`w_vec`) to form a linear combination of multiple time series. So in practical terms, we have two matrices with the same number of columns, one for the weights (`w_mat`) and another for the time series (each column one series).
Each `w_vec` is found in a cointegration setting using a lookback window. Hence, a rolling linear combination allows us to analyze the evolution of the cointegrating vector algorithm.
For our toy data, we create an array (`a`) that contains the time series and `w_mat`, which contains the linear combination coefficients (its columns are indexed corresponding to `a`’s columns).
We then create the rolled array (`a_rolled`) and perform the product (`a_prod_w_rolled`). This product is 2D; each row contains the rolled linear combination time series:
Finally, we can apply a function to the last axis; we used the standard deviation for demonstration purposes.
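A sketch of the whole recipe (the seed, the number of series, and `w_mat`’s shape of one weight row per window are assumptions for illustration):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window = 100
n_series = 5
t_size = 1_000

rng = np.random.default_rng(0)
a = rng.standard_normal((t_size, n_series))  # each column a time series
# One weight vector per window: shape (901, 5), columns matching a's.
w_mat = rng.standard_normal((t_size - window + 1, n_series))

# a_rolled[t, k, w] = a[t + w, k]: shape (901, 5, 100).
a_rolled = sliding_window_view(a, window, axis=0)

# Linear combination per window: sum over k of w_mat[t, k] * a_rolled[t, k, :].
a_prod_w_rolled = np.einsum('tkw,tk->tw', a_rolled, w_mat)
print(a_prod_w_rolled.shape)  # (901, 100)

# Finally, apply a function along the last axis, e.g. the standard deviation.
rolling_std = a_prod_w_rolled.std(axis=-1)
print(rolling_std.shape)  # (901,)
```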

To be clear: the function is applied to a rolling linear combination whose coefficients and processes change over time.
Moral of the story
There is no single moral for this story but rather four:
- There is no magical way to apply a generic function to perform fast vectorized rolling calculations, especially when we are rolling 2D arrays and the order of columns (rows) matters. We need to pause and design an algorithm that can leverage the performance of vectorized code and accomplish our functionality goal.
- Sometimes NumPy offers a granularity lacking in Pandas.
- When using NumPy, we must be careful not to mix compiled (vectorized and optimized) code with pure Python code; otherwise, we will suffer a performance hit. And if we absolutely must, the boundaries should be as clean as possible.
- When you need a custom product between arrays, use `np.einsum`.
I hope this story was useful for you.