Time Series from Scratch

So far, you know everything there is to know about moving averages – be they simple or exponentially weighted. Now it's time to cover autoregression, an essential topic for more advanced forecasting models.
We’ll cover the basic theory before implementation, including intuition and a bit of math. You should be familiar with time series in general, so please read the series from the beginning if the topic is new to you. The links to the previous articles are at the end of this one.
The article is structured as follows:
- AutoRegression – Theory and math
- AutoRegression – Implementation in Python
- AutoRegression – Choosing the best parameter value
- Final words
AutoRegression – Theory and math
The term AutoRegression (AR) is closely tied to a regular regression from statistics. The only gotcha is that the AR model uses data from the same input variable in a lagged format – hence the Auto part of AutoRegression.
AutoRegression is limited in its forecasting capabilities, just as simple moving averages were. The algorithm uses a linear combination of past values to make future forecasts.
The general AutoRegression model of order p – written AR(p) – is expressed with the following formula:

y_t = c + φ_1 * y_(t−1) + φ_2 * y_(t−2) + … + φ_p * y_(t−p) + ε_t

Where c is the constant, the φ's (phi) are the lag coefficients up to order p, and ε (epsilon) is the irreducible error (white noise).
You only need to specify the value of the parameter p when working with AR models. If p = 1, the AR model formula simplifies to the following:

y_t = c + φ_1 * y_(t−1) + ε_t
It’s that simple!
Higher orders of p tend to give better forecasting results, but only up to a point. You'll see later how to choose the best value for p automatically. But first, let's see how to implement AutoRegression in Python.
AutoRegression – Implementation in Python
You’ll create your own datasets today. It’s a simple straight line with a bit of noise added:
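The original snippet isn't reproduced here, so below is a minimal sketch of such a dataset: the slope, series length, noise scale, and random seed are illustrative assumptions, not the article's exact values.

```python
import numpy as np

# Hypothetical dataset: a straight line with a bit of Gaussian noise added.
# Slope, length, and noise scale are illustrative choices.
rng = np.random.default_rng(42)
data = 0.5 * np.arange(100) + rng.normal(scale=2.0, size=100)
```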
Here's how it looks:

The next step is to divide the dataset into training and testing subsets. Follow this article if you’re not familiar with the process. You’ll use the last 10 data points for testing and everything else for training:
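A split like this is a single slicing operation. The sketch below recreates the synthetic dataset from above (an assumption, not the article's exact data) so it's self-contained:

```python
import numpy as np

# Recreate the illustrative noisy-line dataset
rng = np.random.default_rng(42)
data = 0.5 * np.arange(100) + rng.normal(scale=2.0, size=100)

# Keep the last 10 observations for testing, everything else for training
train, test = data[:-10], data[-10:]
```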
Here's how both datasets look:

Neat! Next, you'll declare a function for training and visualizing the AR model – train_and_plot(maxlag: int). This function is here for your convenience, to avoid copy-pasting almost identical code over and over again. It trains the AR(p=maxlag) model on the training set and graphically compares the forecasts against the test set.
The function also prints the model coefficients in the plot subtitle, so you can tie them back to the math formulas discussed earlier if you like.
Here’s the code:
You can now use this function to train a simple AR(1) model by executing train_and_plot(maxlag=1) in a new cell. It displays the following figure:

You can change the parameter p to anything you want. For example, here's how the AR(2) model results look (train_and_plot(maxlag=2)):

The question remains – what’s the optimal AR model order for this dataset? Let’s answer that in the next section.
AutoRegression – Choosing the best parameter value
Forecasts obtained with AR(1) and AR(2) don't look all that promising, so you'll almost always want to optimize the value of p. One approach would be to plot the autocorrelation and partial autocorrelation functions and examine them, but that's a lot of manual work.
A better approach is to train AR(1) to AR(n) models inside a loop and keep track of the performance on the test set. You can use RMSE or any other metric to do so.
Here’s a simple code snippet that does just that:
And here are the errors for AR(1) to AR(10) models:

It looks like the AR(5) model results in the lowest error on the test set. Here's how the datasets and forecasts look for this model order:

It's also common to use the AIC metric for evaluation, as it favors simpler models over complex ones. Both metrics report that AR(5) is the best model to go with.
Final words
To conclude, you can use AR models to forecast simple datasets. The algorithm works best when combined with moving average models, and that’s a topic we’ll discuss in the following article.
You won't get good forecasting results if you apply the AR model to datasets like Airline Passengers, regardless of the model order. Making the dataset stationary might help, but the forecasts still wouldn't be as good as those from exponential smoothing.
We'll explore in the following article whether combining AutoRegression and Moving Averages into a single model (ARMA) could help.
Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you:
Read the entire series
- Seeing the Big Picture
- Introduction to Time Series with Pandas
- White Noise and Random Walk
- Decomposing Time Series Data
- Autocorrelation and Partial Autocorrelation
- Stationarity Tests and Automation
- Train/Test Splits and Evaluation Metrics
- Moving Averages (MA) Theory and Implementation
- Exponentially Weighted Moving Averages (EWMA) Theory and Implementation
- Exponential Smoothing Theory and Implementation
Stay connected
- Follow me on Medium for more stories like this
- Sign up for my newsletter
- Connect on LinkedIn