What is an ARIMA Model?

Taking a quick peek into ARIMA modeling

Miranda Auhl
Towards Data Science

If you work with time series data, you have likely heard the term ARIMA. The ARIMA model has been used for analyzing time series data since the 1970s, and there are good reasons it has stuck around: it is simple and powerful. In this blog post, my goal is to give you a solid foundation for understanding this model and hopefully encourage you to use it for analyzing time series data.

General Concept

The ARIMA model (an acronym for Auto-Regressive Integrated Moving Average) essentially creates a linear equation that describes and forecasts your time series data. This equation is built from three separate parts, which can be described as:

  • AR — auto-regression: equation terms created based on past data points
  • I — integration or differencing: accounting for overall “trend” in the data
  • MA — moving average: equation terms of error or noise based on past data points

Together, these three parts make up the AR-I-MA model.

The AR and MA aspects of ARIMA actually come from standalone models that can describe trends of more simplified time series data. With ARIMA modeling, you essentially have the power to use a combination of these two models along with differencing (the “I”) to allow for simple or complex time series analysis. Pretty cool, right?

Caveats to the Model

Before digging deeper, I do want to note that the ARIMA model functions under some assumptions. In order to use the ARIMA model effectively, you will want to ask yourself these questions about the time series data you wish to analyze.

  • Is there known seasonality (cyclical trends)?
  • Are there a lot of outliers or sporadic data points?
  • Is the variation of the data about the mean inconsistent?

If you answered no to these questions, then the ARIMA model is for you! Otherwise, you will likely have to look for a different time series model.
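As a rough illustration of the third question, here is a minimal sketch that compares the variance of the second half of a series to the variance of the first half. The function name `variance_ratio` and both example series are made up for this post, and this is only a quick eyeball check, not a formal statistical test. A ratio far from 1 hints that the spread of the data is changing over time.

```python
from statistics import pvariance


def variance_ratio(series):
    """Return (variance of second half) / (variance of first half).

    Hypothetical helper for a quick look at the data, not a formal test.
    """
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    first_var = pvariance(first)
    if first_var == 0:
        raise ValueError("first half has zero variance; ratio is undefined")
    return pvariance(second) / first_var


# Made-up series: one with a steady spread, one whose spread widens.
steady = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
widening = [1.0, -1.0, 1.0, -1.0, 4.0, -4.0, 4.0, -4.0]

print(variance_ratio(steady))    # 1.0
print(variance_ratio(widening))  # 16.0
```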

Getting to the Details

The ARIMA model is almost always represented as ARIMA(p, d, q) where each of the letters corresponds to one of the three parts described above. These three letters represent parameters that you will have to provide, and are described as follows:

  • p determines the number of autoregressive (AR) terms
  • d determines the order of differencing
  • q determines the number of moving average (MA) terms

While I love the mathematics behind these parameters, I will refrain from explaining it within this post. If you are interested in how exactly these equations work, I highly recommend checking out the resources I posted at the end. For now, I will just try to give you a general understanding of these three parameters.

Integration

Let’s begin by looking at the “I” piece of our ARIMA model. This part of the model accounts for general trends that occur throughout the time series data. The d value refers to how many times you would need to difference the series (subtract each value from the one after it, the discrete counterpart of taking a derivative) to turn its trend into a flat line (or constant).

For example, the following graph shows actual data for the average land temperature in April from 1990 to 2015. Notice the linear trend.

Figure: Average land temperature in April from 1990 to 2015, with time on the x-axis and average temperature on the y-axis. The data shows an increasing trend in temperatures over time.

If we were analyzing this data with an ARIMA model, we would likely use d=1 to account for its linear trend. If the trend were quadratic, we would probably have to use d=2.
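To make the role of d a bit more concrete, here is a minimal sketch of differencing in plain Python. The `difference` helper and both series are made up for illustration (they are not the temperature data from the graph). The idea is simply to subtract each value from the one that follows it, and to repeat that d times.

```python
def difference(series, d=1):
    """Apply first-order differencing (y[t] - y[t-1]) d times."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Made-up illustrative series with a linear and a quadratic trend.
linear = [2 * t + 5 for t in range(6)]   # 5, 7, 9, 11, 13, 15
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

print(difference(linear, d=1))     # [2, 2, 2, 2, 2]
print(difference(quadratic, d=1))  # [1, 3, 5, 7, 9]
print(difference(quadratic, d=2))  # [2, 2, 2, 2]
```

One round of differencing flattens the linear series to a constant, while the quadratic series needs two rounds. Note that each round also shortens the series by one value.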

Auto-Regressive and Moving Average parts

The ARIMA model is recursive in nature and thus relies on past calculations. This recursive nature comes directly from the AR and MA equation terms that are added to the model.

The p value, or AR part, essentially describes how reliant your data points are on past data points. If p=1 then the model’s output for a specific time relies directly on what the output was for the time before. If p=2, then the output would rely on the outputs from the last two time periods.
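Here is a small sketch of what “relying on past outputs” can look like in code. The `ar_forecast` function, the coefficient values, and the history are all assumptions chosen for illustration; in a real analysis the coefficients would be estimated from your data rather than picked by hand.

```python
def ar_forecast(history, coeffs, constant=0.0):
    """One-step AR(p) forecast, where p = len(coeffs).

    coeffs[0] multiplies the most recent value, coeffs[1] the one before, etc.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[-p:][::-1] if p else []  # most recent value first
    return constant + sum(c * y for c, y in zip(coeffs, recent))


# Made-up past values and hand-picked coefficients.
history = [10.0, 12.0, 11.0]

print(ar_forecast(history, coeffs=[0.5]))        # p=1: 0.5 * 11.0 = 5.5
print(ar_forecast(history, coeffs=[0.5, 0.25]))  # p=2: 0.5 * 11.0 + 0.25 * 12.0 = 8.5
```

With one coefficient, only the last value matters; with two, the last two values both feed into the forecast.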

Similarly, the q value, or MA part, uses the same recursive concept. The difference is that q describes how related your current output is to its past error or noise calculations. So, if q=1, then your current output would rely on the past time period’s noise calculation. For q=2, your output would rely on the noise from the last two time periods.
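The MA idea can be sketched the same way. In this illustration, the “noise” for a time period is taken to be the gap between what was observed and what the model forecast for that period. The `ma_forecast` function, the mean, the coefficient, and the observations are all made up for this post, and the error before the first observation is assumed to be zero.

```python
def ma_forecast(past_errors, thetas, mean=0.0):
    """One-step MA(q) forecast, where q = len(thetas).

    thetas[0] multiplies the most recent error, thetas[1] the one before, etc.
    """
    q = len(thetas)
    if len(past_errors) < q:
        raise ValueError("need at least q past errors")
    recent = past_errors[-q:][::-1] if q else []  # most recent error first
    return mean + sum(th * e for th, e in zip(thetas, recent))


# Made-up observations; q=1 with a hand-picked coefficient of 0.5.
observations = [11.0, 9.5, 10.5]
errors = [0.0]  # assumption: no known error before the first observation
forecasts = []
for actual in observations:
    forecast = ma_forecast(errors, thetas=[0.5], mean=10.0)
    forecasts.append(forecast)
    errors.append(actual - forecast)  # error = observed - forecast

print(forecasts)                                     # [10.0, 10.5, 9.5]
print(ma_forecast(errors, thetas=[0.5], mean=10.0))  # 10.5
```

Each forecast is nudged away from the mean by half of the previous period’s error, which is the q=1 behavior described above.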

Conclusion

Now that we have a general understanding of ARIMA modeling and its parameters, we can actually look at how to use the model for analysis. Finding the right p, d, and q values can be challenging, but having the right tools, such as the autocorrelation function (ACF) and partial autocorrelation function (PACF), can help. In another post, I’ll go through a full example of how to analyze time series data using these tools and show how to find the p, d, and q values together. Until next time!

References:

  • Pennsylvania State University; STAT 510: Applied Time Series Analysis (online course); https://online.stat.psu.edu/stat510/lesson/1; 2021.
  • Mingda Zhang; Time Series: Autoregressive models AR, MA, ARMA, ARIMA; University of Pittsburgh; 2018.

Developer Advocate at Timescale • LOVE software development • background in Mathematics and Secondary Math Education • public learning