
Motivation
Time series analysis is becoming increasingly important, especially for data with high uncertainty such as stock and cryptocurrency prices. That’s why a proper time series analysis is needed to build a well-performing model – and thus make good decisions – from the data. Here I will show you some things that you should check before creating a univariate time series model, to make your model perform better.
Before we jump further, I would like to tell you that a univariate time series model here means a model with only one dependent variable, while a multivariate time series model means a model with multiple dependent variables.
Roughly speaking, there are three things that you have to check before you can advance to time series modeling, which I will explain step by step.
Step 1: Is your data stationary?
The first step is to check whether your data are stationary or not. Stationary here means that the properties of the time series – such as mean, variance, and autocorrelation – are constant over time. This is important because most forecasting methods in time series analysis are based on the assumption that the time series can be rendered approximately stationary through mathematical transformations.
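As a quick, informal sanity check (a sketch on simulated data, not a formal test), you can split a series in half and compare the summary statistics of each part; for a stationary series they should be roughly equal.

```r
set.seed(42)
x <- arima.sim(model = list(ar = 0.5), n = 200)  # a simulated stationary AR(1) series

first_half  <- window(x, end = 100)
second_half <- window(x, start = 101)

# For stationary data, these summary statistics stay roughly constant over time
c(mean(first_half), mean(second_half))
c(sd(first_half), sd(second_half))
```

This eyeball check is no substitute for the formal tests below, but it builds intuition for what "constant properties over time" means.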
There are two approaches to checking your data’s stationarity. The first one is the exploratory method, where we check stationarity based on the data pattern. There are four types of data patterns in time series analysis, as follows.
Type 1: Stationary pattern
The data are said to have a stationary pattern if they fluctuate around a constant mean with a constant variance. An example of a stationary data pattern can be seen in the image below.

Type 2: Trend-effect pattern
The data are said to have a trend-effect pattern if they continuously increase or decrease over time. An example of a trend-effect data pattern can be seen in the image below.

Type 3: Seasonal-effect pattern
The data are said to have a seasonal-effect pattern if they show a repeating pattern with a certain period, e.g. annually, monthly, weekly, or daily. There are two types of seasonal-effect patterns: the first is the additive seasonal effect, while the second is the multiplicative seasonal effect. Take a look at the images below to see the difference.


Type 4: Irregular pattern
The data are said to have an irregular pattern if they don’t follow any of the first three types. An example of an irregular data pattern can be seen in the image below.

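Since these patterns are easiest to learn by example, here is a minimal sketch that simulates one series of each type and plots them with base R (the data are simulated for illustration, not the AAPL series used later):

```r
set.seed(123)
t <- 1:120

stationary <- ts(rnorm(120))                        # Type 1: constant mean and variance
trend      <- ts(0.5 * t + rnorm(120, sd = 3))      # Type 2: continuously increasing
seasonal   <- ts(10 * sin(2 * pi * t / 12) + rnorm(120),
                 frequency = 12)                    # Type 3: additive seasonal effect
irregular  <- ts(cumsum(rnorm(120)))                # Type 4: random walk, no regular pattern

par(mfrow = c(2, 2))
ts.plot(stationary, col = "blue", main = "Stationary")
ts.plot(trend, col = "blue", main = "Trend-effect")
ts.plot(seasonal, col = "blue", main = "Seasonal-effect (additive)")
ts.plot(irregular, col = "blue", main = "Irregular")
```

A multiplicative seasonal effect would instead scale the seasonal swings with the level of the series, e.g. `(1 + 0.02 * t) * sin(2 * pi * t / 12)`.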
Based on the exploratory analysis, if your data pattern follows the second, third, or fourth type, then your data are not stationary. To confirm this, we need to conduct the Augmented Dickey-Fuller test (ADF test), which tests the following hypotheses.
- Null hypothesis: a unit root exists in the data (i.e. the data are not stationary)
- Alternative hypothesis: no unit root exists in the data (i.e. the data are stationary)
The null hypothesis is rejected if the ADF test statistic is lower than the critical value (i.e. the P-value is lower than the significance level), where the hypothesis is tested on the regression equation as follows.

ΔX_t = α + βt + γX_(t-1) + δ_1 ΔX_(t-1) + … + δ_(p-1) ΔX_(t-p+1) + ε_t

Here Δ denotes the differencing operator, and the null hypothesis of a unit root corresponds to γ = 0.
In R, we can conduct this test by writing these lines of code. Here I use the closing price of AAPL stocks data (March 2020-March 2021) from Yahoo! Finance for the example.
library(tseries) # provides adf.test()

#AAPL stocks; you can download the .csv data from Yahoo! Finance
data = read.delim('clipboard') #copy the data from the .csv, then run this
tsdata = ts(data$Close)
ts.plot(tsdata, col = "blue", main = "AAPL Closing March 2020 - March 2021")
adf.test(tsdata)
We can see the result as follows.
> adf.test(tsdata)
Augmented Dickey-Fuller Test
data: tsdata
Dickey-Fuller = -1.6893, Lag order = 6, p-value = 0.7066
alternative hypothesis: stationary
Based on the ADF test, we can see that the P-value (0.7066) is bigger than the significance level (0.05), so we cannot reject the null hypothesis and conclude that the data are not stationary. To handle this, we need to conduct differencing on our data, which can be written as the equation

Y_t = X_t - X_(t-1)

where X_t denotes the t-th period of the original data, while Y_t denotes the t-th period of the differenced data. Note that for a higher order of differencing, we just treat the first-order differenced data as X_t and the second-order differenced data as Y_t, and so on. We can do this in R by running these lines of code.
diff1 = diff(tsdata, differences = 1)
ts.plot(diff1, col = "blue", main = "Differenced AAPL Closing March 2020 - March 2021")
adf.test(diff1)
We can see the result as follows.

> adf.test(diff1)
Augmented Dickey-Fuller Test
data: diff1
Dickey-Fuller = -6.5979, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
Much better now! We can see that the P-value (0.01) is smaller than the significance level (0.05), so we may conclude that the first-order differenced data are stationary.
For those who want to learn more about the ADF test, see [2].
Step 2: Is there any exogenous variable available?
If exogenous variables are available for your time series data, then I suggest you include some – if not all – of them in your model, to gain better information from your model. Examples of models that include exogenous variables are ARIMAX (ARIMA with eXogenous variables), ADL (Autoregressive Distributed Lag), etc.
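As a minimal sketch of the idea (using simulated data, since the AAPL example above has no exogenous variable), an ARIMAX-style model can be fitted in R via the `xreg` argument of `arima()`:

```r
set.seed(1)
n <- 200
exo <- rnorm(n)                                          # hypothetical exogenous variable
y <- arima.sim(model = list(ar = 0.6), n = n) + 2 * exo  # true exogenous effect = 2

# Fit an AR(1) model with the exogenous regressor included via xreg
fit <- arima(y, order = c(1, 0, 0), xreg = exo)
coef(fit)  # the estimated exo coefficient should be close to 2
```

Forecasting with such a model requires future values of the exogenous variable as well, which is something to keep in mind when choosing this approach.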
Step 3: How much data do you have?
If your data are pretty small (say, hundreds of observation periods), then traditional statistical models – like ARIMA, SARIMA, ARCH-GARCH, etc., and their modifications – will be enough for good model performance, since the model equation generated from the data won’t be too complex.
On the contrary, if your data are very big, then I suggest you use data-driven models – like machine learning models, neural network models, etc. – to help you reach better model performance. You can still use the traditional statistical approach, but in most cases its performance will not be good enough compared with the data-driven approach.
Conclusion
And that’s it! By checking those things on your time series data, you are ready to model it and can reach better performance in your univariate time series modeling process! Feel free to ask me and/or discuss via my LinkedIn if you have any questions.
See you in my next article!
References
[1] William W. S. Wei, Time Series Analysis: Univariate and Multivariate Methods (2006), Pearson Education
[2] Wayne A. Fuller, Introduction to Statistical Time Series, 2nd Edition (1995), Pearson Education