
Forecasting Time Series Data – Stock Price Analysis


Time Series Forecasting

Forecasting time-series data using different smoothing methods and ARIMA in Python.

Photo by lo lo on Unsplash

In this article, we will go through the stock prices of a certain company. However, this article does not encourage anyone to trade based ONLY on this forecast. Stock prices depend on various factors such as supply and demand, company performance, and investor sentiment.

What is Time-Series?

A time series comprises observations captured at regular intervals. Time-series datasets have a strong temporal dependence, so previous observations can be used to forecast future ones.

Decomposing the Time Series:

The Time-Series can be divided into several parts as follows:

Trend: The increase or decrease in the value of the data. This can further be divided into the global trend and local trend.

Actual Observations = Global Trend + Local Trend (Source: UpGrad)

Seasonality: The repetitive pattern visible in the series. This rise or fall in the values of the data has a fixed frequency. For example, sales of Christmas trees are always higher in December and lower during the rest of the year.

Cyclicity: The rise and fall in the value of the data that does not have a fixed frequency. It can be seen as an outcome of economic conditions or other external factors.

Noise: The random variation that remains after extracting the trend and the seasonal components from the time series.

The components of the time series can be combined additively or multiplicatively. The multiplicative model is preferred when the magnitude of the seasonal pattern increases or decreases along with the data values. The additive model is preferred when the magnitude of the seasonal pattern does not depend on the data values.

Additive and Multiplicative Seasonality - Source: kourentzes

In the additive version, the magnitude of the seasonal pattern stays the same as the trend rises, whereas in the multiplicative version it grows with the trend of the time series.

How to obtain stock price data?

We can obtain stock prices by creating an API request using the requests package. We can also use the quandl package which is designed to provide financial and economic data.

We will extract the prices of US stocks from the Alpha Vantage website through an API call, here the daily stock prices of IBM. The Alpha Vantage API documentation explains how to alter the request to suit your needs.
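The original snippet is not reproduced here, so the following is only a sketch of what such a request could look like. The endpoint, the `TIME_SERIES_DAILY` function name, and the JSON keys (`"Time Series (Daily)"`, `"4. close"`) are assumptions about the Alpha Vantage API and should be checked against its documentation; the helper names and the sample payload are hypothetical.

```python
BASE_URL = "https://www.alphavantage.co/query"  # assumed endpoint


def parse_daily_close(payload):
    """Return [(date, close)] sorted oldest first from a daily-prices response."""
    daily = payload["Time Series (Daily)"]  # assumed top-level key
    return sorted((date, float(values["4. close"])) for date, values in daily.items())


def fetch_daily_close(symbol, api_key):
    """Request daily prices for `symbol` and return the parsed closing prices."""
    import requests  # imported here so the parser above works without the package

    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    return parse_daily_close(response.json())


# Hand-written payload in the assumed response shape (made-up numbers, not real prices).
sample = {
    "Time Series (Daily)": {
        "2021-01-05": {"1. open": "100.5", "4. close": "101.5"},
        "2021-01-04": {"1. open": "99.0", "4. close": "100.0"},
    }
}
closes = parse_daily_close(sample)
print(closes)
```

With a real key, `fetch_daily_close("IBM", api_key)` would perform the actual request.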

To import data into Python with the quandl package, you may first need to install it. Enter the command !pip install quandl in your Jupyter notebook and you’re good to go. We have imported the prices of Infosys (BOM500209) and will use these for our further analysis. More on quandl and how to get the best out of it can be found in its documentation.

Both methods require an API key to obtain the data. You can create one on the alphavantage and quandl websites.

Now that we have the data, we need to plot it to get an overview of what the trend looks like.

The trend experiences some ups and downs, as a stock generally does. The series does not appear to be seasonal, as the seasonal component shows no clear pattern. The variance of the residuals seems to remain the same except for a few observations.

Checking for Stationarity

For ARIMA, the time series has to be made stationary before further analysis. For a time series to be stationary, its statistical properties (mean, variance, etc.) must be the same throughout the series, irrespective of the time at which you observe them. A stationary time series has no long-term predictable patterns such as trends or seasonality. Its time plot looks roughly horizontal, with constant variance.

Rolling Statistics

We can plot the rolling mean and standard deviation to check if the statistics show an upward or downward trend. If these statistics vary over time, then the time series is highly likely to be non-stationary.
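A minimal sketch of the rolling statistics with pandas follows. The random-walk series and the 30-day window are assumptions standing in for the actual stock prices.

```python
import numpy as np
import pandas as pd

# Hypothetical price series: a random walk, which is non-stationary by construction.
rng = np.random.default_rng(42)
prices = pd.Series(
    100 + np.cumsum(rng.normal(0, 1, 500)),
    index=pd.date_range("2020-01-01", periods=500, freq="D"),
)

window = 30  # assumed window length; choose one that suits your data
rolling_mean = prices.rolling(window=window).mean()
rolling_std = prices.rolling(window=window).std()

summary = pd.DataFrame(
    {"price": prices, "rolling_mean": rolling_mean, "rolling_std": rolling_std}
)
# The first window - 1 rows are NaN because the window is not yet full.
print(summary.dropna().head())
```

Plotting the three columns of `summary` together (for example with `summary.plot()`) makes a drifting mean or changing spread easy to spot.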

ADF and KPSS Test

To check the stationarity of the time series, we will also use the ADF (Augmented Dickey-Fuller) test and the KPSS (Kwiatkowski–Phillips–Schmidt–Shin) test. The null hypothesis of the ADF test is that the time series is not stationary, whereas that of the KPSS test is that it is stationary.

We can see the rolling mean trending up and down over time as the price of the stock increases and decreases. The p-value of the ADF test is greater than 0.05, which means we fail to reject its null hypothesis. The p-value of the KPSS test is below 0.05, which means we reject its null hypothesis. Both tests, together with the rolling statistics, indicate that the time series is not stationary.

How to de-trend the time series?

Differencing: A new series is constructed in which each value is the difference between the actual observation at the current time and the one at the previous time.

value(t) = actual_observation(t) – actual_observation(t-1)

Transformation: Transforming the values using a power, square root, log, etc. can help to linearize the data. For example, taking the log of the values turns an exponential trend into a linear one, since log(exp(x)) = x.

Seasonal Differencing: Each value is the difference between an observation and the observation N steps before it, where N is the length of the season. This can help in removing seasonality as well as trend.

value(t) = actual_observation(t) – actual_observation(t-N)

Fitting a model: We can fit a linear regression model to the time series, which fits a linear trend. The de-trended values are obtained by subtracting the values predicted by the model from the actual observations.

value(t) = actual_observation(t) – predicted(t)
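The four approaches above can be sketched with pandas and NumPy as follows. The synthetic series and the season length N = 12 are assumptions chosen so that the effect of each step is easy to verify.

```python
import numpy as np
import pandas as pd

t = np.arange(48)
# Hypothetical series: linear trend plus a 12-period seasonal pattern.
series = pd.Series(10 + 2.0 * t + 5 * np.sin(2 * np.pi * t / 12))

# Differencing: value(t) = observation(t) - observation(t-1)
first_diff = series.diff().dropna()

# Transformation: the log of an exponential trend is a straight line
exponential = pd.Series(np.exp(0.1 * t))
log_series = np.log(exponential)

# Seasonal differencing: value(t) = observation(t) - observation(t-N), N = 12
seasonal_diff = series.diff(12).dropna()

# Fitting a model: subtract a fitted linear trend from the observations
slope, intercept = np.polyfit(t, series.to_numpy(), deg=1)
detrended = series - (slope * t + intercept)

print(first_diff.head())
print(seasonal_diff.head())
print(detrended.head())
```

In this example the seasonal difference is constant, because subtracting the value 12 steps earlier cancels the seasonal term and leaves only the trend's increase over one season.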

After de-trending the time series, the ADF and KPSS tests indicate that it is stationary. The Partial Autocorrelation Function (PACF) plot suggests that correlation exists at certain lags.

Splitting the Dataset

Train – Test Split
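For time series, the split has to be chronological: the model is trained on the earlier observations and tested on the later ones, without shuffling. A minimal sketch follows; the helper name, the 80/20 ratio, and the placeholder prices are assumptions, not the split used in the figure.

```python
import numpy as np
import pandas as pd


def chronological_split(series, train_fraction=0.8):
    """Split a series into an earlier training part and a later test part."""
    cutoff = int(len(series) * train_fraction)
    return series.iloc[:cutoff], series.iloc[cutoff:]


# Placeholder prices on business days (not real data).
prices = pd.Series(
    np.linspace(100, 150, 250),
    index=pd.date_range("2020-01-01", periods=250, freq="B"),
)
train, test = chronological_split(prices)
print(len(train), len(test))
```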

Model Building

To forecast the prices, we can use smoothing methods and ARIMA methods. Smoothing methods can be used for non-stationary data, whereas ARIMA requires the time series to be stationary. We can also make use of auto_arima, which makes the series stationary and determines the optimal order for the ARIMA model.

For each of the methods, we will perform multiple fits for the optimization of the hyperparameters and use the optimal values for the final model.

Simple Exponential Smoothing

Simple Exponential Smoothing (SES) is used when the data does not contain any trend or seasonality. The smoothing factor for the level (α) controls how much weight each observation receives. Larger values of α give more weight to the most recent observation, whereas smaller values spread the weight over more past observations.
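To make the role of α concrete, here is a small pure-Python sketch of the SES recursion, level(t) = α·y(t) + (1 − α)·level(t−1). Initializing the level with the first observation is an assumption (libraries offer other initializations), and the function name and data are hypothetical.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the final smoothed level, which is the SES forecast for every future step."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # assumed initialization: start from the first observation
    for y in values[1:]:
        # New level: weighted average of the new observation and the previous level
        level = alpha * y + (1 - alpha) * level
    return level


data = [10.0, 12.0, 11.0, 13.0]
balanced = simple_exponential_smoothing(data, alpha=0.5)
latest_only = simple_exponential_smoothing(data, alpha=1.0)
print(balanced, latest_only)
```

With α = 1 the forecast is simply the last observation, while smaller values pull it towards the earlier history.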

Simple Exponential Smoothing
Simple Exponential Smoothing Forecast

Holt’s Exponential Smoothing

Holt’s Exponential Smoothing takes the trend into account when forecasting the time series. It is used when there is a trend in the data and no seasonality. It calculates the smoothing value (the first equation), which is the same as the one used in SES. The trend coefficient (β) weighs the difference between consecutive smoothing values against the previous trend estimate. The forecast is a combination of the smoothing value and the trend estimate.
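The level and trend updates can be sketched in pure Python as below. The initialization (level from the first observation, trend from the first difference), the function name, and the parameter values are assumptions for illustration.

```python
def holt_forecast(values, alpha, beta, horizon):
    """Holt's linear-trend smoothing; returns forecasts for 1..horizon steps ahead."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    level = values[0]              # assumed initial level
    trend = values[1] - values[0]  # assumed initial trend
    for y in values[1:]:
        previous_level = level
        # Level: blend the new observation with the previous one-step forecast
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Trend: blend the latest change in level with the previous trend estimate
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


linear = [1.0, 3.0, 5.0, 7.0, 9.0]
forecasts = holt_forecast(linear, alpha=0.8, beta=0.2, horizon=3)
print(forecasts)
```

On a perfectly linear series such as this one, the forecasts continue the line, which is a convenient sanity check for the update rules.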

Holt’s Exponential Smoothing
Holt’s Exponential Smoothing Forecast

Holt-Winters Exponential Smoothing

Holt-Winters Exponential Smoothing takes both trend and seasonality into account when forecasting the time series. It forecasts the values using equations for the level, trend, and seasonal components of the time series. Depending on the seasonal variations in the data, either the additive or the multiplicative version is used.

Additive Method (Source: otexts)
Multiplicative Method (Source: otexts)

Auto-Regressive Integrated Moving Average (ARIMA)

The ARIMA model combines an Auto-Regressive model and a Moving Average model with differencing (the Integrated part). The Auto-Regressive part models the relationship between an observation and a certain number of lagged observations. The Integrated part is the differencing of the actual observations to make the time series stationary. The Moving Average part models the relationship between an observation and the residual errors from a moving average model applied to lagged observations.

Auto-Regressive (p) -> Number of lag observations in the model. Also called the lag order.

Integrated (d) -> Number of times the actual observations are differenced for stationarity. Also called the degree of differencing.

Moving Average (q) -> Size of the moving average window. Also called the order of the moving average.

ARIMA Forecast

Summary

To evaluate the performance of the models, we will use the Root Mean Squared Error (RMSE) and compare which model performs best.
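RMSE is the square root of the mean of the squared differences between actual and predicted values. A minimal NumPy sketch follows; the numbers and model names are made up for illustration and are not the results reported in this article.

```python
import numpy as np


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up test values and forecasts from two hypothetical models.
actual = [100.0, 102.0, 101.0, 105.0]
forecasts = {
    "flat_forecast": [100.0, 100.0, 100.0, 100.0],
    "hypothetical_model": [101.0, 101.0, 102.0, 104.0],
}

scores = {name: rmse(actual, predicted) for name, predicted in forecasts.items()}
best = min(scores, key=scores.get)  # the lowest RMSE wins
print(scores, best)
```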

RMSE Summary

Of the models compared, Holt’s Exponential Smoothing performed best, obtaining the lowest RMSE.

In this article, we covered time series and their components, fetching data using an API call and packages, checking the stationarity of a time series, de-trending it, different ways to model it, and finally evaluating the models.

You can find the notebook for your reference here.

