Time Series Analysis and Forecasting of Web Service Metrics

Discussing various machine learning techniques for the analysis and forecasting of web service metrics, and their applications.

Vaibhav Sethi
Towards Data Science

--

Overview

In this article, we’re going to discuss various machine learning techniques for the analysis and forecasting of web service metrics, and their applications. Auto-scaling is a good application: forecasting techniques can be applied to estimate the request rates for a web service. Similarly, forecasting techniques can be applied to service metrics to predict alerts and anomalies.

In this article, I’ll first talk about time series data and its role in forecasting techniques. Later on, I’ll illustrate a predictive model for forecasting the request rate of a web service. This article provides a basic understanding of time series forecasting techniques, which can be applied to service metrics or any other time series data.

Getting Started…

What is Time series?

A time series is a collection of data points collected at constant time intervals. Examples of time series data range from application metrics, like request rates in RPM, to system metrics, like idle CPU %, sampled at constant time intervals.

In order to solve a prediction problem with machine learning techniques, the data needs to be converted to a time series format. Time series data have a natural temporal ordering. Time series analysis can be applied to real-valued continuous data, discrete numeric data, or discrete symbolic data.

For the illustration I’ve used Python. The NumPy, Pandas, and Matplotlib modules will be used for transformation and analysis, and Statsmodels will be used for the predictive model.
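As a minimal sketch of the setup (using a synthetic series as a stand-in for the article's request-rate data, since the original dataset isn't included here), a minute-level RPM series can be held in a pandas Series and summarised:

```python
# Sketch: build a minute-level request-rate (RPM) series as a stand-in
# for the article's data, then inspect its summary statistics.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
idx = pd.date_range("2023-01-01", periods=7 * 24 * 60, freq="min")  # one week, per-minute

# Baseline traffic with a daily cycle, noise, and occasional spikes.
daily_cycle = 50 * np.sin(2 * np.pi * idx.hour / 24)
rpm = pd.Series(300 + daily_cycle + rng.normal(0, 20, len(idx)), index=idx, name="rpm")
rpm.iloc[rng.choice(len(rpm), 50, replace=False)] += 400  # random traffic spikes

print(rpm.describe())  # count / mean / std / min / quartiles / max
print(rpm.head())      # the first few data points of the time series
```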

Statistics of the request rate data
A few data points of the time series
Time series graph representing request rates (in RPM) for a web service, sampled at one-minute intervals.

Preprocessing Data

Look at the graph for the request rate (in RPM) above. Do you see any challenges with the data?

In the graph above you can observe that there are too many data points and spikes to work with.

How to handle this?

* Resampling : All the data points were taken at one-minute intervals. Resampling the data hourly, daily, or weekly can help reduce the number of data points to work with. In this example we will use average daily resampled values.

* Transformation : Transformations like logarithm, square root, or cube root can be applied to dampen the spikes in the graph. In this example we will apply a log transformation to the time series.

Observe that the number of data points has decreased and the graph looks smoother as a result of daily resampling.
Observe the values on the y-axis as a result of the log transformation.
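The two preprocessing steps above can be sketched as follows, again on a synthetic minute-level series standing in for the article's data:

```python
# Sketch: daily resampling plus log transformation of a minute-level
# RPM series (synthetic stand-in for the article's data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=14 * 24 * 60, freq="min")  # two weeks
rpm = pd.Series(300 + rng.normal(0, 30, len(idx)).cumsum() * 0.01
                + 50 * np.sin(2 * np.pi * idx.hour / 24),
                index=idx).clip(lower=1)

daily = rpm.resample("D").mean()   # average daily resampling
log_daily = np.log(daily)          # log transform dampens the spikes

print(daily.head())
print(log_daily.head())
```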

Time series Basics

Univariate Vs. Multivariate Time series
A univariate time series consists of only one variable. Univariate analysis is the simplest form of data analysis, where the data being analysed contains only one variable. Since there is a single variable, it doesn’t deal with causes or relationships. An example of a univariate time series could be request rate metrics.
When a time series consists of two variables it is called a bivariate time series. The analysis of this type of data deals with causes and relationships, and is done to find the relationship between the two variables. An example could be the % CPU usage of a web service, which depends on the request rate. These variables are often plotted on the X and Y axes of a graph for a better understanding of the data, where one variable is independent and the other is dependent.
A multivariate time series consists of three or more variables. An example of a multivariate time series could be stock prices, which depend on multiple variables.

Components of Time series
In order to come up with a suitable model for forecasting time series, it is important to understand the components of the time series data. Time series data mainly consists of the following components:

  • Trend
    Trend shows the general tendency of the data to increase or decrease over a long period of time. A trend is a smooth, general, long-term, average tendency. The increase or decrease need not be in the same direction throughout the given period. The request rate for a web service may show such tendencies of movement over a long period of time.
  • Seasonality
    These are short-term movements occurring in the data due to seasonal factors. The short term is generally considered a period in which regular, repeating changes occur in a time series. E.g., an e-commerce web service may receive more traffic in certain months.
  • Cycle
    These are long term oscillations occurring in a time series.
  • Error
    These are the random or irregular movements in a time series. These are sudden changes occurring in a time series which are unlikely to be repeated.

Additive Vs. Multiplicative Model

A simple decomposition model could be:
Additive model : Y[t] = T[t] + S[t] + e[t]
Multiplicative model : Y[t] = T[t] * S[t] * e[t]

Where Y[t] is the predicted value at time ‘t’, and T[t], S[t] and e[t] are the trend component, seasonal component and error at time ‘t’ respectively.

Additive Decomposition Model for time series
Multiplicative Decomposition Model for time series

Stationary Series

A time series is said to be stationary if it has constant statistical properties over time, i.e. the following:

  • constant mean
  • constant variance
  • an autocovariance that does not depend on time.

Most time series models require the series to be stationary.

How to Check Stationarity of a Time Series?

Here are some ways to check the stationarity of a time series:

  • Rolling Statistics
    We can plot the moving average or moving variance to check for variation with time, e.g. the rolling average of the requests-per-minute graph over a period of 7 days. This is a visual technique.
  • Dickey-Fuller Test
    This is one of the statistical tests for checking stationarity. It is a type of unit root test. The test results comprise a Test Statistic and some Critical Values at different confidence levels. If the ‘Test Statistic’ is less than the ‘Critical Value’, we can reject the null hypothesis and say that the series is stationary. Here the null hypothesis is that the time series is non-stationary.
Rolling stats plot and Augmented Dickey-Fuller test results for the original time series

How to make a Time series Stationary?

There are multiple ways of making a time series stationary. Some of them are differencing, detrending, transformation, etc.

Rolling stats plot and Augmented Dickey-Fuller test results for the log transformed and differenced time series.

Models Fitting and Evaluation

I’ll broadly talk about two kinds of predictive models, mathematical models and artificial neural networks.

Predictive Models

Mathematical Models

Some of the classical time series forecasting models are mentioned below. I’ll illustrate the SARIMA model for our scenario.

Models like AR, MA, ARMA and ARIMA are special cases of the SARIMA model. VAR, VARMA and VARMAX are similar to the previously mentioned models, but are useful for vector (multivariate) data rather than univariate time series.

In certain cases the Holt-Winters model may be used to predict a time series with a seasonality component present.

SARIMA Model

When trend and seasonality are present in a time series, a very popular method is the Seasonal AutoRegressive Integrated Moving Average (SARIMA) model, which is a generalisation of the ARIMA model.

The SARIMA model is denoted by SARIMA (p, d, q) (P, D, Q) [S], where

  • p, q refer to the autoregressive, and moving average terms for the ARMA model
  • d is the degree of differencing (the number of times the data have had past values subtracted)
  • P, D, and Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.
  • S refers to the number of periods in each season

Model Parameters Estimation

* For the SARIMA (p, d, q) (P, D, Q) [S] model, we need to estimate 7 parameters.

* From the seasonal decomposition we have seen that the time series data has seasonality. Hence S = 365, denoting a lag of 365 days in the seasonal changes.

* For the p, q, P & Q parameters we can plot the ACF (Autocorrelation Function) & PACF (Partial Autocorrelation Function), and for the d & D parameters we can try the same plots on the differenced time series.

ACF Plot suggests a possibility of P = 0 and D = 0
ACF Plot suggests a possibility of p ~ 43 and d = 0
PACF Plot suggests a possibility of Q = 0 and D = 0
PACF Plot suggests a possibility of q ~ 7 and d = 0

* Another way to estimate the parameters is to try multiple sets of values and find the model for which the AIC (Akaike information criterion) value is relatively low.

* The estimated parameters for the model are: SARIMA (2, 1, 4) (0, 1, 0) [365]

Training and Test Dataset Split

Just like with other machine learning models, in order to evaluate the accuracy of the model we split the dataset into training and test datasets. This ratio may vary from 60% to 90%. In our case, due to the small number of data points, I’m keeping the ratio at 95%. Another reason for the 95% ratio is that for the SARIMA model to predict accurately, the training dataset should cover at least 2 full seasons.

The model will be trained with 2 years of data and tested with 39 days of data.
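A chronological 95/5 split (no shuffling, since temporal order must be preserved) might look like this; the series length below is an assumption chosen to mirror the 2-years-plus-39-days split:

```python
# Sketch: chronological train/test split for a daily time series.
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-01", periods=769, freq="D")  # ~2 years + 39 days
y = pd.Series(np.random.default_rng(2).normal(300, 20, len(idx)), index=idx)

split = int(len(y) * 0.95)               # 95% training ratio
train, test = y.iloc[:split], y.iloc[split:]
print(len(train), "training points,", len(test), "test points")
```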

Model Fitting

The data being fit is the daily average resampled and log transformed time series. The estimated parameters (2, 1, 4) (0, 1, 0) [365] are passed to the model. Observe that the AIC value is -296.90 for this model.

Forecasting

Now that the model is trained, we can perform the forecasting. For this we provide the number of steps to be predicted as a parameter.

Graph showing Training, Test and Predicted values for Average Daily and Log Transformed RPM values for the service.

Note : Observe the values on the y-axis in the original graph at the beginning of the article and in the predicted graph above. The RPM values for the web service were around 300, while the predicted values are around 5; this is because the predicted values are transformed. We need to apply the inverse transformation to get the values on the original scale. Can you guess which inverse transformation would be suitable, based on the transformation we have applied?

Since a log transformation was applied, we need to apply an exponential transformation as its inverse. This is required before evaluating the forecast, since we need to know the accuracy of the predicted values on the original scale.

Validating Forecast

Now that we have inverse transformed the predicted values back to the original scale, we can evaluate the accuracy of the predictive model.

For this we need to find the errors between Original & Predicted test values to compute:
* Mean Squared Error (MSE)
* Root Mean Squared Error (RMSE)
* Coefficient of Variation
* Quartile Coefficient of Dispersion, etc
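These error measures can be computed directly with NumPy. The values below are hypothetical, and taking the coefficient of variation as RMSE as a percentage of the actual mean is an assumption about the article's definition:

```python
# Sketch: MSE, RMSE, and coefficient of variation between actual
# and predicted test values (hypothetical numbers).
import numpy as np

actual = np.array([310.0, 295.0, 305.0, 320.0, 300.0])     # hypothetical test values
predicted = np.array([300.0, 290.0, 310.0, 315.0, 305.0])  # hypothetical forecasts

errors = actual - predicted
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
cv = 100 * rmse / actual.mean()  # lower CV -> more accurate forecast

print(f"MSE: {mse:.2f}  RMSE: {rmse:.2f}  CV: {cv:.2f}%")
```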

Since our test dataset had only 39 data points, we can evaluate the prediction for up to 39 days. Let’s evaluate the model for 39 days and for 20 days to compare the results.

39 Days Prediction

Actual and Predicted RPM for 39 days

Note : The coefficient of variation is 11.645, which means the model was able to predict the daily average RPM for the service with an accuracy of 88% for the next 39 days.

20 Days Prediction

Actual and Predicted RPM for 20 days

Note : The coefficient of variation is 5.55, which means the model was able to predict the daily average RPM for the service with an accuracy of 94% for the next 20 days.

Neural Networks

Neural network models for prediction problems work differently than mathematical models. A Recurrent Neural Network (RNN) is a class of artificial neural network which exhibits temporal dynamic behaviour.

LSTM (Long Short-Term Memory) is one of the most suitable RNN architectures for this task. To develop an LSTM model, the time series forecasting problem must be re-framed as a supervised learning problem.
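The re-framing can be sketched as a sliding window over the series, where each sample's features are the previous `window` values and its target is the next value (the shape an LSTM expects):

```python
# Sketch: turn a univariate series into (X, y) pairs for supervised
# learning via a sliding window.
import numpy as np

def to_supervised(series, window):
    """Return (X, y) where X[i] = series[i:i+window], y[i] = series[i+window]."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

series = np.arange(10, dtype=float)  # 0, 1, ..., 9
X, y = to_supervised(series, window=3)
print(X[0], "->", y[0])  # [0. 1. 2.] -> 3.0
print(X.shape, y.shape)  # (7, 3) (7,)
```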

Conclusion

  • For predicting certain metrics like response time for web service, very efficient models would be required to forecast real-time time series data.
  • Each metric has a suitable granularity of time step associated with it. For predicting the response times the suitable granularity of time step would be seconds or minutes. While for auto-scaling, predicting Daily RPM could be sufficient.
  • An appropriate model should be chosen when predicting complex metrics like CPU usage % or disk swaps, as these might not be univariate time series and multiple variables may be influencing their values.
  • A Predictive model developed for predicting one metric might not be suitable for another metric.
  • Accuracy of the models can be increased by training it with larger sets of data and fine tuning the parameters of the model.
