📚 Overview
Understanding the underlying architecture of any software package goes a long way toward using it to its full potential. This does not mean one must know every line of code in it, but having an overview can often help.
This article provides an architectural overview of the `pycaret` time series module and shows examples where this information comes in handy when evaluating models developed with `pycaret`.
📖 Suggested Previous Reads
If you have not already done so, I would recommend the following short read. It talks about how `pycaret` uses regression-based forecasting models (something that we will revisit later in this article):
👉 Reduced Regression Models for Time Series Forecasting
📗 Architecture
The `pycaret` time series module is built on top of `sktime`, which is a unified framework for time series analysis. `sktime` aims to do for time series analysis what `sklearn` did for machine learning. You can read more about it here if you wish, but that is not required for this article, as I will give a quick overview.

`sktime` provides a framework to:

- Create time series models with `sklearn` regressors using the reduced regression technique (see suggested previous read).
- Create model pipelines with transformations, akin to what `sklearn` provides.
- Connect to other time series packages (such as [statsmodels](https://github.com/alan-turing-institute/sktime/blob/v0.8.1/sktime/forecasting/base/adapters/_statsmodels.py#L17), [pmdarima](https://github.com/alan-turing-institute/sktime/blob/v0.8.1/sktime/forecasting/base/adapters/_pmdarima.py#L14), [tbats](https://github.com/alan-turing-institute/sktime/blob/4e06cb0231cdabb74bf88d0cb4f2b721fc863fe3/sktime/forecasting/base/adapters/_tbats.py#L18), and [prophet](https://github.com/alan-turing-institute/sktime/blob/v0.8.1/sktime/forecasting/base/adapters/_fbprophet.py#L19), etc.) using adapters.
- Allow users to create their own forecasting models using extension templates.
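To make the first point concrete, here is a minimal sketch of the reduction idea using plain `numpy` and `sklearn`. The helper `make_lagged_matrix` and all values below are invented here for illustration; they are not part of `sktime` or `pycaret`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def make_lagged_matrix(y, window):
    """Turn a 1-D series into (samples, window) lag features and 1-step targets."""
    X = np.array([y[i:i + window] for i in range(len(y) - window)])
    t = y[window:]
    return X, t

# Toy series: a perfect linear trend 0, 1, 2, ..., 49
y = np.arange(50, dtype=float)

# Reduce forecasting to tabular regression on lagged windows
X, t = make_lagged_matrix(y, window=5)
reg = LinearRegression().fit(X, t)

# Recursive one-step-ahead forecasting: feed each prediction back in
history = list(y[-5:])
preds = []
for _ in range(3):
    yhat = reg.predict(np.array(history[-5:]).reshape(1, -1))[0]
    preds.append(yhat)
    history.append(yhat)

print(np.round(preds, 2))  # the linear trend continues: ~[50. 51. 52.]
```

This recursive strategy is one of several reduction strategies (`sktime` also offers direct and multi-output variants); see the suggested previous read for details.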
![PyCaret Time Series Module: Architecture Overview [Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/1hpJzvk2Yvupk7MVrO8aiXg.png)
While a user can use the `sktime` library directly to create models, managing the workflow and model comparison process still needs to be handled manually (similar to what you would do if building models in `sklearn` directly). Thankfully, `pycaret` provides a convenient way to do this in a few lines of code by wrapping these models, pipelines, and adapters in a single framework, as shown below.
```python
#### Create different types of models ----

# ARIMA model from `pmdarima`
arima_model = exp.create_model("arima")

# ETS and Exponential Smoothing models from `statsmodels`
ets_model = exp.create_model("ets")
exp_smooth_model = exp.create_model("exp_smooth")

# Reduced Regression model using `sklearn` Linear Regression
lr_model = exp.create_model("lr_cds_dt")
```
So when you create a time series model in `pycaret`, you get back one of these `sktime` adapters, pipelines, or an `sktime`-compatible model that you developed yourself.
```python
#### Check model types ----
print(type(arima_model))       # sktime `pmdarima` adapter
print(type(ets_model))         # sktime `statsmodels` adapter
print(type(exp_smooth_model))  # sktime `statsmodels` adapter
print(type(lr_model))          # Your custom sktime compatible model
```
![[Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/1t-vxfIHOvXJCPJ5eRs84Ng.png)
But there is so much more information that one can extract from these models than meets the eye. For example, if the model that you created using `pycaret` is called `model`, the underlying wrapped library model, `sktime` pipeline, or your custom `sktime`-compatible model can be extracted with ease by calling `model._forecaster`.
![Model Objects & Available Methods [Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/17pB1V9heDn2dT1xVFUfCzg.png)
```python
#### Access internal models using `_forecaster` ----
print(type(arima_model._forecaster))
print(type(ets_model._forecaster))
print(type(exp_smooth_model._forecaster))
print(type(lr_model._forecaster))
```
![[Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/1Czgh8yy6F9wTEUhGFjDq9Q.png)
From this point onward, you can extract valuable information about your model using either the native library methods/functions or the `sktime` wrappers.
```python
#### What hyperparameters were used to train the model? ----
print(arima_model)

#### Access statistical fit properties using the underlying `pmdarima` ----
arima_model._forecaster.summary()

#### Alternately, use sktime's convenient wrapper to do so ----
arima_model.summary()
```
![ARIMA Model Parameters [Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/1eDkT1GetWSS9yrGPLVToMg.png)
![ARIMA Model Statistical Summary [Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/1gOfxXjSya4wwqhG8vnPrTA.png)
For example, the image above shows that the ARIMA model was built with an intercept term, and the fit returned an intercept value of 5.798. We will discuss these statistical details in another post (see Suggested Next Reads); for now, just know that this information can be readily accessed.
Similarly, we can extract information about pipelines using methods similar to what one would use in `sklearn`.
```python
#### sktime pipelines are similar to sklearn. ----
#### Access steps using `named_steps` attribute ----
print(lr_model._forecaster.named_steps.keys())
```
![Model Pipeline Steps [Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/10g1XVVU1SOykA8PKYC-Izg.png)
So this model is actually a pipeline with 3 steps: a conditional deseasonalizer, followed by a detrender, followed by the actual forecaster. You can get more details about these steps by simply calling `named_steps`. For example, we can see that the forecaster is actually a regression-based model using `sklearn` `LinearRegression`. This is what we asked for when we built the `lr_cds_dt` model (`lr` stands for Linear Regression, `cds` stands for Conditional Deseasonalizer, and `dt` stands for Detrender).
```python
#### Details about the steps ----
from pprint import pprint

pprint(lr_model._forecaster.named_steps)
```
![Model Pipeline Details [Image by Author]](https://towardsdatascience.com/wp-content/uploads/2021/11/1f7A99oVOmuiU8wCwH4hZqQ.png)
🚀 Conclusion
While `pycaret` provides a low-code environment to create and manage modeling workflows, there is a lot more that can be done if we look under the hood. This article just scratches the surface of the possibilities. In future articles, we will look at how we can use `pycaret` to understand the workings of the underlying models such as ARIMA. Until then, if you would like to connect with me on my social channels (I post about Time Series Analysis frequently), you can find me below. That's it for now. Happy forecasting!
📘 GitHub
📗 Resources
- Jupyter Notebook (can be opened in Google Colab) containing the code for this article
📖 Suggested Next Reads
Understanding ARIMA Models using PyCaret’s Time Series Module – Part 1