
PyCaret Time Series Module Architecture Overview

Looking under the hood

Looking under the hood of PyCaret Time Series Module - Photo by Alison Ivansek on Unsplash

📚 Overview

Understanding the underlying architecture of any software package goes a long way in making sure we can use it to the best possible extent. It does not mean that one must be aware of every line of code in it, but sometimes, just having an overview can help.

This article provides an architectural overview of the pycaret time series module and shows examples where this information comes in handy when evaluating a model developed using pycaret.


📖 Suggested Previous Reads

If you have not already done so, I would recommend the following short read. It talks about how pycaret uses regression-based forecasting models (something that we will talk about later in this article).

👉 Reduced Regression Models for Time Series Forecasting


📗 Architecture

The pycaret time series module is built on top of sktime, which is a unified framework for time series analysis. sktime aims to do for time series analysis what sklearn did for Machine Learning. You can read more about it here if you wish, but it is not required for this article, as I will give a quick overview.

sktime provides a framework to:

  1. Create time series models with sklearn regressors using the reduced regression technique (see suggested previous read).
  2. Create model pipelines with transformations akin to what sklearn provides.
  3. Connect to other time series packages (such as [statsmodels](https://github.com/alan-turing-institute/sktime/blob/v0.8.1/sktime/forecasting/base/adapters/_statsmodels.py#L17), [pmdarima](https://github.com/alan-turing-institute/sktime/blob/v0.8.1/sktime/forecasting/base/adapters/_pmdarima.py#L14), [tbats](https://github.com/alan-turing-institute/sktime/blob/4e06cb0231cdabb74bf88d0cb4f2b721fc863fe3/sktime/forecasting/base/adapters/_tbats.py#L18), [prophet](https://github.com/alan-turing-institute/sktime/blob/v0.8.1/sktime/forecasting/base/adapters/_fbprophet.py#L19), etc.) using adapters.
  4. Allow users to create their own forecasting models using extension templates.
PyCaret Time Series Module: Architecture Overview [Image by Author]
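
To make item 1 in the list above a little more concrete, here is a minimal pure-Python sketch of the reduction idea. The series is cut into sliding windows so that each window of past values becomes a feature row and the value that follows becomes the target; any tabular regressor can then be fit on those rows and applied recursively to forecast. The names `make_windows`, `DriftRegressor` and `recursive_forecast` are hypothetical helpers written for this illustration, and the toy regressor stands in for an sklearn estimator. This is not how sktime implements reduction internally.

```python
from typing import List, Sequence, Tuple


def make_windows(
    y: Sequence[float], window_length: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (feature rows, targets) using a sliding window."""
    X, targets = [], []
    for i in range(len(y) - window_length):
        X.append(list(y[i:i + window_length]))  # past values are the features
        targets.append(y[i + window_length])    # the next value is the target
    return X, targets


class DriftRegressor:
    """Toy tabular regressor (hypothetical): last feature plus the average step."""

    def fit(self, X: List[List[float]], targets: List[float]) -> "DriftRegressor":
        steps = [t - row[-1] for row, t in zip(X, targets)]
        self.step_ = sum(steps) / len(steps)
        return self

    def predict(self, row: Sequence[float]) -> float:
        return row[-1] + self.step_


def recursive_forecast(
    regressor: DriftRegressor, y: Sequence[float], window_length: int, fh: int
) -> List[float]:
    """Forecast fh steps ahead, feeding each prediction back in as a feature."""
    history = list(y)
    predictions = []
    for _ in range(fh):
        next_value = regressor.predict(history[-window_length:])
        predictions.append(next_value)
        history.append(next_value)
    return predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, targets = make_windows(y, window_length=3)
model = DriftRegressor().fit(X, targets)
forecast = recursive_forecast(model, y, window_length=3, fh=3)
print(forecast)  # [7.0, 8.0, 9.0]
```

Swapping the toy regressor for a real one (for example, a linear regression) is the whole point of the technique: the time series problem has been reduced to an ordinary regression problem.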

While a user can use the sktime library directly to create models, the workflow and the model comparison process still need to be managed manually (similar to what you would do if building models in sklearn directly). Thankfully, pycaret provides a convenient way to do this in a few lines of code by wrapping these models, pipelines, and adapters in a convenient framework as shown below.
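
For contrast, the kind of manual bookkeeping that pycaret takes over might look roughly like the toy sketch below: split the data, fit each candidate, score it on the holdout, and pick a winner. Everything here is an assumption made for illustration (the three simple forecasters, the `mae` helper, and the tiny dataset), and it uses plain Python rather than sktime or sklearn. The pycaret version follows right after.

```python
from typing import Callable, Dict, List


def naive_last(train: List[float], fh: int) -> List[float]:
    """Repeat the last observed value."""
    return [train[-1]] * fh


def mean_forecast(train: List[float], fh: int) -> List[float]:
    """Repeat the mean of the training data."""
    return [sum(train) / len(train)] * fh


def drift(train: List[float], fh: int) -> List[float]:
    """Extend the straight line through the first and last observations."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(fh)]


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


y = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
fh = 2
train, test = y[:-fh], y[-fh:]  # hold out the last fh points

models: Dict[str, Callable[[List[float], int], List[float]]] = {
    "naive": naive_last,
    "mean": mean_forecast,
    "drift": drift,
}

# Fit, predict and score every candidate by hand
scores = {name: mae(test, forecaster(train, fh)) for name, forecaster in models.items()}
best_name = min(scores, key=scores.get)
print(scores)     # {'naive': 3.0, 'mean': 8.0, 'drift': 0.0}
print(best_name)  # drift
```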

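#### Create different types of models ----
# NOTE: `exp` is assumed to be a pycaret time series experiment
# on which `setup()` has already been run.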

# ARIMA model from `pmdarima`
arima_model = exp.create_model("arima")

# ETS and Exponential Smoothing models from `statsmodels`
ets_model = exp.create_model("ets")
exp_smooth_model = exp.create_model("exp_smooth")

# Reduced Regression model using `sklearn` Linear Regression
lr_model = exp.create_model("lr_cds_dt")

So when you create a time series model in pycaret, you get back one of these sktime adapters, an sktime pipeline, or an sktime-compatible model that you developed yourself.

#### Check model types ----
print(type(arima_model))      # sktime `pmdarima` adapter 
print(type(ets_model))        # sktime `statsmodels` adapter
print(type(exp_smooth_model)) # sktime `statsmodels` adapter
print(type(lr_model))         # Your custom sktime compatible model
[Image by Author]

But there is so much more information that one can extract from these models than meets the eye. For example, if the model that you created using pycaret is called model, the underlying wrapped library model, sktime pipeline or your custom sktime compatible model can be extracted with ease by calling model._forecaster.

Model Objects & Available Methods [Image by Author]
#### Access internal models using `_forecaster` ----
print(type(arima_model._forecaster))
print(type(ets_model._forecaster))
print(type(exp_smooth_model._forecaster))
print(type(lr_model._forecaster))
[Image by Author]
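
If the adapter idea feels abstract, the following sketch shows the general pattern in plain Python: an outer object exposes a common interface, keeps the fitted third-party model on a private `_forecaster` attribute, and forwards calls such as `summary()` to it. Both classes are hypothetical stand-ins written for this article (`ToyLibraryModel` is not a real ARIMA and simply stores the mean as its "intercept"), so please read it as an illustration of the pattern rather than as the actual sktime adapter code.

```python
from typing import List, Sequence, Tuple


class ToyLibraryModel:
    """Stand-in for a model from an external library (hypothetical)."""

    def __init__(self, order: Tuple[int, int, int]):
        self.order = order
        self.intercept_ = None

    def fit(self, y: Sequence[float]) -> "ToyLibraryModel":
        # Toy "fit": just remember the mean of the data
        self.intercept_ = sum(y) / len(y)
        return self

    def predict(self, n_periods: int) -> List[float]:
        return [self.intercept_] * n_periods

    def summary(self) -> str:
        return f"ToyLibraryModel(order={self.order}, intercept={self.intercept_:.3f})"


class ToyAdapter:
    """Wraps the external model behind a common fit/predict interface."""

    def __init__(self, order: Tuple[int, int, int] = (1, 0, 0)):
        self.order = order
        self._forecaster = None  # set during fit, like the attribute used above

    def fit(self, y: Sequence[float]) -> "ToyAdapter":
        self._forecaster = ToyLibraryModel(self.order).fit(y)
        return self

    def predict(self, fh: Sequence[int]) -> List[float]:
        # Translate the common interface into the wrapped library's own call
        return self._forecaster.predict(n_periods=len(fh))

    def summary(self) -> str:
        # Convenience wrapper that delegates to the wrapped model
        return self._forecaster.summary()


adapter = ToyAdapter().fit([4.0, 6.0, 8.0])
inner_type = type(adapter._forecaster).__name__
predictions = adapter.predict(fh=[1, 2])
print(type(adapter).__name__)  # ToyAdapter
print(inner_type)              # ToyLibraryModel
print(adapter.summary())       # ToyLibraryModel(order=(1, 0, 0), intercept=6.000)
```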

From this point onward, you can extract valuable information about your model using either the native library methods/functions or sktime wrappers.

#### What hyperparameters were used to train the model? ----
print(arima_model)

#### Access statistical fit properties using underlying `pmdarima`
arima_model._forecaster.summary()

#### Alternatively, use sktime's convenient wrapper to do so ----
arima_model.summary()
ARIMA Model Parameters [Image by Author]
ARIMA Model Statistical Summary [Image by Author]

For example, the above image shows us that the ARIMA model was built with the requirement of an intercept. The fit returned an intercept value of 5.798. We will discuss these statistical details in another post (see Suggested Next Reads), but for now, just know that this information can be readily accessed.

Similarly, we can extract information about pipelines using methods that are similar to how one would do this in sklearn.

#### sktime pipelines are similar to sklearn.
#### Access steps using `named_steps` attribute
print(lr_model._forecaster.named_steps.keys())
Model Pipeline Steps [Image by Author]

So this model is actually a pipeline with 3 steps: a conditional deseasonalizer, followed by a detrender, followed by the actual forecaster. You can get more details about these steps by inspecting named_steps. For example, we can see that the forecaster is actually a regression-based model using sklearn LinearRegression. This is what we asked for when we built the lr_cds_dt model (lr stands for Linear Regression, cds stands for Conditional Deseasonalizer, and dt stands for Detrender).

#### Details about the steps ----
from pprint import pprint

pprint(lr_model._forecaster.named_steps)
Model Pipeline Details [Image by Author]
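
To see why such a pipeline behaves the way it does, here is a small plain-Python sketch of the general idea: transformers are applied to the series in order before the final forecaster is fit, and their inverse transforms are applied in reverse order to the predictions. The classes and the step names `"detrender"` and `"forecaster"` are hypothetical and chosen for this illustration; the deseasonalizer is left out to keep it short, and real sktime pipelines do considerably more.

```python
from typing import Dict, List, Sequence, Tuple


class ToyDetrender:
    """Removes the straight line through the first and last points (toy)."""

    def fit(self, y: Sequence[float]) -> "ToyDetrender":
        self.start_ = y[0]
        self.slope_ = (y[-1] - y[0]) / (len(y) - 1)
        self.n_ = len(y)
        return self

    def transform(self, y: Sequence[float]) -> List[float]:
        return [v - (self.start_ + self.slope_ * i) for i, v in enumerate(y)]

    def inverse_transform(self, y_future: Sequence[float]) -> List[float]:
        # y_future holds values for time steps n_, n_ + 1, ...
        return [
            v + self.start_ + self.slope_ * (self.n_ + h)
            for h, v in enumerate(y_future)
        ]


class ToyMeanForecaster:
    """Forecasts the mean of whatever series it was fit on."""

    def fit(self, y: Sequence[float]) -> "ToyMeanForecaster":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, fh: int) -> List[float]:
        return [self.mean_] * fh


class ToyPipeline:
    """Transformers first, forecaster last, steps addressable by name."""

    def __init__(self, steps: List[Tuple[str, object]]):
        self.steps = steps

    @property
    def named_steps(self) -> Dict[str, object]:
        return dict(self.steps)

    def fit(self, y: Sequence[float]) -> "ToyPipeline":
        for _, transformer in self.steps[:-1]:
            y = transformer.fit(y).transform(y)
        self.steps[-1][1].fit(y)  # forecaster sees the transformed series
        return self

    def predict(self, fh: int) -> List[float]:
        y_pred = self.steps[-1][1].predict(fh)
        for _, transformer in reversed(self.steps[:-1]):
            y_pred = transformer.inverse_transform(y_pred)  # undo in reverse order
        return y_pred


pipe = ToyPipeline([("detrender", ToyDetrender()), ("forecaster", ToyMeanForecaster())])
pipe.fit([1.0, 2.0, 3.0, 4.0])
step_names = list(pipe.named_steps.keys())
y_pred = pipe.predict(fh=2)
print(step_names)  # ['detrender', 'forecaster']
print(y_pred)      # [5.0, 6.0]
```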

🚀 Conclusion

While pycaret provides a low code environment to create and manage modeling workflows, there is a lot more that can be done if we look under the hood. This article just scratches the surface of the possibilities. In future articles, we will look at how we can use pycaret to understand the working of the underlying models such as ARIMA. Until then, if you would like to connect with me on my social channels (I post about Time Series Analysis frequently), you can find me below. That’s it for now. Happy forecasting!

🔗 LinkedIn

🐦 Twitter

📘 GitHub

Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, at no extra cost to you.

Join Medium with my referral link – Nikhil Gupta


📗 Resources

  • Jupyter Notebook (can be opened in Google Colab) containing the code for this article

📖 Suggested Next Reads

Understanding ARIMA Models using PyCaret’s Time Series Module – Part 1

Adding Custom Time Series Models to PyCaret

