
Linear Boosting with Automated Feature Engineering

Building Linear Models That Learn More Than Linear Relationships

Photo by Frankie Lopez on Unsplash

Feature engineering is one of the most fascinating activities in any machine learning pipeline. Compared to other tasks, like feature selection and parameter tuning, feature engineering relies on domain knowledge and genuine creativity. There are no fixed rules for creating new predictors. We only have to take care not to introduce leakage, i.e. not to incorporate future behaviors or target information that would not be accessible at prediction time, which would distort the outcomes.

The best features are often generated by hand after data exploration and guided by business understanding. These peculiarities make the feature engineering process difficult to fully automate. In the AI industry, where the current trend is to automate the building of machine learning pipelines, automated feature engineering risks being counterproductive. Automation may produce redundant, generic features in the first stages that are not useful when modeling, increasing the pipeline execution time and yielding suboptimal performance. Ideally, we would generate, in an automated way, only the features that are useful for the model and help it produce the best outcomes.

In this post, we carry out a forecasting task with a simple linear model built on top of an automated feature engineering process. The whole procedure is based on this work, which introduces a simple boosting algorithm based on recursive feature generation from linear model residuals. We refer to this algorithm as linear boosting because of its iterative nature and its use of linear models as base learners. The implementation is available as a scikit-learn-style estimator in the linear-tree package. linear-tree is a Python library that provides estimators adapting linear models to learn more complex data relations than simple linear ones.
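As a quick reference, here is a minimal sketch of how the boosting estimator from linear-tree can be fitted. The synthetic data and the parameter values are purely illustrative, and the exact constructor arguments should be checked against the package documentation.

import numpy as np
from sklearn.linear_model import Ridge
from lineartree import LinearBoostRegressor

# Toy data standing in for the calendar features and one visiting KPI
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = LinearBoostRegressor(
    base_estimator=Ridge(),  # the linear base learner refit at every boosting round
    n_estimators=20,         # number of boosting iterations, i.e. of generated features
    max_depth=3,             # depth of the decision trees fitted on the residuals
)
model.fit(X, y)
print(model.predict(X[:5]))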

THE DATA

We collect some data from Kaggle. The dataset contains five years of daily website visitor counts of different kinds: page loads, unique visits, first visits, and returning visits.

Daily website visitors (image by the author)

No other predictors are available, except those we can build from the timestamp information. We have to forecast future visiting KPIs without using an autoregressive approach. The time series at our disposal show complex seasonality patterns, starting at the daily level:

Daily website visitors patterns (image by the author)

These long-term, sophisticated patterns are confirmed by plotting the autocorrelations.
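A minimal way to draw such a plot with statsmodels follows; the file and column names are assumptions, since the post does not report them.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Hypothetical file and column names for the Kaggle daily website visitors data
df = pd.read_csv("daily-website-visitors.csv", parse_dates=["Date"])
page_loads = pd.to_numeric(
    df["Page.Loads"].astype(str).str.replace(",", ""), errors="coerce"
)

plot_acf(page_loads.dropna(), lags=400)  # a bit more than one year of daily lags
plt.show()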

Autocorrelation of targets (image by the author)
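As anticipated, the only predictors we can feed the models are calendar features derived from the timestamp. Here is a minimal sketch of how such features could be built (again, file and column names are assumptions).

import pandas as pd

# Hypothetical file and column names for the Kaggle dataset
df = pd.read_csv("daily-website-visitors.csv", parse_dates=["Date"])

# Calendar predictors derived only from the timestamp (no autoregressive lags)
df["dayofweek"] = df["Date"].dt.dayofweek
df["month"] = df["Date"].dt.month
df["dayofyear"] = df["Date"].dt.dayofyear
df["weekofyear"] = df["Date"].dt.isocalendar().week.astype(int)
df["is_weekend"] = (df["dayofweek"] >= 5).astype(int)
df["year"] = df["Date"].dt.year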

MODELING

Linear boosting is a two-stage learning algorithm. It shares similarities with the well-known tree-based gradient boosting because of its boosting approach. However, instead of repeatedly modeling the residuals to improve the predictions directly, the residuals are used to build new features that feed the next iterations.

More in detail, in the first step a linear model is fitted on all the data at our disposal. The next step consists in fitting a simple decision tree on the residuals of the previous stage, using the same features. The tree identifies the path leading to the highest error (i.e., the worst leaf). The samples falling in the leaf that contributes most to the error are used to generate a new binary feature, which is added to the inputs of the first step. The iterations continue until a stopping criterion is met, e.g. a predefined number of iterations has been computed.
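To make the procedure concrete, here is a simplified sketch of a single iteration. It follows the description above rather than the exact linear-tree implementation; for instance, the leaf-selection criterion may differ from the one used in the package.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def linear_boost_iteration(X, y):
    """One illustrative linear boosting round: fit, model residuals, add a feature."""
    # 1) fit the linear base learner on the current feature matrix
    linear = LinearRegression().fit(X, y)
    residuals = y - linear.predict(X)

    # 2) fit a shallow decision tree on the residuals, using the same features
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)

    # 3) pick the "worst" leaf, i.e. the one with the largest total squared residual
    leaf_ids = tree.apply(X)
    worst_leaf = max(
        np.unique(leaf_ids),
        key=lambda leaf: np.sum(residuals[leaf_ids == leaf] ** 2),
    )

    # 4) leaf membership becomes a new binary feature for the next iteration
    new_feature = (leaf_ids == worst_leaf).astype(float).reshape(-1, 1)
    return np.hstack([X, new_feature]), linear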

A visual representation of the proposed linear boosting algorithm (source: https://arxiv.org/pdf/2009.09110.pdf)

For our forecasting task, we benchmark different approaches. We compare the performance of a simple linear regression, of linear boosting, and of a pipeline in which linear boosting is used only for feature extraction, followed by a k-nearest neighbors regressor. All the modeling strategies are properly tuned with cross-validation on a training set.
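Below is a sketch of the third strategy, with linear boosting acting only as a feature generator in front of a k-nearest neighbors regressor. It assumes the installed linear-tree version lets LinearBoostRegressor act as a transformer inside a scikit-learn pipeline (i.e. exposes a transform method that appends the generated binary features); if not, the generated features can be concatenated to X manually before fitting the KNN.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from lineartree import LinearBoostRegressor

# Toy data standing in for the calendar features and one visiting KPI
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=600)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)

# Linear boosting only generates features; the KNN does the final prediction
pipeline = make_pipeline(
    LinearBoostRegressor(base_estimator=Ridge(), n_estimators=20, max_depth=3),
    KNeighborsRegressor(n_neighbors=10),
)
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))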

Comparing test predictions (image by the author)

The combination of linear boosting and k-nearest neighbors proves beneficial. A simple linear model is not suited to capture the complex seasonal patterns in the data. Linear boosting alone does a great job, but the best results come when we use linear boosting for feature generation together with an external predictive model. This demonstrates the ability of this kind of boosting procedure to learn relations more complex than simple linear ones.

Performance improvements as percentage deviations from the linear regression test error. The lower the better (image by the author)
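The metric behind the chart can be reproduced in a few lines; the error values here are illustrative placeholders, not the actual results.

# Test error of each strategy expressed as a percentage deviation from the
# plain linear regression baseline (negative values mean improvement)
baseline_error = 0.250  # e.g. test MSE of the simple linear regression
model_errors = {"linear boosting": 0.180, "linear boosting + knn": 0.140}

for name, error in model_errors.items():
    deviation = 100 * (error - baseline_error) / baseline_error
    print(f"{name}: {deviation:+.1f}%")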

SUMMARY

In this post, we carried out a forecasting task making use of an automated way to create features. The proposed procedure makes it possible to automate feature engineering while getting the best out of the chosen predictive model. It can also learn complex data patterns, beyond simple linear relations, without particular constraints. This provides great adaptability in any pipeline, including ones whose final forecasting model is not linear.


CHECK MY GITHUB REPO

Keep in touch: LinkedIn

