
Time series forecasting is a fascinating task. However, building a machine-learning algorithm to predict future data is trickier than expected. The hardest thing to handle is the temporal dependency present in the data. By their nature, time-series data are subject to shifts. This may result in temporal drifts of various kinds, which can make our algorithm inaccurate.
One of the best tips I can recommend when modeling a time series problem is to stay simple. Most of the time, the simpler solutions are the best ones in terms of accuracy and adaptability. They are also easier to maintain or embed, and more resilient to possible data shifts. In this sense, the gold standard for time series modeling is the adoption of linear-based algorithms. They require few assumptions and simple data manipulations to produce satisfactory results.
In this post, we carry out a sales forecasting task. We produce future forecasts using standard linear regression and an improved version of it: linear trees. They belong to the family of model trees and, like classical decision trees, partition the data; the difference is that they compute linear approximations (instead of constant ones) by fitting simple linear models in the leaves. Training consists in evaluating the best partitions of the data while fitting multiple linear models. The final model is a tree-based structure with linear models in the leaves.
A pythonic implementation of linear trees is available in linear-tree: a Python library to build model trees with linear models at the leaves. The package is fully integrated with sklearn: it provides simple BaseEstimators that wrap every linear model present in sklearn.linear_model to build an optimal linear tree.
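As a minimal sketch of the API, a linear tree is fitted and used like any sklearn estimator; the toy data below is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from lineartree import LinearTreeRegressor

# toy data: 100 samples, 4 features
X = np.random.rand(100, 4)
y = 3 * X[:, 0] + np.random.rand(100)

# wrap any sklearn linear model; it is re-fitted in each leaf
model = LinearTreeRegressor(base_estimator=LinearRegression(), max_depth=4)
model.fit(X, y)
preds = model.predict(X)
```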
THE DATA
For our experiment, we simulate some data that replicate store sales. We generate an artificial sales history for 400 stores, where sales are influenced by many factors, including promotions, holidays, and seasonality. Our task is to forecast future daily sales up to one year in advance.
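The exact generative process is not essential; below is a rough sketch of the kind of simulation we mean, where all coefficients and effect sizes are illustrative assumptions, not the values used in the experiment:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2017-01-01", "2020-12-31", freq="D")

def simulate_store(store_id):
    base = rng.uniform(50, 200)                               # store-specific level
    weekly = 10 * np.sin(2 * np.pi * dates.dayofweek / 7)     # weekly pattern
    yearly = 20 * np.sin(2 * np.pi * dates.dayofyear / 365)   # yearly seasonality
    promo = rng.binomial(1, 0.1, len(dates))                  # planned promotion days
    noise = rng.normal(0, 5, len(dates))
    sales = base + weekly + yearly + 30 * promo + noise
    return pd.DataFrame({"store": store_id, "date": dates,
                         "promo": promo, "sales": sales})

data = pd.concat([simulate_store(i) for i in range(400)], ignore_index=True)
```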
We don’t use any past information when we produce our predictions. We engineer all our regressors to be accurate and accessible in any future time period. This enables us to provide long-term forecasts.
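In practice, this means relying only on regressors that are deterministic or planned in advance, such as calendar features and promotion schedules. A possible encoding, continuing the sketch above (the feature names are illustrative):

```python
def make_features(df):
    # calendar regressors are deterministic, so they are available
    # for any future date we need to forecast
    out = df.copy()
    out["dayofweek"] = out["date"].dt.dayofweek
    out["weekofyear"] = out["date"].dt.isocalendar().week.astype(int)
    out["month"] = out["date"].dt.month
    return out

data = make_features(data)
# "promo" is usable only because promotions are planned ahead of time
features = ["dayofweek", "weekofyear", "month", "promo"]
```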
We have 400 stores with historical sales from different years and visible daily seasonality.

MODELING
We aim to predict sales up to a year ahead for all the stores at our disposal. To make this possible, we build a separate model for each store. We perform parameter tuning on the training set of each store independently, ending up with 400 models trained ad hoc. The procedure fits both simple linear regressions and linear trees. In this way, we can verify the goodness of 400 independent fits by comparing the performances of both model types on the unseen test data.
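A hedged sketch of the per-store procedure, reusing the simulated frame and feature list from above; the parameter grid and the split are illustrative choices, not the exact tuning setup of the experiment:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from lineartree import LinearTreeRegressor

results = {}
for store_id, df in data.groupby("store"):
    X, y = df[features].values, df["sales"].values
    # keep the last part of the history as unseen test data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

    lr = LinearRegression().fit(X_tr, y_tr)

    # linear-tree is sklearn-compatible, so standard tuning tools apply
    grid = {"max_depth": [3, 5, 7], "min_samples_leaf": [0.05, 0.1]}
    tree = GridSearchCV(LinearTreeRegressor(LinearRegression()),
                        grid, cv=3).fit(X_tr, y_tr)

    results[store_id] = (lr, tree.best_estimator_, X_te, y_te)
```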
Training a linear tree is as simple as training a standard linear regression. When we train a linear tree, we are simply fitting multiple linear models on different data partitions, so we benefit from the same advantages: simple preprocessing and good explanatory power.
By inspecting the tree paths of the various fits, we can understand the decisions taken by the algorithm. In the case of linear trees, we can identify which features are responsible for splitting the original dataset and thus provide the benefit of further fits.
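For instance, linear-tree exposes a summary() helper that reports, for each node, the splitting feature, the threshold, and the models fitted in the leaves; the snippet below assumes it behaves as documented in the library's README:

```python
best_tree = results[0][1]  # tuned linear tree for the first store

# each node reports its splitting feature and threshold; leaves also
# carry the linear model fitted on that partition
for node_id, node in best_tree.summary(feature_names=features).items():
    print(node_id, node)
```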

As we can see, different models are used for different months, weeks of the year, or even days of the week. This is not so surprising: a single linear regression can't generalize well on the whole dataset. Instead of manually searching for the perfect data partitions, linear trees evaluate and choose the best ones by looking directly at the data.

Predictions on the test data from linear regressions look more stable around the means. Predictions made with linear trees fit the data well, adapting better to the various seasonal patterns and reproducing the peaks. In the end, we compare the two model types in terms of performance. On the test data, we compute the Root Mean Squared Error (RMSE) for all the stores at our disposal. This enables us to verify how often linear trees are better than linear regressions.
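Continuing the sketch, we can compute the per-store RMSE on the held-out data and count how often the linear tree wins:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

wins = 0
for store_id, (lr, tree, X_te, y_te) in results.items():
    rmse_lr = np.sqrt(mean_squared_error(y_te, lr.predict(X_te)))
    rmse_tree = np.sqrt(mean_squared_error(y_te, tree.predict(X_te)))
    wins += rmse_tree < rmse_lr

print(f"linear tree beats linear regression in {wins / len(results):.0%} of stores")
```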

Linear trees seem to outperform classical linear regressions more than 9 times out of 10. This is a great result for us: it means there is a real advantage in using linear trees in our scenario.
SUMMARY
In this post, we carried out a time series forecasting task using linear models. We tested simple linear regressions and linear trees to predict synthetic future store sales. The results were interesting: linear trees learned better representations of the data, providing more accurate predictions most of the time. This is achieved simply by fitting multiple linear regressions on data partitions obtained by following simple decision rules.
Keep in touch: Linkedin