Data Science Shorts: Decomposition

Considerable gains are sometimes achieved by the simplest of methods

In the first of a series of short articles, I would like to outline a key insight about input data: breaking our training features down into the more fundamental features that compose them can substantially improve the predictive quality of our models.

To illustrate the point, let’s say we are attempting to forecast future values of an arbitrary time series. For simplicity, we will use a linear model to produce the forecasts; the concept remains valid for more complex models as well.

One way to generate forecasts from linear models is to treat the time-series training data as a collection of feature vectors. If we view each feature’s values as a function of time, we can fit a regression model to that feature and extrapolate along the regression line to predict its value at any point in the future. A second regression model, trained to map the features to the target variable (the forecast target), can then be applied to these extrapolated feature values to produce the forecast.
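As a minimal sketch of this two-stage approach (using scikit-learn on synthetic data; all names and numbers here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic example: 36 months of one feature and a target driven by it.
rng = np.random.default_rng(0)
t = np.arange(36).reshape(-1, 1)                       # time index (months)
feature = 50 + 2.0 * t.ravel() + rng.normal(0, 5, 36)  # trending feature
target = 10 + 3.0 * feature + rng.normal(0, 10, 36)    # target variable

# Stage 1: regress the feature on time, then extrapolate it forward.
feature_vs_time = LinearRegression().fit(t, feature)
t_future = np.arange(36, 42).reshape(-1, 1)            # next 6 months
feature_future = feature_vs_time.predict(t_future)

# Stage 2: regress the target on the feature, then apply that model
# to the extrapolated feature values to obtain the forecast.
target_vs_feature = LinearRegression().fit(feature.reshape(-1, 1), target)
forecast = target_vs_feature.predict(feature_future.reshape(-1, 1))
print(forecast.round(1))
```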

For example, we can predict the monthly revenue of a grocery store using the amount of fruit sold as the training data. We can augment the training data (lagged values, nonlinear transforms…), but these all stem from a single feature: how much fruit we sold. When training the second regression model, we converge to an equation representing revenue as a function of fruit sold.
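In its simplest form, that equation might look like the following (with $\beta_0$ and $\beta_1$ standing in for the fitted intercept and weight):

$$\text{revenue} \approx \beta_0 + \beta_1 \cdot \text{fruit sold}$$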

And here is the insight: fruit sales can be broken down further. If our store sells, for example, oranges and melons, then total fruit sales are simply the number of oranges sold plus the number of melons sold.

Even a simple decomposition by fruit type reveals different seasonal patterns: orange sales peak in the winter, while melon sales peak during the summer. Yet treating orange and melon sales as a single feature forces the model to assign them the same weight.
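In the same placeholder notation, the aggregated model is constrained to something like:

$$\text{revenue} \approx \beta_0 + \beta_1 \cdot (\text{oranges sold} + \text{melons sold})$$

A single coefficient $\beta_1$ has to serve two opposing seasonal patterns at once.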

Decomposing our fruit sales allows the model to assign a different weight to each feature, improving the prediction quality.
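Again in placeholder notation:

$$\text{revenue} \approx \beta_0 + \beta_1 \cdot \text{oranges sold} + \beta_2 \cdot \text{melons sold}$$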

To test this on a sample dataset I use in my job (B2B marketing data: predicting new customers from the number of interactions with a sales team), we can compare a forecast based on the total number of interactions to one that breaks interactions down by the channel through which they occurred (e.g. LinkedIn, direct calls…).
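Here is a toy sketch of the same comparison on synthetic data (all series and numbers are made up for illustration): two input series with opposite seasonality drive the target with different weights, and we compare a model trained on their sum to one trained on the decomposed pair.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two synthetic inputs with opposite seasonal patterns
# (think oranges vs. melons, or two marketing channels).
rng = np.random.default_rng(42)
months = np.arange(48)
season = np.sin(2 * np.pi * months / 12)
channel_a = 100 + 30 * season + rng.normal(0, 5, 48)   # summer peak
channel_b = 100 - 30 * season + rng.normal(0, 5, 48)   # winter peak

# The target responds differently to each input.
target = 2.0 * channel_a + 0.5 * channel_b + rng.normal(0, 10, 48)

train, test = slice(0, 36), slice(36, 48)

# Model 1: one aggregated feature (a single shared weight).
X_agg = (channel_a + channel_b).reshape(-1, 1)
pred_agg = LinearRegression().fit(X_agg[train], target[train]).predict(X_agg[test])

# Model 2: decomposed features (one weight per input).
X_dec = np.column_stack([channel_a, channel_b])
pred_dec = LinearRegression().fit(X_dec[train], target[train]).predict(X_dec[test])

def mae(pred):
    return np.mean(np.abs(target[test] - pred))

print(f"aggregated MAE: {mae(pred_agg):.1f}, decomposed MAE: {mae(pred_dec):.1f}")
```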

The results show a 40% improvement in forecasting quality (measured using mean absolute scaled error, or MASE), just by decomposing our data. Neat!
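For reference, MASE scales the forecast’s mean absolute error by the in-sample mean absolute error of a naïve one-step-ahead forecast (this is the standard non-seasonal definition; seasonal variants use a seasonal difference instead):

$$\text{MASE} = \frac{\frac{1}{h}\sum_{t=1}^{h}\left|y_t - \hat{y}_t\right|}{\frac{1}{n-1}\sum_{i=2}^{n}\left|y_i - y_{i-1}\right|}$$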

Thanks for reading, and stay tuned for the next articles in this series 🙂
