
LightGBM for Quantile Regression

Understand Quantile Regression

In regression tasks, we don't always need a single, perfectly accurate point prediction — in fact, any point prediction carries some error. Instead of chasing absolute precision, we sometimes need a prediction interval. That is where quantile regression comes in: it lets us predict an interval estimate for our target.

Loss Function

Fortunately, the powerful LightGBM makes quantile prediction possible. The major difference between quantile regression and ordinary regression lies in the loss function, which is called the pinball loss or quantile loss. There is a good explanation of pinball loss here; its formula is:

L_𝛕(y, z) = 𝛕 · (y − z) if y ≥ z, and (1 − 𝛕) · (z − y) if y < z

where y is the actual value, z is the prediction, and 𝛕 is the target quantile. At first sight of the loss function, we can see that except when the quantile equals 0.5, the loss function is asymmetric. Let's have a visual look at it:

The implementation can be found in my Git Repo. In the graph, three different quantiles are plotted. Take quantile 0.8 as an example: when the error is positive (z > y, i.e. the predicted value is higher than the actual value), the loss is smaller than when the error is negative. In other words, a positive error is punished less. This makes sense: for a high-quantile prediction, the loss function encourages higher predicted values, and vice versa for a low-quantile prediction.
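As a minimal sketch of the formula above (the function name `pinball_loss` is my own, not from the repo), the asymmetry at 𝛕 = 0.8 is easy to verify numerically:

```python
import numpy as np

def pinball_loss(y, z, tau):
    """Pinball (quantile) loss: under- and over-prediction are
    penalized asymmetrically according to the target quantile tau."""
    error = y - z
    # For each point: tau * error when error >= 0, (tau - 1) * error otherwise.
    return np.mean(np.maximum(tau * error, (tau - 1) * error))

y = np.array([1.0])
# Over-prediction (z > y) at tau = 0.8 is punished less...
print(pinball_loss(y, np.array([2.0]), 0.8))  # error = -1 -> loss 0.2
# ...than under-prediction (z < y) of the same magnitude.
print(pinball_loss(y, np.array([0.0]), 0.8))  # error = +1 -> loss 0.8
```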

Generate Sample Dataset

Now let's generate some data for LightGBM to predict.

Here we use a sin(x) function with some additional noise as the training set.
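A minimal way to build such a dataset (the sample size, noise scale, and seed here are my own choices, not necessarily those used in the original repo):

```python
import numpy as np

rng = np.random.default_rng(42)

# Single feature x in [0, 10]; target is sin(x) plus Gaussian noise.
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, size=500)
```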

LightGBM Prediction

Initiate an LGBMRegressor:

Notice that, unlike general regression, the objective and metric are both quantile, and alpha is the quantile we need to predict (for details, check my Repo).

Prediction Visualisation

Now let's check out the quantile prediction results:

We can see that most of the noisy dots lie within the prediction interval, where the green line is the upper bound (the 0.9 quantile) and the blue line is the lower bound (the 0.1 quantile).

This post was originally inspired by this, which is a great starting point for quantile regression.
