
A light passage for LightGBM


As someone from a non-IT background who decided to pursue data science as a career, I realize there is so much knowledge I need to gain and so many skills I need to develop, including writing: explaining concepts in a simple and understandable way regardless of the audience's background. Therefore, I decided to start writing on Medium.

To start off, this passage will explain LightGBM. LightGBM stands for Light Gradient Boosting Machine; let's break the concept down with 5W+1H.

What is Light Gradient Boosting Machine?

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. In my opinion, tree-based algorithms are the most intuitive, because they mimic how humans make decisions.

It is 11 a.m. and you are unsure whether to eat now or later, so you make a decision tree. A decision tree consists of a root node (the root cause), branch nodes (the decision nodes), and leaf nodes (the decision results). Image by the author.

Before answering the question, we first need to know what boosting is and what the gradient boosting framework is.

Boosting is an ensemble technique for creating collections of predictors, or a method of combining weak learners into a strong learner to predict the output. The weak learners here are the individual decision trees. Each is weak because, on its own, it performs poorly at predicting or classifying. To get a better prediction, we combine the weak learners: each learner produces a hypothesis, and combining them creates the final hypothesis for predicting the output. Boosting works by growing the trees sequentially: each tree is grown using information from the previously grown trees.

Since we want to add many weak learners to our model, we might ask: how do we know whether our model is optimized? This is where gradient boosting comes in; we apply the gradient descent procedure to find the model that minimizes the loss function.

This means that to understand gradient boosting, we have to understand gradient descent, the loss function, and the optimization function.

A simple GIF illustrating gradient descent, in which we want to find the intercept of a linear regression with RMSE as the loss function. The gradient of the green line descends until it reaches the smallest RMSE, where the gradient is close to 0. GIF made by the author.

The optimization function is the function we apply to reach our objective, in this case minimizing the loss function. The loss function measures how far off the model is from the actual data: if the model's prediction is way off, the loss function returns a large number. The optimization function gradually reduces the loss (the error) until it converges to a minimum value. The loss functions we usually encounter are Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) for regression problems, and binary log loss and cross-entropy loss for classification problems. Gradient descent means that the gradient gradually descends as the loss function is minimized, until the gradient approaches 0.
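
To make this concrete, below is a minimal sketch of the procedure shown in the GIF: plain NumPy gradient descent searching for the intercept of a linear regression while the slope is held fixed. The toy data, learning rate, and stopping threshold are made up for illustration, and I minimize the mean squared error, which has the same minimum as RMSE.

```python
import numpy as np

# Toy data: y roughly follows 2*x + 3 with some noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 3 + rng.normal(0, 1, size=x.size)

slope = 2.0          # keep the slope fixed, optimize only the intercept
intercept = 0.0      # initial guess
learning_rate = 0.05

for step in range(200):
    pred = slope * x + intercept
    residual = pred - y
    # Gradient of the mean squared error with respect to the intercept.
    gradient = 2 * residual.mean()
    intercept -= learning_rate * gradient   # take a step downhill
    if abs(gradient) < 1e-4:                # gradient close to 0 -> stop
        break

print(f"estimated intercept: {intercept:.3f} after {step + 1} steps")
```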

Back to LightGBM: using the tree-based learning algorithm, the weak learners grow sequentially. The first tree we build learns how to fit the target variable, the second tree learns from the first tree and fits its residual, the next tree learns to reduce and fit the residual from the previous tree, and this continues until the residual no longer changes. The gradients of the errors are propagated throughout the system. LightGBM grows each tree leaf-wise rather than level-wise, splitting the leaf that gives the largest loss reduction, as illustrated below.

Taken from the LightGBM documentation, illustrating leaf-wise tree growth.
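
To illustrate the residual-fitting idea, here is a hand-rolled sketch that mimics the sequential growth using scikit-learn decision stumps as the weak learners. This is not LightGBM's actual implementation (LightGBM grows its trees leaf-wise with many optimizations on top); it only shows how each new tree fits the residual left by the ensemble so far.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data, purely for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from the mean of the target
trees = []

for _ in range(100):
    residual = y - prediction                  # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2)  # a weak learner
    tree.fit(X, residual)                      # fit the residual of the ensemble so far
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("final training MSE:", np.mean((y - prediction) ** 2))
```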

What makes LightGBM different from other gradient boosting algorithms is that XGBoost grows its trees level-wise rather than leaf-wise, while CatBoost is better suited for categorical variables.

Who built LightGBM?

In 2017, Microsoft released LightGBM as an alternative to XGBoost. LightGBM can be used in Python, R, and C++.

Why do we need to use LightGBM?

As stated in the documentation, LightGBM improves on the gradient boosting algorithm in terms of efficiency and speed, and it supports distributed, parallel processing and GPU training.

LightGBM is suitable when you want to build a model on an abundant amount of data. If you only have about 100 rows, it is better to use another machine learning algorithm, because your model is likely to over-fit.

How to use LightGBM?

In short, there are three steps I apply when using LightGBM:

  1. Prepare the training and testing data (data preprocessing, exploratory data analysis, and data encoding for categorical variables)
  2. Choose an optimization method to find the tuning parameters. You can use grid search, random search, Bayesian optimization, etc. Several important tuning parameters are:
  • learning_rate : the step size for each iteration while moving toward a minimum of a loss function in gradient descent
  • max_depth : the maximum depth of the tree, handling overfitting by lowering the tree’s depth
  • min_data_in_leaf : the minimum number of records a leaf may have
  • feature_fraction : the fraction of features that will be randomly selected in each iteration for building trees
  • bagging_fraction : specifies the fraction number of data to be used in each iteration to create a new dataset
  • lambda_l1 / lambda_l2 : regularization parameters to address over-fitting and help with feature selection; the L1 norm is the penalty used in lasso regression and the L2 norm the one used in ridge regression
  • min_split_gain : minimum gain to make the split in tree
  3. Train the model, fit it to the training data, and evaluate it (a minimal sketch follows below)
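
Below is a minimal sketch of step 3 using the native LightGBM API and the parameters listed above. The dataset and parameter values are only illustrative, not the result of any tuning.

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative parameter values; in practice these come from your search in step 2.
params = {
    "objective": "binary",
    "learning_rate": 0.05,      # step size of each boosting iteration
    "max_depth": 6,             # limit tree depth to handle over-fitting
    "min_data_in_leaf": 20,     # minimum number of records per leaf
    "feature_fraction": 0.8,    # fraction of features sampled per tree
    "bagging_fraction": 0.8,    # fraction of rows sampled per iteration
    "bagging_freq": 1,          # perform bagging every iteration
    "lambda_l1": 0.1,           # L1 regularization
    "lambda_l2": 0.1,           # L2 regularization
    "min_split_gain": 0.0,      # minimum gain required to split
}

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_test, label=y_test, reference=train_set)

# Train with early stopping on the validation set.
model = lgb.train(
    params,
    train_set,
    num_boost_round=500,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)

# Evaluate: predictions are probabilities for the binary objective.
pred = (model.predict(X_test) > 0.5).astype(int)
print("accuracy:", accuracy_score(y_test, pred))
```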

If you are interested in understanding the relationship between the variables and the target variable, you can use feature importance. Feature importance shows you which variables play the major role in predicting or classifying.
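
Continuing the sketch above, the trained booster exposes its importances directly; here I use gain-based importance, and lgb.plot_importance is an optional alternative for a quick bar chart.

```python
# Inspect which features the booster relied on most (gain-based importance).
importance = model.feature_importance(importance_type="gain")
feature_names = model.feature_name()
for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1])[:10]:
    print(f"{name}: {score:.1f}")

# lgb.plot_importance(model, max_num_features=10)  # optional matplotlib bar chart
```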

LightGBM is a popular boosting algorithm that is widely used in data science. It can handle categorical data, it is fast, and it grows its trees sequentially and leaf-wise.

To close this off,

Where can I find the complete documentation, place to learn and project examples?

For the complete documentation: LightGBM

Great tutorials and readings that I came across:

Introduction to Statistical Learning, a great book to learn from, and it is free!

A great read explaining the performance of XGBoost, LightGBM, and CatBoost

An example of applying LightGBM on Kaggle

More about the gradient boosting algorithm

For visual and audio learners, a great YouTube channel by Prof. Alexander Ihler

Like LightGBM, my writing grows sequentially.

Feel free to drop me a comment if you find anything in this piece that can be improved. Cheers!

