
SHAP for Feature Selection and HyperParameter Tuning

Use SHAP for optimal Feature Selection while Tuning Parameters

Photo by Lanju Fotografie on Unsplash

Feature selection and hyperparameter tuning are two important steps in every machine learning task. Most of the time they help to improve performance, but at the cost of being time-consuming. The more parameter combinations we try, or the more accurate the selection process, the longer it takes. This is a practical limit that we cannot really overcome. What we can do is get the most out of our pipeline. Among the different possibilities, two of the most convenient are:

  • Combine the tuning and the selection of features;
  • Adopt SHAP (SHapley Additive exPlanations) to make the whole procedure more generalizable and accurate.

Combining the tuning process with the optimal choice of features is a need for every ranking-based selection algorithm. Ranking selection consists of iteratively dropping the less important features while retraining the model until convergence is reached. The model used for feature selection may differ (in parameter configuration or in type) from the one used for the final fit and prediction, which may result in suboptimal performance. This is the case, for example, of RFE (Recursive Feature Elimination) or Boruta, where the features selected via the variable importance of one algorithm are used by another algorithm for the final fit.

SHAP helps when we perform feature selection with ranking-based algorithms. Instead of using the default variable importance generated by gradient boosting, we select the best features as the ones with the highest Shapley values. The benefit of using SHAP is clear given the bias present in native tree-based feature importance: the standard methods tend to overestimate the importance of continuous or high-cardinality categorical variables. This makes the computed importances untrustworthy in case of feature shifts or changes in the number of categories.

To overcome these shortcomings, we developed shap-hypetune: a Python package for simultaneous hyperparameter tuning and feature selection. It combines hyperparameter tuning and feature selection in a single pipeline with gradient boosting models. It supports grid search, random search, or Bayesian search and provides ranking-based feature selection algorithms such as Recursive Feature Elimination (RFE), Recursive Feature Addition (RFA), or Boruta. The additional boost is the possibility to use SHAP importance for feature selection.
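For reference, a minimal import sketch (the class names below reflect the shap-hypetune package as published on PyPI; check the repository for the current API):

```python
# Install the package first: pip install shap-hypetune
from shaphypetune import BoostSearch, BoostRFE, BoostRFA, BoostBoruta

# Each wrapper takes a gradient boosting estimator (plus an optional
# param_grid) and exposes a scikit-learn-like fit/predict interface.
```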

In this post, we demonstrate the utility of adopting shap-hypetune in a supervised predictive task. We search for the optimal parameter configuration while selecting the best feature set with (and without) SHAP. Our experiment is partitioned into three trials. Given a dataset in a classification scenario, we first fit a LightGBM by optimizing the parameters only. Then we run a standard RFE with default tree-based feature importance while optimizing the parameters. Lastly, we do the same but select the features with SHAP. To make things spicier, we use an imbalanced binary target and some categorical features with high cardinality.
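The original dataset is not reproduced here; the snippet below builds a hypothetical stand-in with the same characteristics (imbalanced binary target, a few high-cardinality categorical columns) just to make the later sketches runnable:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: ~10% positive class, 20 numeric predictors.
X, y = make_classification(n_samples=10_000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)
X = pd.DataFrame(X, columns=[f"num_{i}" for i in range(X.shape[1])])

# Two high-cardinality categorical columns, integer-encoded and mostly noise.
rng = np.random.default_rng(42)
X["cat_a"] = rng.integers(0, 300, size=len(X))
X["cat_b"] = rng.integers(0, 500, size=len(X))

# Train / validation / test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.3, random_state=42)
```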

Parameter Tuning

In this first section, we fit the model on our training set, searching only for the best parameter combination. The best model reaches a precision higher than 0.9 but a low recall on our test data.
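A minimal sketch of this first trial with the BoostSearch wrapper (the grid values are illustrative, not the ones used in the experiment; fit arguments such as eval_set and early stopping are forwarded to LightGBM, whose early-stopping API has changed in recent versions, and the scikit-learn-style attribute names should be verified against the shap-hypetune docs):

```python
from lightgbm import LGBMClassifier
from shaphypetune import BoostSearch

# Illustrative search space.
param_grid = {
    "n_estimators": [150, 250],
    "learning_rate": [0.05, 0.1],
    "num_leaves": [25, 35],
}

# Grid search over the LightGBM parameters, scored on the validation set.
model = BoostSearch(LGBMClassifier(), param_grid=param_grid)
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)

print(model.best_params_, model.best_score_)
```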

Performance on test data (image by the author)

Let’s see if we can do better.

Parameter Tuning + Feature Selection

Generally, feature selection is introduced to remove noisy predictors from the original set of data. We use Recursive Feature Elimination (RFE) while searching for the optimal set of parameters. In other words, for each parameter configuration, we iterate RFE on the initial training data. The procedure can be sped up by configuring proper fitting parameters, like early stopping, or by setting larger steps when deleting the worst features. The pipeline with the best score on the validation set is stored and ready to use at inference time.
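Sketching the same idea with the BoostRFE wrapper (reusing the grid and data splits from the previous snippets; parameter names follow the shap-hypetune API as I understand it):

```python
from lightgbm import LGBMClassifier
from shaphypetune import BoostRFE

# RFE nested inside the parameter search: for each candidate configuration,
# drop `step` features per iteration using the native LightGBM importances.
model = BoostRFE(LGBMClassifier(), param_grid=param_grid,
                 min_features_to_select=1, step=1)
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)
```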

Performance on test data (image by the author)

In this situation, we register an overall improvement, but recall and F1 score remain low.

Parameter Tuning + Feature Selection with SHAP

Finally, we repeat the same procedure as before, but using SHAP for the RFE. SHAP is extremely efficient when used in conjunction with tree-based models: it uses a tree-path approach, following the trees and extracting the number of training examples that go down each leaf, to provide the background computations. It is also less prone to overconfidence because we can calculate the importance on the validation set rather than on the training data (as classical tree-based importances do).
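The corresponding sketch only changes how the features are ranked: SHAP importances computed on the validation data instead of the native training importances (argument names assumed from shap-hypetune's documented options):

```python
from lightgbm import LGBMClassifier
from shaphypetune import BoostRFE

# Same RFE as before, but ranking features by SHAP importance
# evaluated on the validation set rather than on the training data.
model = BoostRFE(LGBMClassifier(), param_grid=param_grid,
                 min_features_to_select=1, step=1,
                 importance_type='shap_importances',
                 train_importance=False)
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)
```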

Performance on test data (image by the author)

We register a great improvement in recall and F1 score. SHAP is able to discard the low-quality categorical features, preserving only the best predictors.
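To see which predictors survived the SHAP-based elimination, the fitted wrapper exposes RFE-style attributes (attribute names as I recall them from the package; double-check against the shap-hypetune docs):

```python
# Boolean mask of the retained features and their count.
selected = X_train.columns[model.support_]
print(model.n_features_, "features kept:", list(selected))

# Predict on the test set with the best pipeline found.
y_pred = model.predict(X_test)
```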

Performance comparison on test data (image by the author)

SUMMARY

In this post, we introduced shap-hypetune as a helpful framework to carry out parameter tuning and optimal feature selection for gradient boosting models. We showed an application using grid search and Recursive Feature Elimination, but random search and Boruta are other available options. We also saw how to improve the selection process with the power of SHAP when classical feature importance methods fall short.

If you are interested in the topic, I suggest:


CHECK MY GITHUB REPO

Keep in touch: Linkedin

