When building a machine learning model, more features are not automatically better: using only the ones you actually need beats throwing them all in. So after the first modeling trials, we need a semi-automatic process that keeps only the features that are useful for our supervised task.
We are referring to Feature Selection: the family of algorithms that pick only a subset of variables from the original dataset to use as inputs of our predictive model. There are various techniques for carrying out feature selection, and each method has its own pros and cons to consider.
A small number of features helps produce simple and maintainable pipelines, improves generalization, lowers storage space and costs, reduces inference time (whatever the final predictor is), and yields more interpretable outcomes.
At the same time, extracting only the useful predictive features may be expensive, difficult, and ambiguous. An exhaustive feature selection algorithm would try all possible feature combinations and record which one improves performance according to our validation strategy. Due to time constraints, this discourages data scientists from adopting feature selection on huge datasets. Additionally, the selected features depend on the model and the parameters used, so simply changing one of them may degrade the final results. Doubts may also arise about where to place feature selection in our Machine Learning workflow to get the most out of it.
To tackle these challenges, we released shap-hypetune: a Python package for simultaneous hyperparameter tuning and feature selection. It searches, in a single pipeline, for the optimal number of features together with the best parameter configuration of gradient boosting models. It provides various parameter search methods (grid, random, or Bayesian search) and feature selection strategies (Recursive Feature Elimination/Addition and Boruta), optionally using SHAP values to improve generalization.
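shap-hypetune can be installed from PyPI with pip install shap-hypetune. A minimal setup sketch, assuming LightGBM as the boosting backend:

from lightgbm import LGBMClassifier
from shaphypetune import BoostRFA, BoostRFE, BoostBoruta  # feature selection strategies wrapping gradient boosting models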
In this post, we focus on the beauty of recursive feature selection algorithms. Recursive feature selection searches for a reliable subset of features by monitoring performance improvements while keeping computation costs acceptable, so it has all the prerequisites to be one of the most effective feature filtering methods in real-world applications. We aim to explore the lesser-known additive approach (Recursive Feature Addition), comparing it with the better-known subtractive approach (Recursive Feature Elimination) on a standard classification problem.
Recursive Feature Addition
Recursive Feature Addition chooses features following a recursive addition procedure. The workflow is schematized below (a minimal code sketch of the loop follows the list):
1. Fit an estimator (a gradient boosting in our case) using all the available features.
2. Extract the feature importance ranking (standard tree-based or SHAP importance are both valid).
3. Order the features according to their contribution.
4. Fit the estimator using only the most important feature and compute the performance on the validation data.
5. Include the next most important feature and fit a new estimator.
6. Calculate the performance difference between the models in steps 4 and 5.
7. If the performance improves, the feature is considered a valuable predictor.
8. Iterate steps 5 to 7 until all features have been taken into account.
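To make the procedure concrete, here is a minimal, illustrative Python sketch of the addition loop. It is not the shap-hypetune implementation; fit_and_score is an assumed helper that trains the estimator on the given columns and returns the validation score.

def recursive_feature_addition(ranked_features, fit_and_score):
    # ranked_features: feature names ordered from most to least important (steps 1-3)
    selected = [ranked_features[0]]
    best_score = fit_and_score(selected)           # step 4
    for feature in ranked_features[1:]:
        candidate = selected + [feature]           # step 5: add the next most important feature
        score = fit_and_score(candidate)
        if score > best_score:                     # steps 6-7: keep the feature only if it helps
            selected, best_score = candidate, score
    return selected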
Carrying out recursive feature addition with shap-hypetune is straightforward.
from lightgbm import LGBMClassifier
from shaphypetune import BoostRFA

# Recursive Feature Addition wrapping a LightGBM classifier
rfa = BoostRFA(
    LGBMClassifier(),
    step=3, min_features_to_select=1
)
rfa.fit(X_train, y_train, eval_set=[(X_val, y_val)])  # validation data drives the selection
In the above example, we simply use RFA with an LGBMClassifier. Many customizations are possible: for example, a BoostRFA instance can rank features with SHAP importance (instead of the classical tree-based one) or run while searching for the optimal parameter configuration.
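Such a configuration looks like the sketch below. It relies on the param_grid, importance_type, and train_importance arguments documented in the shap-hypetune README; the grid values are purely illustrative, so verify the exact API against the installed version.

from lightgbm import LGBMClassifier
from shaphypetune import BoostRFA

# illustrative hyperparameter grid (plain lists should trigger a grid search)
param_grid = {
    'n_estimators': [100, 200],
    'learning_rate': [0.05, 0.1],
    'num_leaves': [15, 31],
}

rfa_shap = BoostRFA(
    LGBMClassifier(),
    param_grid=param_grid,                # tune hyperparameters while selecting features
    importance_type='shap_importances',   # rank features with SHAP values
    train_importance=False,               # compute SHAP values on the evaluation set
    step=3, min_features_to_select=1
)
rfa_shap.fit(X_train, y_train, eval_set=[(X_val, y_val)])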
For our classification task, we use BoostRFA together with LGBMClassifier, performing a Bayesian parameter search. The results are reported below.

Recursive Feature Elimination
Recursive Feature Elimination chooses features following a recursive elimination procedure. The workflow is schematized below (the elimination counterpart of the earlier code sketch follows the list):
1. Fit an estimator (a gradient boosting in our case) using all the available features.
2. Extract the feature importance ranking (standard tree-based or SHAP importance are both valid).
3. Order the features according to their contribution.
4. Exclude the least important feature and fit a new estimator.
5. Calculate the performance difference between the new model and the previous one (the full model of step 1 at the first iteration).
6. If the performance improves, the feature is discarded.
7. Iterate steps 4 to 6 until all features have been taken into account.
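As before, here is a minimal, illustrative Python sketch of the elimination loop (again, not the shap-hypetune internals); fit_and_score is the same assumed helper returning the validation score for a set of columns.

def recursive_feature_elimination(ranked_features, fit_and_score):
    # ranked_features: feature names ordered from most to least important (steps 1-3)
    selected = list(ranked_features)
    best_score = fit_and_score(selected)
    for feature in reversed(ranked_features):      # step 4: try dropping the least important first
        if len(selected) == 1:
            break
        candidate = [f for f in selected if f != feature]
        score = fit_and_score(candidate)
        if score > best_score:                     # steps 5-6: discard the feature if performance improves
            selected, best_score = candidate, score
    return selected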
As in the RFA case, carrying out recursive feature elimination with shap-hypetune is straightforward.
from shaphypetune import BoostRFE

# Recursive Feature Elimination wrapping a LightGBM classifier
rfe = BoostRFE(
    LGBMClassifier(),
    step=3, min_features_to_select=1
)
rfe.fit(X_train, y_train, eval_set=[(X_val, y_val)])  # validation data drives the selection
In the above example, we simply use RFE with an LGBMClassifier. Many customizations are possible: as for BoostRFA, a BoostRFE instance can rank features with SHAP importance (instead of the classical tree-based one) or run while searching for the optimal parameter configuration.
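Whatever the configuration, the fitted selector can then be used downstream. The sketch below assumes the scikit-learn-style methods and attributes described in the shap-hypetune documentation (transform, predict, n_features_); verify them against the installed version.

# after fitting, the selector can be queried and used like a scikit-learn transformer
print(rfe.n_features_)                  # number of features kept by the elimination process
X_train_sel = rfe.transform(X_train)    # keep only the selected features
y_pred = rfe.predict(X_val)             # predict with the best estimator found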
For our classification task, we use BoostRFE together with LGBMClassifier, performing a Bayesian parameter search. The results are reported below.

Conclusions
Both RFA and RFE achieve great results on our simulated classification task. They also demonstrate good filtering ability: looking at the cross-validation results, the informative features are selected most often, followed by the redundant ones (linear combinations) and finally by the noisy ones.

In conclusion, we demonstrated the effectiveness of recursive algorithms for feature selection. We introduced the lesser-known Recursive Feature Addition, comparing it with the more popular Recursive Feature Elimination. Both procedures showed satisfactory results. However, the perfect selection method doesn't exist: we have to find and tune the right filtering strategy according to our task.
If you are interested in the topic, I suggest:
- SHAP for Feature Selection and HyperParameter Tuning
- Boruta and SHAP for better Feature Selection
- Boruta SHAP for Temporal Feature Selection
- SHAP for Drift Detection: Effective Data Shift Monitoring
Keep in touch: LinkedIn