
Feature selection is a method for reducing the number of variables by using certain criteria to select the variables that are most useful for predicting the target with our model.
Increasing the number of features helps the model’s predictive power, but only up to a certain point. This is what we call the Curse of Dimensionality: the model’s performance improves as we add more features, but once the number of features passes that peak, performance starts to deteriorate. That is why we need to select only the features that effectively contribute to the prediction.
Feature selection is similar to dimensionality reduction in that both aim to reduce the number of features, but they are fundamentally different. Feature selection chooses which features to keep or remove from the dataset, whereas dimensionality reduction creates a projection of the data, resulting in entirely new input features. If you want to know more about dimensionality reduction, you could check the other articles I have below.
There are many feature selection methods, but I will only show five that are available in Scikit-Learn. I limit myself to the ones present in Scikit-Learn because they are the easiest yet most useful. Let’s get into it.
1. Variance Threshold Feature Selection
A feature with higher variance means that the values within that feature vary widely or have high cardinality. On the other hand, lower variance means the values within the feature are similar, and zero variance means the feature contains only a single repeated value.
Intuitively, you want features that vary, as a feature that barely changes carries little information for the model. That is why we can select features based on a variance threshold we choose beforehand.
Variance Threshold is a simple approach to eliminating features based on the variance we expect within each feature. There are, however, some downsides to the Variance Threshold method: it only looks at the input features (X) without considering any information from the dependent variable (y). That makes it mainly useful for eliminating features in Unsupervised Modelling rather than Supervised Modelling.
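To make the idea concrete, here is a minimal sketch on a made-up two-column DataFrame (not the dataset used in this article): with the default threshold of zero, VarianceThreshold drops the constant column and keeps the varied one.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
#A toy frame: 'constant' never changes, so its variance is zero
toy = pd.DataFrame({'constant': [1, 1, 1, 1], 'varied': [1, 3, 5, 7]})
selector = VarianceThreshold()  #default threshold=0.0 removes zero-variance features
selector.fit(toy)
toy.columns[selector.get_support()]  #only 'varied' survives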
Let’s try it with an example dataset.
import pandas as pd
import seaborn as sns
#Load the mpg dataset and keep only the numerical features
mpg = sns.load_dataset('mpg').select_dtypes('number')
mpg.head()

For this example, I only use the numerical features for simplicity. We need to scale all of these numerical features before we use Variance Threshold Feature Selection, as the variance is affected by the numerical scale.
from sklearn.preprocessing import StandardScaler
#Standardize the features so they are on a comparable scale
scaler = StandardScaler()
mpg = pd.DataFrame(scaler.fit_transform(mpg), columns=mpg.columns)
mpg.head()

With all the features on the same scale, let’s try to select only the features we want using the Variance Threshold method. Let’s say I set my variance threshold to one.
from sklearn.feature_selection import VarianceThreshold
#Keep only the features whose variance exceeds the threshold of one
selector = VarianceThreshold(1)
selector.fit(mpg)
mpg.columns[selector.get_support()]

It seems only the weight feature was selected, based on the variance threshold we set.
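If you want to see the numbers behind that decision, the fitted selector stores the variance it computed for each feature in its variances_ attribute, and you can use the support mask to keep only the selected columns; a quick sketch:
#Inspect the variance computed for each scaled feature
pd.Series(selector.variances_, index=mpg.columns)
#Keep only the selected feature(s) as a DataFrame
mpg_selected = mpg.loc[:, selector.get_support()]
mpg_selected.head()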
As I said before, the Variance Threshold is only useful when we consider feature selection for Unsupervised Learning. What if we want feature selection for Supervised Learning purposes? That is what we are going to talk about next. Most of the feature selection methods in Scikit-Learn are intended for Supervised Learning, after all.
2. Univariate Feature Selection with SelectKBest
Univariate Feature Selection is a feature selection method based on a univariate statistical test, e.g., chi2, Pearson correlation, and many more.
The premise of SelectKBest is to combine a univariate statistical test with the selection of the K best features according to the statistical result between X and y.
Let’s use it with the example data we had before.
mpg = sns.load_dataset('mpg')
mpg = mpg.select_dtypes('number').dropna()
#Divide the features into Independent and Dependent Variables
X = mpg.drop('mpg', axis=1)
y = mpg['mpg']
Because the univariate feature selection method is intended for Supervised Learning, we divide the features into independent and dependent variables. Next, we select the features using SelectKBest based on mutual information regression. Let’s say I only want the top two features.
from sklearn.feature_selection import SelectKBest, mutual_info_regression
#Select top 2 features based on mutual info regression
selector = SelectKBest(mutual_info_regression, k=2)
selector.fit(X, y)
X.columns[selector.get_support()]

Based on mutual information regression, the ‘displacement’ and ‘weight’ features are selected as the top two features.
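To see why those two came out on top, you can inspect the mutual information score the selector computed for each feature through its scores_ attribute; a quick sketch:
#Mutual information score for every feature; higher means more informative about y
pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)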
3. Recursive Feature Elimination (RFE)
Recursive Feature Elimination, or RFE, is a feature selection method that utilizes a machine learning model to select features by recursively training the model and eliminating the least important feature.
According to Scikit-Learn, RFE is a method to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features, and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
tl;dr: RFE selects the top K features using a machine learning model that exposes a coef_ or feature_importances_ attribute (which covers almost any model). RFE eliminates the least important feature, retrains the model, and repeats until only the K features you want remain.
This method only works if the model has a coef_ or feature_importances_ attribute; as long as a model exposes one of these attributes, you can apply RFE to it in Scikit-Learn.
Let’s use a dataset example. In this sample, I want to use the Titanic dataset for a classification problem, where I want to predict who would survive.
#Load the dataset and only select the numerical features for example purposes
titanic = sns.load_dataset('titanic')[['survived', 'pclass', 'age', 'parch', 'sibsp', 'fare']].dropna()
X = titanic.drop('survived', axis=1)
y = titanic['survived']
I want to see which features best help me predict who would survive the Titanic incident, using RFE. Let’s use the LogisticRegression model to obtain the best features.
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
#Select the most important features according to Logistic Regression
rfe_selector = RFE(estimator=LogisticRegression(), n_features_to_select=2, step=1)
rfe_selector.fit(X, y)
X.columns[rfe_selector.get_support()]

By default, RFE selects half of the total features, and the step (the number of features eliminated in each iteration) is one. You could change these based on your domain knowledge or the metric you use.
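If you are curious about the order in which features were dropped, the fitted selector exposes a ranking_ attribute, where 1 marks a selected feature and larger numbers mark features that were eliminated earlier; a quick sketch:
#Rank of every feature: 1 means selected, higher numbers were eliminated earlier
pd.Series(rfe_selector.ranking_, index=X.columns).sort_values()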
4. Feature Selection via SelectFromModel
Like RFE, SelectFromModel from Scikit-Learn selects features based on a machine learning model’s importance estimates. The difference is that SelectFromModel keeps the features whose importance value (often taken from coef_ or feature_importances_, but it could be any callable) is above a given threshold. By default, the threshold is the mean of the importances.
Let’s use a dataset example to understand the concept better. I will use the same Titanic data we had before.
from sklearn.feature_selection import SelectFromModel
#Select the most important features according to Logistic Regression using SelectFromModel
sfm_selector = SelectFromModel(estimator=LogisticRegression())
sfm_selector.fit(X, y)
X.columns[sfm_selector.get_support()]

Using SelectFromModel, we find that only one feature passed the threshold: the ‘pclass’ feature.
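If you want to check the cut-off that was applied, the fitted selector stores it in threshold_ (here, by default, the mean of the absolute coefficient values), and the fitted Logistic Regression is available through estimator_; a small sketch:
#The threshold that was actually applied to the importances
sfm_selector.threshold_
#Coefficients of the fitted Logistic Regression behind the selection
pd.Series(sfm_selector.estimator_.coef_[0], index=X.columns)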
Like RFE, you could use any machine learning model to select features, as long as it provides a way to estimate feature importance. You could try it with a Random Forest model or XGBoost, as in the sketch below.
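Here is a minimal sketch of what that swap could look like with a Random Forest, which exposes feature_importances_ instead of coef_ (the hyperparameters are only illustrative):
from sklearn.ensemble import RandomForestClassifier
#Same idea, but the importances now come from feature_importances_
sfm_rf_selector = SelectFromModel(estimator=RandomForestClassifier(n_estimators=100, random_state=42))
sfm_rf_selector.fit(X, y)
X.columns[sfm_rf_selector.get_support()]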
5. Sequential Feature Selection (SFS)
New in Scikit-Learn version 0.24, Sequential Feature Selection, or SFS, is a greedy algorithm that finds the best features by adding or removing them one at a time (forward or backward), based on the cross-validation score of an estimator.
According to Scikit-Learn, SFS-Forward performs feature selection by starting with zero features and finding the single feature that maximizes a cross-validated score when a machine learning model is trained on that feature alone. Once the first feature is selected, the procedure is repeated, each time adding one new feature to the set of selected features. The procedure stops when the desired number of features is reached.
SFS-Backward follows the same idea but works in the opposite direction: it starts with all the features and greedily removes them one by one until the desired number of features is reached.
SFS differs from RFE and SelectFromModel because its machine learning model does not need a coef_ or feature_importances_ attribute. However, it is considerably slower, as it evaluates candidate features by fitting the model many times.
Let’s try it with an example; I will use SFS-Backward here.
from sklearn.feature_selection import SequentialFeatureSelector
#Select the best features according to Logistic Regression using SFS-Backward
sfs_selector = SequentialFeatureSelector(estimator=LogisticRegression(), n_features_to_select=3, cv=10, direction='backward')
sfs_selector.fit(X, y)
X.columns[sfs_selector.get_support()]

Using SFS-Backward with three features to select and ten cross-validation folds, we end up with the ‘pclass’, ‘age’, and ‘parch’ features.
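If you want to compare this with the forward direction described above, you only need to change the direction parameter; a quick sketch (the selected features may differ from the backward run):
#Select features by adding them one by one (SFS-Forward)
sfs_forward = SequentialFeatureSelector(estimator=LogisticRegression(), n_features_to_select=3, cv=10, direction='forward')
sfs_forward.fit(X, y)
X.columns[sfs_forward.get_support()]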
You could experiment with these feature selection methods and see how your model performance changes, but remember that with more features and more data, the selection time gets longer as well.
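One simple way to do that comparison is to cross-validate the same model on the full feature set and on the selected subset, as in the sketch below (the scores will vary with your data and model):
from sklearn.model_selection import cross_val_score
#Compare cross-validated accuracy with all features versus the selected features
full_score = cross_val_score(LogisticRegression(), X, y, cv=10).mean()
selected_score = cross_val_score(LogisticRegression(), X.loc[:, sfs_selector.get_support()], y, cv=10).mean()
print(full_score, selected_score)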
Conclusion
Feature selection is an important aspect of the machine learning workflow, as we do not want many features that do not affect our prediction model at all.
In this article, I have shown you 5 Scikit-Learn feature selection methods you could use; they are:
- Variance Threshold Feature Selection
- Univariate Selection using SelectKBest
- Recursive Feature Elimination or RFE
- SelectFromModel
- Sequential Feature Selection or SFS
I hope it helps!
Visit me on my Social Media.
If you are not subscribed as a Medium Member, please consider subscribing through my referral.