
Sparse Group Lasso in Python

A tutorial on one of the best variable selection techniques in regression

Preparing to use LASSO and catch some meaningful variables. Photo by Priscilla Du Preez on Unsplash

I’m here to talk about the wonderful asgl package (the name comes from Adaptive Sparse Group Lasso), which adds a lot of features that were already available in R packages but not in Python, like solving sparse group lasso models, and goes beyond that, adding extra features that improve the results sparse group lasso can provide.

I would like to start by talking about the sparse group lasso: what it is and how to use it. Specifically, here we will see:

  • What sparse group lasso is
  • How to use sparse group lasso in Python
  • How to perform k-fold cross validation
  • How to use grid search in order to find the optimal solution

What is sparse group lasso

To understand what sparse group lasso is, we need to talk (briefly) about two techniques: lasso and group lasso. Given a risk function, for example the linear regression risk,

R(β) = ‖y − Xβ‖₂²    (risk function of a linear regression model)

Lasso: defined by adding a penalization on the absolute value of the β coefficients,

R(β) + λ‖β‖₁,  where ‖β‖₁ = ∑ⱼ |βⱼ|    (lasso penalty)

This definition provides sparse solutions, because it sends to zero some of the β coefficients (those least related to the response variable). The effect of this penalization can be controlled using the λ parameter. A large λ value gives the penalization greater importance, and thus produces more zeros among the β coefficients. This is mainly useful in high dimensional datasets, where there are more variables than observations but we only expect a small fraction of the variables to be truly meaningful.
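To see this effect in action, here is a quick sketch using scikit-learn’s Lasso (note that scikit-learn calls the penalization weight alpha, which here plays the role of λ, not of the α we will meet later):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy data: 20 predictors, only 5 of them truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5, noise=5, random_state=42)

# Larger regularization strengths send more coefficients to exactly zero
for reg in [0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=reg).fit(X, y)
    n_zeros = int(np.sum(lasso.coef_ == 0))
    print(f"penalization weight {reg}: {n_zeros} of 20 coefficients are exactly zero")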

However, there are situations in which the predictor variables in X have a natural grouped structure. For example, in biostatistics it is common to deal with genetic datasets in which predictors are grouped into genetic pathways. In stock market analysis one can group companies from the same business segment. In climate data one can group different regions… But lasso provides individually sparse solutions, not group-sparse ones.

Group lasso: so here comes group lasso to the rescue. The group lasso penalty is built as the sum of the ℓ₂ norms of the coefficients belonging to each group.

R(β) + λ ∑ₗ √pₗ ‖β⁽ˡ⁾‖₂,  where β⁽ˡ⁾ are the coefficients of group l and pₗ is the group size    (group lasso penalty)

This way it takes into account the possible grouped structure of the predictors, and it sends whole groups of variables to zero. If all the groups are of size 1 (only one predictor per group), we are back to a lasso model. Let’s see lasso and group lasso graphically,

Lasso, group lasso and ridge penalizations comparison

In the image above we have a simple problem with three coefficients: β₁, β₁₁ and β₁₂. The last two coefficients form a group, and as we can see, lasso (left image) does not take this grouping information into account, but group lasso does. So group lasso can be seen as a lasso between groups and a ridge within groups: if a group is meaningful, we select the whole group; if it is not, we send it to zero.

Sparse group lasso: and finally, here it is,

R(β) + αλ‖β‖₁ + (1 − α)λ ∑ₗ √pₗ ‖β⁽ˡ⁾‖₂    (sparse group lasso penalty)

Sparse group lasso is a linear combination of lasso and group lasso, so it provides solutions that are sparse both between and within groups.
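To make the penalty concrete, here is a minimal NumPy sketch that computes its value for a toy coefficient vector (sgl_penalty is a hypothetical helper written for illustration, not a function of the asgl package):

import numpy as np

def sgl_penalty(beta, group_index, lam, alpha):
    # l1 norm over all coefficients (the lasso part)
    lasso_part = np.sum(np.abs(beta))
    # sum of sqrt(group size) times the l2 norm of each group (the group lasso part)
    group_part = 0.0
    for g in np.unique(group_index):
        beta_g = beta[group_index == g]
        group_part += np.sqrt(beta_g.size) * np.linalg.norm(beta_g)
    return alpha * lam * lasso_part + (1 - alpha) * lam * group_part

beta = np.array([1.0, -2.0, 0.0, 0.5])
group_index = np.array([1, 1, 2, 2])
print(sgl_penalty(beta, group_index, lam=0.1, alpha=0.5))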

This technique selects the most meaningful predictors from the most meaningful groups, and is one of the best variable selection alternatives of recent years. However, there was no implementation of sparse group lasso for Python… until now.

Moving to python: install asgl

Let’s start by installing asgl. This can easily be done using pip:

pip install asgl

Or, alternatively, one can clone the GitHub repository and run the setup script:

git clone https://github.com/alvaromc317/asgl.git
cd asgl
python setup.py install

Import libraries

Once we have the package installed, we can start using it. First, let’s create some data to analyse.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor

# Synthetic regression data: 1000 observations, 10 predictors
X, y = make_regression(n_samples=1000, n_features=10, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250, random_state=42)

# Group structure: predictors sharing a value belong to the same group
group_index = np.array([1, 1, 2, 2, 3, 3, 4, 4, 4, 4])

Here, in addition to generating the dataset, we have created a variable called group_index. This variable describes the group structure of the data: if we have 10 predictors, group_index should be a vector of length 10, and if the first two predictors form a group, they should share the same group_index value. However, our dataset does not have a natural grouped structure, so here we define an artificial one just for the sake of this article.
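If your predictors come with named groups, building group_index is straightforward. Here is a small sketch with made-up group names (the pathway names and column positions are purely illustrative):

import numpy as np

# Hypothetical named groups mapping to column positions in X
groups = {'pathway_a': [0, 1], 'pathway_b': [2, 3],
          'pathway_c': [4, 5], 'pathway_d': [6, 7, 8, 9]}

group_index = np.empty(10, dtype=int)
for g, cols in enumerate(groups.values(), start=1):
    group_index[cols] = g

print(group_index)  # [1 1 2 2 3 3 4 4 4 4]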

Parameters for the sgl model

# Linear model (lm) with a sparse group lasso (sgl) penalty
model = Regressor(model='lm', penalization='sgl', lambda1=0.1, alpha=0.5)
model.fit(X_train, y_train, group_index)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)

If we have a look at the sparse group lasso equation above, we can see that there are two parameters, α and λ, that can be optimized. λ controls how much weight we give to the penalization, so larger λ values produce sparser solutions. And α controls the tradeoff between lasso and group lasso: α equal to 1 gives a lasso, and α equal to 0 gives a group lasso. Usually, we define a grid of possible values for both parameters and try to find the combination that minimizes the error.

Additionally, we specify the type of model to solve (lm, because we are solving a linear model) and the penalization (sgl, because we want the sparse group lasso), and we compute the mean squared error using scikit-learn’s functions.
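Before moving on to a full grid search, it can be instructive to watch the effect of λ directly. A quick sketch, assuming the fitted Regressor exposes its coefficients through a scikit-learn-style coef_ attribute:

import numpy as np

# Sparsity should grow with lambda1 (alpha fixed at 0.5)
for lam in [1e-3, 1e-1, 1, 10]:
    m = Regressor(model='lm', penalization='sgl', lambda1=lam, alpha=0.5)
    m.fit(X_train, y_train, group_index)
    n_zeros = int(np.sum(np.abs(m.coef_) < 1e-8))
    print(f"lambda1={lam}: {n_zeros} of {X_train.shape[1]} coefficients at zero")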

Cross validation

We can define a grid of possible values for the hyperparameters α and λ, and find the optimal combination using cross validation from scikit-learn.

from sklearn.model_selection import GridSearchCV

model = Regressor(model='lm', penalization='sgl')

# 5 values of lambda1 times 6 values of alpha = 30 candidate models
param_grid = {'lambda1': [1e-4, 1e-3, 1e-2, 1e-1, 1],
              'alpha': [0, 0.2, 0.4, 0.6, 0.8, 1]}
gscv = GridSearchCV(model, param_grid, scoring='neg_median_absolute_error')
gscv.fit(X_train, y_train, group_index=group_index)

So first we define our model, then we define the grid of possible values for the hyperparameters, and finally we initialize and run the GridSearchCV object, which performs cross validation over all possible combinations of hyperparameters. This means the function will evaluate all 30 possible models (5 possible values for λ times 6 possible values for α); since GridSearchCV uses 5-fold cross validation by default, that amounts to 150 model fits in total.

We can finally see what our optimal model is by looking into

gscv.best_estimator_

As simple as that, we have found our optimal model.
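From here we can also inspect the winning hyperparameter combination and, since GridSearchCV refits the best model on the whole training set by default, use it directly on the test data:

print(gscv.best_params_)  # e.g. {'alpha': ..., 'lambda1': ...}

# The refitted best estimator can predict directly
predictions = gscv.predict(X_test)
print(mean_squared_error(y_test, predictions))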

And that’s it on how to implement sparse group lasso in Python. I hope you enjoyed this post and found it useful, so stay tuned for future posts in this series, and please do not hesitate to contact me if you have any questions or suggestions.

For a deeper review of what the asgl package has to offer, I recommend reading the Jupyter notebook provided in the GitHub repository.

Have a good day!

