Why fine-tune models?
Machine learning is not always as straightforward as it is with the Iris, Titanic, or Boston House Pricing datasets.
But, hey, don’t get me wrong. I learned (and keep learning) a lot from those famous toy datasets. Their great advantage is that they don’t require much exploration or preprocessing. Many times, we can go straight to the point we want to practice and learn, like pipelines, modeling, model tuning, visualization, etc.
I guess what I am trying to say is that real data will not be as easy to model as the toy datasets we use for studying. Real data needs to be adjusted, fitted, and the model fine-tuned, so we get the best out of the algorithm. For that purpose, two good options are GridSearchCV and RandomizedSearchCV from Scikit-Learn.
Well, maybe what brought you to this post is the need to make your predictions better by choosing the right hyperparameters for your model. The two options presented in this quick tutorial let us provide the modeling algorithm with lists of hyperparameter values. They build combinations of those values, train and test many different models, and then present us with the best option, the one with the best performance.
Awesome, isn’t it? So let’s move on to learn the difference between them.
The difference
To illustrate the concept with an easy analogy, let’s imagine that we’re going to a party and we want to select the best outfit combination. We take a couple of shirts, a few pairs of pants, and a handful of shoes to wear.
If we are GridSearchCV, we will try every combination of shirt, pants, and shoes available, look in the mirror, and take a picture. At the end, we will look at everything and pick the best option.
If we are RandomizedSearchCV, we will try only some of the combinations, randomly picked, take a picture of each, and choose the best performer at the end.
Now, with this analogy, I believe you can sense that Grid Search will take more time as we increase the number of outfits to try. If it’s just two shirts, one pair of pants, and one pair of shoes, it won’t take long. But if it’s 10 shirts, 5 pairs of pants, and 4 pairs of shoes, that’s already 10 x 5 x 4 = 200 combinations to try on. On the other hand, it will have a picture of everything, so it gives a very complete set of options to choose from.
Randomized search won’t take as long, because it will try only a fixed number of randomly picked combinations. So, if your grid of options is small, it does not make sense to use it: the time to train all the options or just a few of them is pretty much the same. But when you have a lot of combinations to try, it may make more sense. Keep in mind, though, that this option does not try every combination, so the true "best estimator" may not even be tested.
Let’s see them in action now.
Coding
Let’s get to the coding section. We will begin by importing the necessary modules for this exercise.
# Imports
import pandas as pd
import numpy as np
import seaborn as sns
# Dataset
from sklearn.datasets import make_regression
# sklearn preprocess
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
# Search
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
# Metrics (used later to evaluate the best estimators)
from sklearn.metrics import mean_squared_error
Next, we can create a regression dataset.
# Create a synthetic regression dataset (returns a tuple with the features and the target)
df = make_regression(n_samples=2000, n_features=5,
n_informative=4, noise=1, random_state=12)
# Split X and y
X= df[0]
y= df[1]
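If you want to confirm what make_regression gave us, a quick sanity check of the shapes (these are NumPy arrays, not a DataFrame) looks like this:
# Quick sanity check: 2000 rows, 5 feature columns, one target value per row
print(X.shape)   # (2000, 5)
print(y.shape)   # (2000,)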
We can split it into train and test sets.
# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)
And let’s create a pipeline that will scale the data and fit a Decision Tree model.
# Creating the steps for the pipeline
steps = [ ('scale', StandardScaler()),
('model', DecisionTreeRegressor()) ]
# Creating pipeline for Decision Tree Regressor
pipe = Pipeline(steps)
# Fit the model
pipe.fit(X_train, y_train)
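One detail worth noting: because the model sits inside a pipeline, any hyperparameter we want to search must be prefixed with the step name ('model') followed by a double underscore. If you are ever unsure of the available names, a quick way to list them is:
# List the tunable parameter names exposed by the pipeline
# (note the 'model__' prefix coming from the step name)
print(pipe.get_params().keys())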
The next step is to create a grid of hyperparameters, params, to be tested to fine-tune the model. There are 2 x 3 x 2 = 12 combinations here and, since we will use 5-fold cross-validation, each one will be trained five times.
%%timeit
# Creating dictionary of parameters to be tested
params= {'model__max_features': [2,5], 'model__min_samples_split':[2, 5, 10], 'model__criterion': ['friedman_mse', 'absolute_error']}
# Applying the Grid Search
grid = GridSearchCV(pipe, param_grid=params, cv=5, scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)
# Best model
grid.best_estimator_
The time result is as follows: 2.37 seconds per loop, and the total time was around 18 seconds. That is good.
2.37 s ± 526 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
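If you want to see which combination won, rather than the full estimator, the fitted search object exposes it directly (the exact values will depend on your run):
# Winning combination of hyperparameters found by the grid search
print(grid.best_params_)
# Cross-validated score (here, negative MSE) of that combination
print(grid.best_score_)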
But what happens if we increase the number of options to test? Let’s try 4 x 6 x 2 = 48 combinations.
%%timeit
# Creating dictionary of parameters to be tested
params= {'model__max_features': [2,3,4,5], 'model__min_samples_split':[2,5,6,7,8,10],'model__criterion': ['friedman_mse', 'absolute_error']}
# Applying the Grid Search
grid = GridSearchCV(pipe, param_grid=params, cv=5, scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)
# Best model
grid.best_estimator_
The time increased a lot: 6.93 seconds per loop. The total time here was over 1 minute.
6.93 s ± 505 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now let’s see the Randomized Search. First we will try the same grid as the first run, with 12 options.
%%timeit
# Creating dictionary of parameters to be tested
params= {'model__max_features': [2,5],'model__min_samples_split':[2, 5, 10],'model__criterion': ['friedman_mse', 'absolute_error']}
# Applying the Randomized Search
randcv = RandomizedSearchCV(pipe, param_distributions=params, cv=5, scoring='neg_mean_squared_error')
randcv.fit(X_train, y_train)
# Best model
randcv.best_estimator_
The time is lower than the Grid Search, as expected: 1.47 seconds per loop and around 10 seconds total to run.
1.47 s ± 140 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now let’s see what happens if we increase the number of options in the grid.
%%timeit
# Creating dictionary of parameters to be tested
params= {'model__max_features': [2,3,4,5],
'model__min_samples_split':[2,5,6,7,8,9,10],
'model__criterion': ['friedman_mse', 'absolute_error']}
# Applying the Randomized Search
randcv = RandomizedSearchCV(pipe, param_distributions=params, cv=5, scoring='neg_mean_squared_error')
randcv.fit(X_train, y_train)
# Best model
randcv.best_estimator_
Here is the result. Wow, almost the same time: 1.46 seconds per loop! That happens because RandomizedSearchCV samples a fixed number of combinations (controlled by n_iter, which defaults to 10), so the training time barely changes no matter how large the grid gets.
1.46 s ± 233 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
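If you want the randomized search to explore more (or fewer) combinations, you can set n_iter explicitly. A minimal sketch, assuming a budget of 20 samples and reusing the pipe and params from above (randcv_20 is just a name chosen here):
# Randomized search trying 20 randomly sampled combinations instead of the default 10
randcv_20 = RandomizedSearchCV(pipe, param_distributions=params, n_iter=20,
                               cv=5, scoring='neg_mean_squared_error', random_state=12)
randcv_20.fit(X_train, y_train)
randcv_20.best_estimator_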
Well, that is very nice. But do they give us similar results? Let’s see next.
Results
Let’s now assess the results from GridSearchCV and RandomizedSearchCV.
Calculating the RMSE for the Grid Search.
# Taking the best estimator
best_grid = grid.best_estimator_
# Predict
preds_grid = best_grid.predict(X_test)
# RMSE
np.sqrt( mean_squared_error(y_test, preds_grid) )
[OUT]:
53.70886778489411
Calculating the RMSE for the Randomized Search.
# Taking the best estimator
best_rand = randcv.best_estimator_
# Predict
preds_rand = best_rand.predict(X_test)
# RMSE
np.sqrt( mean_squared_error(y_test, preds_rand) )
[OUT]:
55.35583215782757
Well, there is about a 3% difference in the RMSE. The grid search got the better result because it trains every combination, so it is guaranteed to find the best fit within the grid. The trade-off is the training time when you’re trying too many combinations, and that is where Randomized Search can be a good option.
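To see where that difference comes from, one quick check (the exact output depends on the run) is to compare the winning hyperparameters and how many candidates each search actually trained:
# Hyperparameters chosen by each search
print(grid.best_params_)
print(randcv.best_params_)
# Number of candidate combinations each search evaluated
print(len(grid.cv_results_['params']))     # 48 for the exhaustive grid
print(len(randcv.cv_results_['params']))   # 10 by default for the randomized search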
Before You Go
In this post, we wanted to show two good options for fine-tuning models.
You can use GridSearchCV when you need to account for every possible combination. But take into consideration the time it takes to train all those models. If you already have an idea of which hyperparameters to choose, this can be your best option.
When you have too many combinations of hyperparameters to choose from, RandomizedSearchCV is probably the best choice. For example, you can run it first and use its best estimator just to point you in the right direction of which combinations to explore with the Grid Search, as sketched below.
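A minimal sketch of that two-stage idea, assuming the same pipe, X_train, and y_train from above; wide_params and narrow_params are hypothetical grids chosen here only for illustration:
# Stage 1: randomized search over a wide grid to find a promising region
wide_params = {'model__max_features': [1, 2, 3, 4, 5],
               'model__min_samples_split': [2, 5, 10, 20, 50],
               'model__criterion': ['friedman_mse', 'absolute_error']}
rough = RandomizedSearchCV(pipe, param_distributions=wide_params, n_iter=10,
                           cv=5, scoring='neg_mean_squared_error', random_state=12)
rough.fit(X_train, y_train)
best = rough.best_params_
# Stage 2: exhaustive grid search in a narrow neighborhood around the randomized winner
narrow_params = {'model__max_features': [best['model__max_features']],
                 'model__min_samples_split': [max(2, best['model__min_samples_split'] - 1),
                                              best['model__min_samples_split'],
                                              best['model__min_samples_split'] + 1],
                 'model__criterion': [best['model__criterion']]}
fine = GridSearchCV(pipe, param_grid=narrow_params, cv=5,
                    scoring='neg_mean_squared_error')
fine.fit(X_train, y_train)
fine.best_estimator_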
If you liked this post, follow my blog for more or find me on Linkedin.