Table of Contents
1. Introduction
2. Grid Search without Sklearn Library
3. Grid Search with Sklearn Library
4. Grid Search with Validation Dataset
5. Grid Search with Cross-Validation
6. Customized Grid Search
7. Different Cross-Validation types in Grid Search
8. Nested Cross-Validation
9. Summary

1. Introduction
The model and the preprocessing steps are specific to each project. Hyperparameters are tuned according to the dataset, and reusing the same hyperparameters for every project compromises the accuracy of the results. For example, the Logistic Regression algorithm has hyperparameters such as ‘solver’, ‘C’, and ‘penalty’, and different combinations of them give different results. Similarly, the Support Vector Machine has adjustable parameters such as the gamma value and the C value, and their combinations also give different results. The hyperparameters of each algorithm are listed on the sklearn website. The developer's goal is to design a model that both generalizes well and achieves high accuracy, so finding the best combination of hyperparameters is important.
This article evaluates all combinations of hyperparameters to improve both the accuracy of the model and the reliability of the resulting accuracy.
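Besides the documentation, the tunable hyperparameters of any sklearn estimator can also be listed in code; a minimal sketch using get_params():
from sklearn.linear_model import LogisticRegression
#every sklearn estimator exposes its tunable hyperparameters and their current values
print(LogisticRegression().get_params())
#e.g. {'C': 1.0, 'penalty': 'l2', 'solver': 'lbfgs', ...}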
2. Grid Search without Sklearn Library
Combinations requested by the user are evaluated with GridSearchCV in the sklearn library. In essence, the model is fit for each combination individually, and the best score and parameters are reported. For example, with LogisticRegression, if 4 different values are selected for C and 2 different values for penalty, the model is fit 8 times and the result of each combination is reported. Now let's create a grid search on the breast cancer dataset without using the sklearn library:
IN[1]
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
cancer=load_breast_cancer()
cancer_data=cancer.data
cancer_target=cancer.target
IN[2]
x_train,x_test,y_train,y_test=train_test_split(cancer_data,cancer_target,test_size=0.2,random_state=2021)
best_lr=0
for C in [0.001,0.1,1,10]:
    for penalty in ['l1','l2']:
        lr=LogisticRegression(solver='saga',C=C,penalty=penalty)
        lr.fit(x_train,y_train)
        lr_score=lr.score(x_test,y_test)
        print("C: ",C,"penalty:",penalty,'acc {:.3f}'.format(lr_score))
        if lr_score>best_lr:
            best_lr=lr_score
            best_lr_combination=(C,penalty)
print("best score LogisticRegression",best_lr)
print("C and penalty",best_lr_combination)
OUT[2]
C: 0.001 penalty: l1 acc:0.912
C: 0.001 penalty: l2 acc:0.895
C: 0.1 penalty: l1 acc:0.904
C: 0.1 penalty: l2 acc:0.904
C: 1 penalty: l1 acc:0.895
C: 1 penalty: l2 acc:0.904
C: 10 penalty: l1 acc:0.904
C: 10 penalty: l2 acc:0.904
best score LogisticRegression 0.9122807017543859
C and penalty (0.001, 'l1')
All hyperparameters and more for Logistic Regression can be accessed from this link.
As can be seen, an accuracy value is produced for each combination. The developer can improve the accuracy of the model by choosing the best combination of hyperparameters. OUT[2] indicates that the best combination is C=0.001 and penalty='l1'.
Let’s create the same process using Support Vector Classifier and Decision Tree Classifier.
IN[3]
#SVC
from sklearn.svm import SVC
best_svc=0
for gamma in [0.001,0.1,1,100]:
    for C in [0.01,0.1,1,100]:
        svm=SVC(gamma=gamma,C=C)
        svm.fit(x_train,y_train)
        score=svm.score(x_test,y_test)
        #print("gamma:",gamma,"C:",C,"acc",score)
        if score>best_svc:
            best_svc=score
            best_svc_combination=(gamma,C)
print("best score SVM",best_svc)
print("gamma and C",best_svc_combination)
OUT[3]
best score SVM 0.9210526315789473
gamma and C (0.001, 100)
IN[4]
#DT
from sklearn.tree import DecisionTreeClassifier
best_dt=0
for max_depth in [1,2,3,5,7,9,11,13,15]:
    dt=DecisionTreeClassifier(max_depth=max_depth,random_state=2021)
    dt.fit(x_train,y_train)
    dt_score=dt.score(x_test,y_test)
    #print("max_depth:",max_depth,dt_score)
    if dt_score>best_dt:
        best_dt=dt_score
        best_dt_depth=max_depth
print("best dt_score:",best_dt)
print("best dt depth:", best_dt_depth)
OUT[4]
best dt_score: 0.9473684210526315
best dt depth: 3
All hyperparameters and more for Support Vector Classifier can be accessed from this [link](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) and for Decision Tree Classifier from this link.
OUT[3] indicates the best combination for SVC is gamma=0.001 and C=100. OUT[4] indicates the best combination for DTC is max_depth=3.
3. Grid Search with Sklearn Library
Let’s do the same using the sklearn library:
IN[5]
from sklearn.model_selection import GridSearchCV
param_grid_lr = {'C': [0.001,0.1,1,10],'penalty': ['l1','l2']}
gs_lr=GridSearchCV(LogisticRegression(solver='saga'),param_grid_lr)
x_train,x_test,y_train,y_test=train_test_split(cancer_data,
cancer_target,test_size=0.2,random_state=2021)
gs_lr.fit(x_train,y_train)
test_score=gs_lr.score(x_test,y_test)
print("test score:",test_score)
print("best combination: ",gs_lr.best_params_)
print("best score: ", gs_lr.best_score_)
print("best all parameters:",gs_lr.best_estimator_)
print("everything ",gs_lr.cv_results_)
OUT[5]
test score: 0.9122807017543859
best combination: {'C': 0.001, 'penalty': 'l1'}
best score: 0.9054945054945055
best all parameters: LogisticRegression(C=0.001, penalty='l1', solver='saga')
The dataset is split with train_test_split as above. The training dataset is fit with the Logistic Regression algorithm for various combinations of hyperparameters by using GridSearchCV. The accuracy and the best parameters are the same as above. GridSearchCV has many attributes, and all of them are documented on the sklearn website.
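For example, the cv_results_ attribute can be loaded into a pandas DataFrame to compare all combinations at once; a minimal sketch, assuming pandas is installed and gs_lr has been fit as in IN[5]:
import pandas as pd
#cv_results_ is a dict of arrays with one entry per hyperparameter combination
results=pd.DataFrame(gs_lr.cv_results_)
#keep the most informative columns and sort by rank
print(results[['params','mean_test_score','std_test_score','rank_test_score']].sort_values('rank_test_score'))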
4. Grid Search with Validation Dataset
In the previous sections, the data were split into a training set and a test set. The training dataset was tried with all combinations, and the best one was applied to the test dataset. However, as explained in this link, a single random train_test_split is a gamble and may not give reliable results. To increase reliability, after separating the data into a training set and a test set, let's further split the training dataset into a training set and a validation set. The model is trained on the training dataset and evaluated on the validation data; after the most suitable hyperparameters are determined, the model is applied to the test dataset that was set aside at the beginning. Even if the accuracy value is lower, the model is more generalized, and this is preferable to a deceptively high accuracy.
IN[6]
x_valtrain,x_test,y_valtrain,y_test=train_test_split(cancer_data,
cancer_target,test_size=0.2,random_state=2021)
x_train,x_val,y_train,y_val=train_test_split(x_valtrain,y_valtrain,
test_size=0.2,random_state=2021)
param_grid_lr = {'C': [0.001,0.1,1,10],'penalty': ['l1','l2']}
gs_lr=GridSearchCV(LogisticRegression(solver='saga'),param_grid_lr)
gs_lr.fit(x_train,y_train)
val_score=gs_lr.score(x_val,y_val)
print("val score:",val_score)
print("best parameters: ",gs_lr.best_params_)
print("best score: ", gs_lr.best_score_)
new_lr=LogisticRegression(solver='saga', C=0.001, penalty='l2').fit(x_valtrain,y_valtrain)
test_score=new_lr.score(x_test,y_test)
print("test score", test_score)
OUT[6]
val score: 0.9010989010989011
best parameters: {'C': 0.001, 'penalty': 'l2'}
best score: 0.9092465753424659
test score 0.9035087719298246
The model was trained with the training dataset (x_train, y_train), evaluated with the validation dataset (x_val, y_val), and the best combination was determined. Then a new model with the best combination was fit on the training + validation dataset so that more data was used, and finally it was evaluated on the test dataset allocated in the first split.
The same process can be done without using sklearn (see the sketch below) or by following the template above for other algorithms.
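A minimal sketch of the same train/validation/test workflow without GridSearchCV, reusing the splits created in IN[6]:
best_score=0
for C in [0.001,0.1,1,10]:
    for penalty in ['l1','l2']:
        lr=LogisticRegression(solver='saga',C=C,penalty=penalty)
        lr.fit(x_train,y_train)            #fit on the training set only
        val_score=lr.score(x_val,y_val)    #evaluate on the validation set
        if val_score>best_score:
            best_score=val_score
            best_params=(C,penalty)
#refit on train+validation with the best combination, then score on the test set
final_lr=LogisticRegression(solver='saga',C=best_params[0],penalty=best_params[1])
final_lr.fit(x_valtrain,y_valtrain)
print("test score:",final_lr.score(x_test,y_test))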

5. Grid Search with Cross-Validation
The dataset is divided into a training set and a test set. The further split of the training dataset into training set + validation set is then handled by cross-validation. Let's implement it without using the sklearn library to understand how the system works:
IN[7]
import numpy as np
from sklearn.model_selection import cross_val_score
x_valtrain,x_test,y_valtrain,y_test=train_test_split(cancer_data,
cancer_target,test_size=0.2,random_state=2021)
best_lr=0
for C in [0.001,0.1,1,10]:
    for penalty in ['l1','l2']:
        lr=LogisticRegression(solver='saga',C=C,penalty=penalty)
        cv_scores=cross_val_score(lr,x_valtrain,y_valtrain,cv=5)
        mean_score=np.mean(cv_scores)
        if mean_score>best_lr:
            best_lr=mean_score
            best_lr_parameters=(C,penalty)
print("best score LogisticRegression",best_lr)
print("C and penalty",best_lr_parameters)
print("**************************************")
new_cv_lr=LogisticRegression(solver='saga',C=0.001,penalty='l1').fit(x_valtrain,y_valtrain)
new_cv_score=new_cv_lr.score(x_test,y_test)
print('test accuracy:',new_cv_score)
OUT[7]
best score LogisticRegression 0.9054945054945055
C and penalty (0.001, 'l1')
**************************************
test accuracy: 0.9122807017543859
The x_valtrain (train + validation) dataset is split with cv=5, and the test data originally set aside is applied to the model rebuilt with the best parameters.
The same process can be applied with the sklearn library:
IN[8]
param_grid_lr = {'C': [0.001,0.1,1,10,100],'penalty': ['l1','l2']}
gs_lr=GridSearchCV(LogisticRegression(solver='saga'),param_grid_lr,
cv=5)
x_valtrain,x_test,y_valtrain,y_test=train_test_split(cancer_data,
cancer_target,test_size=0.2,random_state=2021)
gs_lr.fit(x_valtrain,y_valtrain)
gs_lr_score=gs_lr.score(x_test,y_test)
print('test acc:',gs_lr_score)
print("best parameters: ",gs_lr.best_params_)
print("best score: ", gs_lr.best_score_)
OUT[8]
test acc: 0.9122807017543859
best parameters: {'C': 0.001, 'penalty': 'l1'}
best score: 0.9054945054945055
It is seen that the same results and the same best parameters were obtained.
6. Customized Grid Search
Not every combination of parameters is valid; some parameters cannot be combined with each other. For example, when solver='saga' is selected in LogisticRegression, the penalties 'l1', 'l2', and 'elasticnet' can be applied, whereas for solver='lbfgs' only penalty='l2' (or 'none') is allowed. This restriction can be handled in GridSearchCV by passing a list of separate parameter grids, as follows:
IN[9]
x_valtrain,x_test,y_valtrain,y_test=train_test_split(cancer_data,
cancer_target,test_size=0.2,random_state=2021)
param_grid_lr=[{'solver':['saga'],'C':[0.1,1,10],'penalty':['elasticnet','l1','l2']},
{'solver':['lbfgs'],'C':[0.1,1,10],'penalty':['l2']}]
gs_lr = GridSearchCV(LogisticRegression(),param_grid_lr,cv=5)
gs_lr.fit(x_valtrain,y_valtrain)
gs_lr_score=gs_lr.score(x_test,y_test)
print("test score:",gs_lr_score)
print("best parameters: ",gs_lr.best_params_)
print("best score: ", gs_lr.best_score_)
OUT[9]
test score: 0.9210526315789473
best parameters: {'C': 1, 'penalty': 'l2', 'solver': 'lbfgs'}
best score: 0.9516483516483516
Because refit=True by default, GridSearchCV retrained the model with the selected best parameters on the whole training (train + validation) data, and the test data was then scored with this refitted model.
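Since the refitted model is stored in best_estimator_, it can also be used directly for predictions; a minimal sketch, continuing from IN[9]:
#the winning combination, already refitted on all of x_valtrain/y_valtrain
final_model=gs_lr.best_estimator_
predictions=final_model.predict(x_test)   #equivalent to gs_lr.predict(x_test)
print(predictions[:10])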
7. Different Cross-Validation types in Grid Search
Cross-validation has been implemented as k-fold so far, but it is also possible to apply different cross-validation methods:
IN[10]
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, RepeatedKFold
x_valtrain,x_test,y_valtrain,y_test=train_test_split(cancer_data,
cancer_target,test_size=0.2,random_state=2021)
param_grid_lr=[{'solver':['saga'],'C':[0.1,1,10],'penalty':['elasticnet','l1','l2']},
{'solver':['lbfgs'],'C':[0.1,1,10],'penalty':['l2']}]
IN[11]
gs_lr_loo = GridSearchCV(LogisticRegression(),param_grid_lr,cv=LeaveOneOut())
gs_lr_loo.fit(x_valtrain,y_valtrain)
gs_lr_loo_score=gs_lr_loo.score(x_test,y_test)
print("loo-test score:",gs_lr_loo_score)
print("loo-best parameters: ",gs_lr_loo.best_params_)
print("**********************************************")
OUT[11]
loo-test score: 0.9122807017543859
loo-best parameters: {'C': 0.1, 'penalty': 'l2', 'solver': 'lbfgs'}
**********************************************
IN[12]
skf = StratifiedKFold(n_splits=5)
gs_lr_skf = GridSearchCV(LogisticRegression(),param_grid_lr,cv=skf)
gs_lr_skf.fit(x_valtrain,y_valtrain)
gs_lr_skf_score=gs_lr_skf.score(x_test,y_test)
print("skf-test score:",gs_lr_skf_score)
print("skf-best parameters: ",gs_lr_skf.best_params_)
print("**********************************************")
OUT[12]
skf-test score: 0.9210526315789473
skf-best parameters: {'C': 1, 'penalty': 'l2', 'solver': 'lbfgs'}
**********************************************
IN[13]
rkf = RepeatedKFold(n_splits=5, n_repeats=5, random_state=2021)
gs_lr_rkf= GridSearchCV(LogisticRegression(),param_grid_lr,cv=rkf)
gs_lr_rkf.fit(x_valtrain,y_valtrain)
gs_lr_rkf_score=gs_lr_rkf.score(x_test,y_test)
print("rkf-test score:",gs_lr_rkf_score)
print("rkf-best parameters: ",gs_lr_rkf.best_params_)
print("**********************************************")
OUT[13]
rkf-test score: 0.9298245614035088
rkf-best parameters: {'C': 10, 'penalty': 'l2', 'solver': 'lbfgs'}
**********************************************
While the highest test accuracy was obtained with RepeatedKFold (C=10 and penalty='l2'), the other strategies produced similar accuracies but selected different C values.
8. Nested Cross-Validation
So far, the test data has been separated with train_test_split and the training data has been split into training and validation sets with cross-validation. To generalize this method further, the outer split can also be done with cross-validation:
IN[14]
param_grid_lr=[{'solver':['saga'],'C':[0.1,1,10],'penalty':['elasticnet','l1','l2']},
{'solver':['lbfgs'],'C':[0.1,1,10],'penalty':['l2']}]
gs=GridSearchCV(LogisticRegression(),param_grid_lr,cv=5)
nested_scores=cross_val_score(gs,cancer.data,cancer.target,cv=5)
print("nested acc",nested_scores)
print("Average acc: ", nested_scores.mean())
OUT[14]
nested acc [0.94736842 0.93859649 0.94736842 0.9122807 0.92920354]
Average acc: 0.9349635149821456
For solver='saga' there are 3×3=9 combinations, and for solver='lbfgs' there are 3×1=3 combinations. Each combination is fit 5 times in the inner cross-validation, and the whole search is repeated 5 times in the outer cross-validation.
So, the total number of model fits is 9x5x5 + 3x5x5 = 300.
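The number of candidate combinations can also be counted programmatically; a minimal sketch using sklearn's ParameterGrid with the param_grid_lr defined in IN[14]:
from sklearn.model_selection import ParameterGrid
n_candidates=len(ParameterGrid(param_grid_lr))   #9 + 3 = 12 combinations
inner_folds,outer_folds=5,5
print("candidates:",n_candidates)
print("total fits:",n_candidates*inner_folds*outer_folds)   #12x5x5 = 300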
9. Summary

The downside of grid search with cross-validation is that it takes a long time to fit dozens of models. The n_jobs parameter can be set by the user to assign how many CPU cores are used. If n_jobs=-1 is set, all available CPU cores are used.
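For example, the search in IN[8] could be parallelized over all cores; a minimal sketch, reusing param_grid_lr and the x_valtrain/y_valtrain split from above:
#n_jobs=-1 distributes the candidate fits over all available CPU cores
gs_lr=GridSearchCV(LogisticRegression(solver='saga'),param_grid_lr,cv=5,n_jobs=-1)
gs_lr.fit(x_valtrain,y_valtrain)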