
Hyperparameter Tuning with Informed Searching

The Data Detective
Towards Data Science
4 min readMar 11, 2020


Whew…it’s been a few weeks, but glad to be caught up! The goal for these posts is to find something that was new to me so I can ‘pay it forward’. Many of you have probably heard of GridSearchCV and perhaps even RandomizedSearchCV, but what about informed searching?

The advantage of an informed searching technique is that hyperparameters are tuned through sequential learning. GridSearch and RandomSearch are great, but they are uninformed: every candidate is evaluated independently, so nothing learned from one trial guides the next. Informed searching learns from prior hyperparameter evaluations to steer the tuning process. There are three methods I was made aware of; feel free to share any other methods!


Coarse to Fine Tuning

This is the most obvious way to fine-tune hyperparameters. There are only four steps:

  1. Perform a RandomSearch (or GridSearch).
  2. Review the results.
  3. Define new, narrower parameter ranges based on that review.
  4. Continue until the optimal score is obtained.

Now, this is pretty obvious, but it is also the easiest to implement without installing any extra packages. Since you review and rerun the search by hand, it would also be the most time consuming.
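The four steps above can be sketched with scikit-learn's RandomizedSearchCV followed by a narrowed GridSearchCV. The dataset and parameter ranges here are hypothetical, just to make the sketch self-contained:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Stand-in data so the sketch runs on its own
X, y = make_classification(n_samples=200, random_state=42)

# Step 1: coarse random search over wide ranges
coarse = RandomizedSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    param_distributions={'max_depth': range(2, 11),
                         'learning_rate': np.linspace(0.001, 0.9, 50)},
    n_iter=10, cv=2, random_state=42)
coarse.fit(X, y)

# Steps 2-3: review the winner and define a narrower grid around it
best_depth = coarse.best_params_['max_depth']
best_lr = coarse.best_params_['learning_rate']
fine = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=42),
    param_grid={'max_depth': [max(2, best_depth - 1), best_depth, best_depth + 1],
                'learning_rate': [best_lr * 0.5, best_lr, best_lr * 1.5]},
    cv=2)
fine.fit(X, y)

# Step 4: in practice you would repeat, narrowing further each pass
print(fine.best_params_)
```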


Bayesian Tuning

Yes, Bayes to the rescue again! This method takes a range of hyperparameters and uses Bayesian principles to iterate through them, learning from each trial to home in on the best result. The Hyperopt package provides everything you need: the Tree-structured Parzen Estimator (tpe.suggest) serves as the search algorithm, and an objective function is defined to evaluate each set of parameters and return a loss to minimize.

An example is below:

import numpy as np
from hyperopt import hp, fmin, tpe
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Set up space dictionary with specified hyperparameters
space = {'max_depth': hp.quniform('max_depth', 2, 10, 2),
         'learning_rate': hp.uniform('learning_rate', 0.001, 0.9)}

# Set up objective function
def objective(params):
    # quniform samples floats, so cast max_depth back to an int
    params = {'max_depth': int(params['max_depth']),
              'learning_rate': params['learning_rate']}
    gbm_clf = GradientBoostingClassifier(n_estimators=100, **params)
    best_score = cross_val_score(gbm_clf, X_train, y_train,
                                 scoring='accuracy', cv=2, n_jobs=4).mean()
    # hyperopt minimizes, so turn accuracy into a loss
    return 1 - best_score

# Run the algorithm - test max_evals
best = fmin(fn=objective, space=space, max_evals=20,
            rstate=np.random.RandomState(42), algo=tpe.suggest)
print(best)
#Sample output:
0%| | 0/20 [00:00<?, ?it/s, best loss: ?]
5%|5 | 1/20 [00:00<00:04, 4.16it/s, best loss: 0.26759418985474637]
10%|# | 2/20 [00:00<00:04, 4.32it/s, best loss: 0.2549063726593165]
15%|#5 | 3/20 [00:00<00:03, 4.60it/s, best loss: 0.2549063726593165]
20%|## | 4/20 [00:00<00:03, 4.82it/s, best loss: 0.2549063726593165]
25%|##5 | 5/20 [00:01<00:04, 3.64it/s, best loss: 0.2549063726593165]
30%|### | 6/20 [00:01<00:03, 3.71it/s, best loss: 0.2549063726593165]
35%|###5 | 7/20 [00:01<00:03, 4.09it/s, best loss: 0.2549063726593165]
40%|#### | 8/20 [00:01<00:02, 4.29it/s, best loss: 0.2549063726593165]
45%|####5 | 9/20 [00:02<00:02, 4.49it/s, best loss: 0.2549063726593165]
50%|##### | 10/20 [00:02<00:02, 4.69it/s, best loss: 0.2549063726593165]
55%|#####5 | 11/20 [00:02<00:01, 4.77it/s, best loss: 0.2549063726593165]
60%|###### | 12/20 [00:02<00:01, 4.53it/s, best loss: 0.2549063726593165]
65%|######5 | 13/20 [00:03<00:01, 4.16it/s, best loss: 0.2549063726593165]
70%|####### | 14/20 [00:03<00:02, 2.81it/s, best loss: 0.2525688142203555]
75%|#######5 | 15/20 [00:03<00:01, 3.29it/s, best loss: 0.2525688142203555]
80%|######## | 16/20 [00:04<00:01, 3.57it/s, best loss: 0.2525688142203555]
85%|########5 | 17/20 [00:04<00:01, 2.41it/s, best loss: 0.24246856171404285]
90%|######### | 18/20 [00:05<00:00, 2.41it/s, best loss: 0.24246856171404285]
95%|#########5| 19/20 [00:05<00:00, 2.46it/s, best loss: 0.24246856171404285]
100%|##########| 20/20 [00:05<00:00, 2.69it/s, best loss: 0.24246856171404285]
100%|##########| 20/20 [00:05<00:00, 3.40it/s, best loss: 0.24246856171404285]
{'learning_rate': 0.11310589268581149, 'max_depth': 6.0}
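Note that fmin returns raw floats (max_depth: 6.0 in the output above), so integer-valued hyperparameters must be cast back before training the final model. A minimal sketch, with stand-in training data so it runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in training data; in practice, reuse the X_train/y_train
# from the search above
X_train, y_train = make_classification(n_samples=200, random_state=42)

# The dictionary returned by fmin
best = {'learning_rate': 0.11310589268581149, 'max_depth': 6.0}

final_clf = GradientBoostingClassifier(
    n_estimators=100,
    max_depth=int(best['max_depth']),      # cast the sampled float to int
    learning_rate=best['learning_rate'])
final_clf.fit(X_train, y_train)
```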

Genetic Tuning

Finally, we will take a look at genetic tuning. This is the most interesting concept as it follows Darwin’s evolution process:

  1. Different species (different models)
  2. The strongest survive (best scoring is picked)
  3. Reproduce (create new models similar to the best ones)
  4. Genetic randomness occurs during reproduction (add randomness so local optimum isn’t reached)
  5. Repeat

This process can be performed with the TPOT package. With TPOT, you can set all of these ‘genetic’ arguments:

  • generations — number of cycles (iterations of the evolutionary process)
  • population_size — number of models to keep in each generation
  • offspring_size — number of offspring produced in each generation
  • mutation_rate — proportion of pipelines to apply randomness to
  • crossover_rate — proportion of pipelines to breed each iteration

TPOT is built on several libraries, so make sure to check out the documentation to ensure a proper install.

Here is an example of TPOT in action:

from tpot import TPOTClassifier

# Assign the values outlined to the inputs
number_generations = 3
population_size = 4
offspring_size = 3
scoring_function = 'accuracy'

# Create the TPOT classifier
tpot_clf = TPOTClassifier(generations=number_generations,
                          population_size=population_size,
                          offspring_size=offspring_size,
                          scoring=scoring_function,
                          verbosity=2, random_state=2, cv=2)

# Fit the classifier to the training data
tpot_clf.fit(X_train, y_train)

# Score on the test set
print(tpot_clf.score(X_test, y_test))
#sample output:
Generation 1 - Current best internal CV score: 0.7549688742218555
Generation 2 - Current best internal CV score: 0.7549688742218555

Best pipeline: DecisionTreeClassifier(input_matrix, criterion=gini, max_depth=7, min_samples_leaf=11, min_samples_split=12)
0.75

There you have it! Three (well basically two) new processes to implement in your search for the best hyperparameters.

These processes don’t just magically appear; I picked up many of them through DataCamp lessons. DataCamp is a great way to stay in practice, learn new techniques, reinforce what you already know, and find new tools to master. It comes highly recommended.

Take care and see you on the next one!
