3 Different Ways to Tune Hyperparameters (Interactive Python Code)

An Interactive Guide to Optimizing and Tuning Hyperparameters

Photo by chuttersnap on Unsplash

Hyperparameters (HPs) are the parameters that machine learning or deep learning algorithms cannot learn through training. Their values must be set before the training process begins, and they control how the algorithm learns from the data. Hyperparameters can be the number of layers in a neural network or the number of nodes in a layer; they can be the learning rate or the type of optimizer the network uses. Getting the right hyperparameters can be counter-intuitive; it is largely a trial-and-error process. Hence, we need a way to tune these parameters automatically and find the combination that maximizes the ML model's performance.

In this article, I will discuss a few ways to optimize and tune hyperparameters (HPs) to either maximize or minimize an objective metric (maximize when the metric is accuracy, minimize when it is RMSE or any other loss function).

But before we dive deeper into the hyperparameter optimization itself, let's cover some important concepts:

Objective Function:

Machine learning can be simply defined as learning behaviour from experience. By learning, we mean getting better at a certain task over time. But what constitutes an improvement?

In order to judge improvement, we need formal measures of how good or bad our models are. In machine learning, we call these "objective functions". The most common objective function is the squared error, which is basically the squared difference between the predicted value and the actual ground truth. If this value is large, the accuracy is low. Our goal in optimizing the hyperparameters is to minimize this objective function. Getting the right combination of hyperparameters can contribute significantly to the model's performance. Let's have a look at how we can tune those parameters to minimize the objective function:
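As a concrete illustration (the numbers below are made up for the example), the squared-error objective can be computed in a couple of lines:

```python
import numpy as np

# Ground-truth targets and a model's predictions (illustrative values)
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 3.0])

# Mean squared error: the average squared difference between
# each prediction and the corresponding ground truth
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.417
```

The smaller this value, the closer the model's predictions are to the truth, which is exactly what hyperparameter tuning tries to achieve indirectly.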

1- Grid Search:

One way of tuning the hyperparameters is to define a range of values for each HP and have the computer try all possible combinations of those values. It can be a great approach for finding the best combination of HPs. It works by placing the HPs you want to tune on an n-dimensional grid and then trying each and every combination in that grid. You can either write nested loops to achieve this or use a ready-made implementation such as scikit-learn's GridSearchCV. Let's have a look at how it works. The example below tries to find the best combination of hyperparameters (C, gamma, and kernel) to build a classifier for the MNIST dataset:

(Interactive Code – Run the code yourself by clicking on the play button)

The code above brute-forces every possible combination of those three hyperparameters and then prints out the best one.

  • Pros: This method is guaranteed to find the best combination within the grid you define.
  • Cons: It is an exhaustive operation. If the ranges or the number of hyperparameters are large, the number of combinations can run into the millions, and the search will take a very long time to finish. It also works through the grid in a fixed order without learning from past evaluations.

2- Random Search:

As the name suggests, this method randomly samples a pre-defined number of hyperparameter combinations, which gives a good indication of what the best hyperparameters might be. Now, let's see some code:

(Interactive Code – Run the code yourself by clicking on the play button)
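Again, in case the embed is unavailable, a minimal sketch on the digits dataset follows. The distributions below are illustrative assumptions; the key difference from grid search is that C and gamma are sampled from continuous ranges rather than fixed lists:

```python
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Sample C and gamma from log-uniform distributions instead of a fixed grid
param_distributions = {
    "C": loguniform(1e-1, 1e2),
    "gamma": loguniform(1e-4, 1e-1),
    "kernel": ["rbf", "linear"],
}

# n_iter controls how many random combinations are drawn and evaluated
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=10, cv=3, random_state=42
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Only 10 fits per fold are performed here, regardless of how fine-grained the underlying ranges are.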

As you can see in the code, you define the number of iterations, i.e. how many times a random set of hyperparameters is picked and tried, and you get the best combination found.

  • Pros: Much faster than the Grid Search method.
  • Cons: It may not return the combination of hyperparameters that yields the best accuracy. It also doesn't consider past evaluations and will continue iterating regardless of the results.

3- Bayesian Optimization:

In the examples above, we had an objective metric (the accuracy) and an objective function that tries to maximize that metric or minimize the loss. The Bayesian Optimization approach tries to find the value that minimizes an objective function by building a probability model based on past evaluations of the objective metric. It tries to balance exploration of regions that may contain the best hyperparameters with exploitation of the most promising region found so far, to maximize/minimize the objective metric.

The problem with optimizing hyperparameters is that assessing the performance of a set of HPs is expensive: in each iteration we have to build the corresponding graph or neural network, train it, and finally evaluate its performance. The optimization process can take hours or days, but in this example we will train for only 3 epochs so that you can quickly see the best HPs and then use them to train for more epochs.

Let's have a look at the example below, which uses Bayesian Optimization to tune a neural network for the MNIST dataset. We will use the scikit-optimize library, one of the libraries that implement the Bayesian Optimization algorithm, to perform the HPO:

(Interactive Code – Run the code yourself by clicking on the play button)

Conclusion:

Optimizing hyperparameters is vital to the performance of machine learning models, and the process of tuning HPs is not intuitive and can be a complex task. The article above shows three popular ways to tune these parameters automatically. We have also explored the scikit-optimize (skopt) library, which is still under development at the time of writing but is already an extremely powerful tool. It has a very neat implementation of the Bayesian Optimization approach, which is vastly superior to Grid Search and Random Search, especially for complex tasks and large numbers of hyperparameters.
