The Art of Hyperparameter Tuning in Python

Grid, Random, Coarse to Fine, Bayesian, Manual Search, and Genetic Algorithm!

Louis Owen
Towards Data Science


Let's say you have prepared the data and developed the POC of your ML model,

and it works.

However, you realize that the performance is not good enough.

Of course, you want to improve the performance of your model, but how?

Should you change the model?

That’s an option, but what if, after changing the model, you still can’t achieve the expected performance?

You have 2 options: engineer the model, or revisit the data.

The former is called the model-centric approach, while the latter is called the data-centric approach. In this article, we will learn the model-centric approach, especially the hyperparameter tuning part.

There are many articles and tutorials available out there on how to perform hyperparameter tuning in Python.

So, why should you bother spending your time reading just another typical article?

This article covers all useful hyperparameter tuning methods in one place. You will not only learn the concept of each of the hyperparameter tuning methods but also when you should use each of them.

Moreover, the explanation will also be accompanied by relevant visualizations that can help you better understand how each of the methods works.

There are 6 methods that will be discussed in this article:

  1. Grid Search
  2. Random Search
  3. Coarse to Fine Search
  4. Bayesian Search
  5. Genetic Algorithm
  6. Manual Search

No need to worry about the code implementation, because you can access it freely from the link attached at the end of this article!


Without wasting any more time, let’s take a deep breath, make ourselves comfortable, and be ready to learn the art of hyperparameter tuning in Python!

Hyperparameter vs Parameter

Before we learn about the hyperparameter tuning methods, we should know the difference between a hyperparameter and a parameter.

The key difference between hyperparameter and parameter is where they are located relative to the model.

A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data.

Example: coefficients in logistic regression/linear regression, weights in a neural network, support vectors in SVM

A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data.

Example: max_depth in Decision Tree, learning rate in a neural network, C and sigma in SVM

Another important term that needs to be understood is the hyperparameter space. Basically, the hyperparameter space is the space of all possible combinations of hyperparameters that can be tuned during hyperparameter tuning.

Hyperparameter Tuning

Now we know the difference between a hyperparameter and a parameter. Next, we should know why we should do hyperparameter tuning.

Basically, the goal of hyperparameter tuning is to get the optimal model’s performance.

How? By choosing the best combination of hyperparameters, because they can’t be estimated from the data.

But, how to choose the best combination of hyperparameters?

This is the main question that will be answered through this article! There are many methods to perform hyperparameter tuning. In this article, we will learn 6 of them, along with the comparison and guide on when to use each of them.


Grid Search


The first and most popular method is called Grid Search. Basically, this is simply brute force where we have to test all of the possible combinations.

This method is suited best when you already know the suitable hyperparameter space for your case.
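As a minimal sketch, this is how grid search looks with scikit-learn's `GridSearchCV`; the toy dataset, the model, and the parameter values are illustrative, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

# Every combination in this space is tried: 3 * 2 = 6 candidates
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}

search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

Note that the number of fits grows multiplicatively with each hyperparameter you add, which is exactly the curse of dimensionality listed in the cons below.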

Pros:

  • Able to test all combinations within the hyperparameter space
  • Super simple to implement

Cons:

  • Curse of dimensionality
  • Possible to miss the better hyperparameter combination outside the hyperparameter space

Random Search


I would say this is the second most popular hyperparameter tuning method after Grid Search. Random search works by randomly selecting the combination of hyperparameters within the hyperparameter space.

This method is suited best when you do not know the suitable hyperparameter space (most often this is the case).

Usually, when performing random search, we set the hyperparameter space bigger than when performing grid search. Why? So that we can hopefully get a better combination of hyperparameters.
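A minimal sketch with scikit-learn's `RandomizedSearchCV`, sampling from a wider space than we would realistically enumerate with a grid (the model and the distributions are illustrative):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

# A much wider space than a grid would allow; only n_iter combinations are sampled
param_dist = {"max_depth": randint(2, 20), "min_samples_leaf": randint(1, 20)}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_dist,
    n_iter=10,  # 10 random draws instead of the full 18 * 19 grid
    cv=3,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```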

Pros:

  • Great for discovery and getting hyperparameter combinations that you would not have guessed intuitively

Cons:

  • Often requires more time to execute before finding the best combination

Coarse to Fine Search


The underdog method. Coarse to fine search is basically just the combination of grid search and random search, but it turns out to be incredibly powerful.

This method works like this:

  1. Perform Random Search on the initial hyperparameter space
  2. Find promising area
  3. Perform Grid/Random search in the smaller area
  4. Continue until optimal score obtained or maximum iterations reached
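One way the steps above could be sketched with scikit-learn, assuming a single `max_depth` hyperparameter for illustration (a real implementation would shrink the window around several hyperparameters at once):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

# Step 1: coarse random search over a wide initial space
coarse = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    {"max_depth": list(range(2, 30))},
    n_iter=8,
    cv=3,
    random_state=42,
).fit(X, y)

# Steps 2-3: fine grid search in a small window around the promising value
best = coarse.best_params_["max_depth"]
fine_grid = {"max_depth": [d for d in range(best - 2, best + 3) if d >= 1]}
fine = GridSearchCV(DecisionTreeClassifier(random_state=42), fine_grid, cv=3).fit(X, y)

print(fine.best_params_)
```

Step 4 would simply repeat the zoom-in with an ever smaller window until the score stops improving or a maximum number of iterations is reached.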

Pros:

  • Utilizes the advantages of Grid and Random Search
  • Spends more time on the regions of the search space that give good results

Cons:

  • Harder to implement, since no package supports this feature out of the box yet

Worry no more, because you will get the code implementation of this underdog method from this article!

Bayesian Search

Conditional Probability Formula. Image by Author.

Bayesian Search is a “clever” algorithm that utilizes Bayes’ Theorem to search for the best combination of hyperparameters.

At a high level this method works like this:

  • Start with a prior estimate of parameter distributions
  • Maintain a probabilistic model of the relationship between hyperparameter values and model performance
  • Alternate between:
  1. Training with the hyperparameter values that maximize the expected improvement
  2. Using the training results to update the probabilistic model and its expectations
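As an illustration of the loop above, here is a minimal sketch of Bayesian optimization with a Gaussian process surrogate from scikit-learn; the one-dimensional objective and all its values are toy assumptions standing in for an expensive train-and-evaluate step:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy "expensive" objective: validation score as a function of one
# hyperparameter (hypothetical; best score is at lr = 0.3)
def objective(lr):
    return -(lr - 0.3) ** 2

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)

# Prior: start with a few random evaluations
X_obs = rng.uniform(0, 1, size=(3, 1))
y_obs = np.array([objective(x[0]) for x in X_obs])

for _ in range(10):
    # Probabilistic model of the hyperparameter -> score relationship
    gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)

    # Expected improvement over the best score seen so far
    best = y_obs.max()
    z = (mu - best) / sigma
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    # Evaluate the most promising candidate and update the model's data
    x_next = candidates[np.argmax(ei)]
    X_obs = np.vstack([X_obs, [x_next]])
    y_obs = np.append(y_obs, objective(x_next[0]))

best_lr = X_obs[np.argmax(y_obs), 0]
print(best_lr)
```

In practice you would reach for a library such as scikit-optimize or Hyperopt rather than writing this loop yourself, which is exactly the "difficult to implement from scratch" con below.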

Pros:

  • An efficient way to choose hyperparameters with no human intervention

Cons:

  • Difficult to implement from scratch

Read this article for more details about Bayesian Search.

The code implementation for this method can be found in the Github Repo attached at the end of this article.

Genetic Algorithm


This is an even more “clever” way to do hyperparameter tuning. This method is inspired by the concept of evolution by natural selection.

At a high level, the Genetic Algorithm works like this:

  • Start with a population
  • For each iteration, the population will “evolve” by performing selection, crossover, and mutation
  • Continue until maximum iterations reached
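The loop above can be sketched as a toy genetic algorithm over a single integer hyperparameter; the fitness function and all values here are hypothetical stand-ins for a real cross-validation score:

```python
import random

random.seed(0)

# Toy fitness: pretend this is the CV score for a given max_depth
# (hypothetical; the best "score" is at depth 7)
def fitness(depth):
    return -(depth - 7) ** 2

# Start with a random population of candidate hyperparameter values
population = [random.randint(1, 30) for _ in range(8)]

for _ in range(20):
    # Selection: keep the fitter half of the population
    population.sort(key=fitness, reverse=True)
    parents = population[:4]

    # Crossover: each child averages the values of two random parents
    children = [(random.choice(parents) + random.choice(parents)) // 2 for _ in range(4)]

    # Mutation: randomly nudge each child
    children = [max(1, c + random.randint(-2, 2)) for c in children]
    population = parents + children

best = max(population, key=fitness)
print(best)
```

Libraries such as DEAP or TPOT implement this idea properly for multi-dimensional hyperparameter spaces; the sketch only shows the selection, crossover, and mutation steps named above.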

Watch this video to learn more about the Genetic Algorithm.

The code implementation for this method can be found in the Github Repo attached at the end of this article.

Manual Search


As implied by the name, this is a manual approach where the practitioner manually tweaks the hyperparameter combinations until the model reaches its optimal performance.

You have to really understand how the algorithm works when you want to go with this approach. Basically, this method works like this:

  • Train & evaluate the model
  • Guess a better hyperparameter combination
  • Re-train & re-evaluate the model
  • Continue until optimal score obtained
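The loop above boils down to something like the following sketch, where the hard part (guessing the next combination based on the previous result) is replaced by a hand-picked list of illustrative values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)

def evaluate(depth):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    return cross_val_score(model, X, y, cv=3).mean()

# Each value below stands in for a human looking at the previous score
# and intuitively picking the next value to try
results = {depth: evaluate(depth) for depth in [3, 5, 8]}
best_depth = max(results, key=results.get)
print(results, best_depth)
```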

Pros:

  • For a skilled practitioner, this can help to reduce computational time

Cons:

  • Hard to guess the right combination, even when you really understand the algorithm
  • Time-consuming

When to Use Each of the Methods?

Sometimes, the more we know, the more confused we become.

We now know 6 methods we can utilize when performing hyperparameter tuning. Is it enough? Maybe, for someone who can easily grasp a new concept, reading the explanation above is enough to be ready to implement them in a real scenario.

However, for those of you who are still confused about when to use each of the methods, here I also created a simple matrix that you can refer to.


This matrix can help you to decide which method should be used based on the training time and the size of hyperparameter space.

For instance, when you are training a deep neural network with big hyperparameter space, it is preferable to use a Manual Search or Random Search method rather than using the Grid Search method.

You can find all of the codes used in this article here.

Final Words


Congratulations on making it to this point! Hopefully, you have learned something new from this article.

After reading this article, you should know various methods to perform hyperparameter tuning and when to use each of them. If you love the content, please follow my Medium account to get notifications about my future posts!

About the Author

Louis Owen is a Data Science enthusiast who is always hungry for new knowledge. He pursued a Mathematics major at Institut Teknologi Bandung, one of the top universities in Indonesia, on a full final-year scholarship. In July 2020, he graduated with honors.

Currently, Louis is an AI Research Engineer at Bukalapak, where he helps to deliver various AI solutions (Financial Time-Series, Natural Language Processing, and Computer Vision).

Check out Louis’ website to know more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.
