
How to Tune Hyperparameters for Machine Learning

Tuning hyperparameters is a crucial part of the machine learning process. Read on to learn some effective and intuitive methods.

Photo by Nick Hillier on Unsplash

In machine learning algorithms, there are two kinds of parameters: model parameters and hyperparameters. Model parameters are learned through the training process, for example the weights of a neural network. Hyperparameters are used to control the training process; consequently, they must be set before training begins. Some examples of hyperparameters in deep learning are the learning rate and the batch size. One problem overlooked by many machine learning practitioners is exactly how to set these hyperparameters. Failing to tune them well can nullify all your hard work building the model. Fortunately, there is a general, heuristic approach for picking hyperparameters. For more complicated situations, there are also automatic hyperparameter selection methods. In this article we will discuss both.

A General Approach For Picking Hyperparameters

A good way to think about hyperparameter selection is to think in the context of model capacity. By model capacity we mean, in broad terms, the range of functions our model can represent. Ideally, we want to select hyperparameters so that our model’s capacity is just right for the problem at hand. In other words, we want hyperparameters that avoid underfitting or overfitting. It is easy to see this visually. If you were to make a plot with the test error on the y-axis and one hyperparameter’s value on the x-axis, most of the time you would get something like this:

Image by Author
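If you want to produce this kind of plot yourself, here is a minimal sketch using scikit-learn’s validation_curve; the synthetic dataset, the decision tree estimator, and the max_depth hyperparameter are purely illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Sweep one hyperparameter (here: tree depth) over a range of values.
depths = np.arange(1, 21)
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)

# Plot error (1 - accuracy) against the hyperparameter value.
plt.plot(depths, 1 - test_scores.mean(axis=1), label="test error")
plt.plot(depths, 1 - train_scores.mean(axis=1), label="training error")
plt.xlabel("max_depth")
plt.ylabel("error")
plt.legend()
plt.show()
```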

A hyperparameter value that is too small or too large corresponds to a model capacity that is too large or too small, and both lead to high test error. Therefore, to set a hyperparameter optimally, we need to reason through what effect a particular value of that hyperparameter has on model capacity.

Let’s see an example of this reasoning in action. One common hyperparameter is the number of nodes in a neural network. The relationship between the number of nodes and model capacity is pretty obvious: more nodes means more model capacity, and vice versa. If we suspect our model is overfitting, for example if we notice a very low training error but a sizable test error, we know that the model capacity is probably too large. Therefore, we need to decrease the number of nodes in the neural network. If we see underfitting, perhaps because both the training and test errors are high, we know the model capacity is too small, and we should increase the number of nodes.
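To make this concrete, here is a minimal sketch of that diagnose-and-adjust loop, assuming a small scikit-learn MLP on a synthetic dataset; the error thresholds are illustrative assumptions, not fixed rules.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_nodes = 64
model = MLPClassifier(hidden_layer_sizes=(n_nodes,), max_iter=1000,
                      random_state=0).fit(X_train, y_train)

train_error = 1 - model.score(X_train, y_train)
test_error = 1 - model.score(X_test, y_test)

if train_error < 0.05 and test_error - train_error > 0.10:
    n_nodes //= 2   # very low training error, sizable test error: overfitting
elif train_error > 0.15 and test_error > 0.15:
    n_nodes *= 2    # both errors high: underfitting
# Retrain with the adjusted hidden_layer_sizes=(n_nodes,) and check again.
```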

Now let’s consider another common hyperparameter: the weight decay coefficient. Weight decay is a form of regularization for neural networks that, as the name suggests, shrinks the weights. The effect of the weight decay coefficient is also pretty obvious: the higher the coefficient, the smaller the weights get, and the smaller the model capacity will be. Therefore, if we suspect overfitting, we ought to increase the decay coefficient, and if we suspect underfitting, we ought to decrease it.
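As a quick illustration, here is a minimal PyTorch sketch (the model and values are arbitrary) showing where the weight decay coefficient enters: it is simply an argument to the optimizer.

```python
import torch
import torch.nn as nn

# A tiny placeholder network; any model with trainable weights works the same way.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Suspect overfitting? Increase weight_decay. Suspect underfitting? Decrease it.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```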

However, there are some situations where it’s unclear how a hyperparameter affects model capacity. The learning rate in deep learning is one such example. In these situations, we can use automatic hyperparameter tuning methods.

Two Automatic Methods: Grid Search and Random Search

The rationale behind these automatic methods is straightforward. Let’s start with a simple case, where our model only has one hyperparameter. Let’s assume that we know the reasonable values for this parameter lie between 0 and 1 (for example, epsilon in reinforcement learning problems). We want to try some values to see which is the best. The most obvious thing to do is to try something like [0, 0.2, 0.4, 0.6, 0.8, 1] or [0, 0.33, 0.66, 1]. This idea of trying values at evenly spaced intervals is called grid search. If we have multiple hyperparameters, we would try every combination of the evenly-spaced individual parameters. For example, we could have two parameters and want to try the values [0, 1, 2] for each. Then our grid search would need to try (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2). Grid search can also be run on a logarithmic instead of linear scale, for example [0, 10^(-3), 10^(-2), 10^(-1), 1].
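In practice you rarely write the nested loops yourself. Here is a minimal sketch using scikit-learn’s GridSearchCV; the SVM estimator and the grid values are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Two hyperparameters with three values each: 3 x 3 = 9 combinations to try.
param_grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)

print(search.best_params_, search.best_score_)
```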

One problem with grid search is that the number of combinations we need to try is exponential in the number of hyperparameters. Therefore, grid search will take a long time to run if we have more than a few hyperparameters. It’s also possible that not all the hyperparameters actually affect the model. If that’s the case, then not only does grid search take forever – it is also wasting lots of that time looking at variations in hyperparameters that don’t affect performance.

Random search solves both of these problems. It works in two steps. First, we define a marginal distribution for each hyperparameter. Then we repeatedly sample a value from each distribution to form a candidate setting, evaluate it, and keep the best one. Random search avoids the exponential runtime problem because the number of trials is fixed by our budget, not by the number of hyperparameters. It also avoids the time-wasting problem: since every trial samples a fresh value for each hyperparameter, no trials are spent re-running the same value of a hyperparameter that might not affect the model at all, while the hyperparameters that do matter get a new value on every trial.
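Here is the matching sketch with scikit-learn’s RandomizedSearchCV; again, the estimator and the distributions are illustrative assumptions. Note that n_iter fixes the budget no matter how many hyperparameters you add.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# One marginal distribution per hyperparameter, sampled on a log scale.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, cv=5,
                            random_state=0).fit(X, y)

print(search.best_params_, search.best_score_)
```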

Empirically, these improvements seem to give random search the advantage over grid search. Therefore, if you want to use an automatic method to select your hyperparameters, I suggest random search instead of grid search.


I hope this article has given you more insight into how to tune hyperparameters. Many hyperparameters can be tuned just by thinking about how that hyperparameter affects model capacity. Hyperparameters with a complicated effect on model capacity (e.g. the learning rate), or models with many hyperparameters, can be tuned with the automatic methods mentioned above. My recommendation for these situations is random search.

Please feel free to leave any questions/comments. Thanks for reading!

