Quick Tutorial: Using Bayesian optimization to tune your hyperparameters in PyTorch

A faster way to design your neural networks

Michael Deyzel
Towards Data Science


Hyperparameter tuning is like tuning a guitar, in that I can’t do it myself and would much rather use an app. Photo by Adi Goldstein on Unsplash

Hyperparameters are the parameters that define a model’s architecture and govern how it learns: the learning rate, batch size, momentum, regularization strength, and so on.

The search for optimal hyperparameters requires some expertise and patience, and you’ll often find people resorting to exhaustive (and exhausting) methods like grid search and random search to find the set that works best for their problem.

A quick tutorial

I’m going to show you how to implement Bayesian optimization to automatically find the optimal hyperparameter set for your neural network in PyTorch using Ax.

We’ll be building a simple CIFAR-10 classifier using transfer learning. Most of this code is from the official PyTorch beginner tutorial for a CIFAR-10 classifier.

I won’t be going into the details of Bayesian optimization, but you can study the algorithm on the Ax website, read the original paper or the 2012 paper on its practical use.

Firstly, the usual

Install Ax using:

pip install ax-platform

Import all the necessary libraries:
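A set along these lines is enough for everything that follows; the evaluate helper is the one shipped with Ax’s tutorial utilities:

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from ax.service.managed_loop import optimize
from ax.utils.tutorials.cnn_utils import evaluate

# train on the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")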

Download the datasets and construct the data loaders (I would advise adjusting the training batch size to 32 or 64 later):
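Following the official tutorial, that looks like this (note the small default batch size, which we’ll override per trial later):

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')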

Let’s take a look at the CIFAR-10 dataset by creating some helper functions:
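The classic helper from the PyTorch tutorial un-normalizes a batch of images and plots it, so we can eyeball a few samples and their labels:

def imshow(img):
    img = img / 2 + 0.5  # undo the (0.5, 0.5, 0.5) normalization
    plt.imshow(np.transpose(img.numpy(), (1, 2, 0)))
    plt.show()

# grab one training batch and display it with its labels
images, labels = next(iter(trainloader))
imshow(torchvision.utils.make_grid(images))
print(' '.join(classes[labels[j]] for j in range(4)))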

Training and evaluation functions

Ax requires a function that returns a trained model, and another that evaluates a model and returns a performance metric like accuracy or F1 score. We’re only building the training function here; for evaluation we use Ax’s own evaluate tutorial function, which returns accuracy. You can check out their API to model your own evaluation function after theirs, if you’d like.
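A training function in that spirit is sketched below. It reads every hyperparameter from the parameters dictionary (the optional num_epochs key is an assumption of this sketch, not something Ax requires) and hands back the trained network:

def train(net, train_loader, parameters, dtype, device):
    net.to(dtype=dtype, device=device)
    criterion = nn.CrossEntropyLoss()
    # only the layers left unfrozen by init_net(), defined next, get updated
    optimizer = optim.SGD(
        [p for p in net.parameters() if p.requires_grad],
        lr=parameters.get('lr', 0.001),
        momentum=parameters.get('momentum', 0.9),
    )
    for _ in range(parameters.get('num_epochs', 1)):
        for inputs, labels in train_loader:
            inputs = inputs.to(dtype=dtype, device=device)
            labels = labels.to(device=device)
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()
    return net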

Next, we’re writing an init_net() function that initializes the model and returns the network ready to train. There are many opportunities for hyperparameter tuning here. You’ll notice the parameterization argument, which is a dictionary containing the hyperparameters.
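A minimal sketch, assuming a pretrained ResNet-18 as the transfer-learning backbone (any torchvision model would do):

def init_net(parameterization):
    model = torchvision.models.resnet18(pretrained=True)
    # freeze the pretrained feature extractor
    for param in model.parameters():
        param.requires_grad = False
    # a fresh, trainable head for the 10 CIFAR-10 classes; its width,
    # depth, dropout, etc. could all be read from parameterization
    model.fc = nn.Linear(model.fc.in_features, 10)
    return model  # ready to train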

Lastly, we need a train_evaluate() function that the Bayesian optimizer calls on every run. The optimizer generates a new set of hyperparameters in parameterization, passes it to this function, and then analyzes the returned evaluation results.
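Putting the pieces together. Note that the batch size is itself a hyperparameter, so the train loader is rebuilt on every trial:

def train_evaluate(parameterization):
    # a fresh loader with the batch size proposed for this trial
    train_loader = torch.utils.data.DataLoader(
        trainset,
        batch_size=parameterization.get('batchsize', 32),
        shuffle=True,
    )
    untrained_net = init_net(parameterization)
    trained_net = train(net=untrained_net, train_loader=train_loader,
                        parameters=parameterization,
                        dtype=torch.float, device=device)
    # Ax's tutorial evaluate() returns the accuracy on the given loader
    return evaluate(net=trained_net, data_loader=testloader,
                    dtype=torch.float, device=device)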

Optimize!

Now, just specify the hyperparameters you want to sweep across and pass that to Ax’s optimize() function:
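The exact ranges are up to you; the ones below are a plausible choice consistent with the results further down (note the log scale on the learning rate):

best_parameters, values, experiment, model = optimize(
    parameters=[
        {'name': 'lr', 'type': 'range', 'bounds': [1e-6, 0.4], 'log_scale': True},
        {'name': 'batchsize', 'type': 'range', 'bounds': [16, 128]},
        {'name': 'momentum', 'type': 'range', 'bounds': [0.0, 1.0]},
    ],
    evaluation_function=train_evaluate,
    objective_name='accuracy',
    total_trials=20,
)

print(best_parameters)
means, covariances = values
print(means)
print(covariances)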

That sure took a while, but it’s nothing compared to doing a naive grid search for all 3 hyperparameters. Let’s take a look at the results:

[INFO 09-23 09:30:44] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 arms, GPEI for subsequent arms], generated 0 arm(s) so far). Iterations after 5 will take longer to generate due to model-fitting.
[INFO 09-23 09:30:44] ax.service.managed_loop: Started full optimization with 20 steps.
[INFO 09-23 09:30:44] ax.service.managed_loop: Running optimization trial 1...
[INFO 09-23 09:31:55] ax.service.managed_loop: Running optimization trial 2...
[INFO 09-23 09:32:56] ax.service.managed_loop: Running optimization trial 3...
...[INFO 09-23 09:52:19] ax.service.managed_loop: Running optimization trial 18...
[INFO 09-23 09:53:20] ax.service.managed_loop: Running optimization trial 19...
[INFO 09-23 09:54:23] ax.service.managed_loop: Running optimization trial 20...
{'lr': 0.000237872310800664, 'batchsize': 117, 'momentum': 0.99}
{'accuracy': 0.4912998109307719}
{'accuracy': {'accuracy': 2.2924975426156455e-09}}

It seems our optimal learning rate is 2.37e-4 when combined with a momentum of 0.99 and a batch size of 117. That’s pretty nice. The 49.1% accuracy you see here is not the final accuracy of the model, so don’t worry!

We can go even further and render some plots that show the best accuracy per optimization trial (which improved as the parameterization improved), and the accuracy estimated by the optimizer as a function of two hyperparameters using a contour plot. The experiment variable is of type Experiment, and you should definitely check out the docs to see all the methods it has to offer.
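Ax ships plotting utilities for both views; in a notebook, something along these lines renders them:

from ax.plot.contour import plot_contour
from ax.plot.trace import optimization_trace_single_method
from ax.utils.notebook.plotting import render, init_notebook_plotting

init_notebook_plotting()

# estimated accuracy as a function of two of the hyperparameters
render(plot_contour(model=model, param_x='batchsize', param_y='lr',
                    metric_name='accuracy'))

# best accuracy observed so far, trial by trial
best_objectives = np.array(
    [[trial.objective_mean * 100 for trial in experiment.trials.values()]]
)
render(optimization_trace_single_method(
    y=np.maximum.accumulate(best_objectives, axis=1),
    title='Model performance vs. # of iterations',
    ylabel='Accuracy, %',
))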

The rendered plots are easy to understand and interactive. The black squares in the contour plots show the coordinates that have actually been sampled.

Lastly, you can fetch the parameter set (something Ax calls an “arm”) that has the best mean accuracy by simply running the script below:
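Each trial’s results live on the experiment; the arm with the highest mean accuracy can be looked up like this:

data = experiment.fetch_data()
df = data.df
best_arm_name = df.arm_name[df['mean'] == df['mean'].max()].values[0]
best_arm = experiment.arms_by_name[best_arm_name]
print(best_arm)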

Arm(name='19_0', parameters={'lr': 0.00023787231080066353, 'batchsize': 117, 'momentum': 0.9914986635285268})

Don’t be afraid to tune anything you wish: the number and size of hidden layers, dropout, activation functions, how deep to unfreeze the pretrained backbone, and so on.

Happy optimizing!


Electronic engineer based at Stellenbosch University, South Africa. Web | Apps | ML | Computer Vision.