Artificial Intelligence: Hyperparameters

Daniel Shapiro, PhD
Towards Data Science
4 min read · Oct 11, 2017


Deep learning neural network models have lots of parameters (e.g., weights and biases) and also quite a few hyperparameters. We know what parameters are from high school: they are the numbers you plug into a function. But what are hyperparameters? They are essentially the options used to configure the model that holds the parameters.

It takes some time to learn which hyperparameter setting is appropriate for which model behavior. Thankfully, the Keras defaults are a really good starting point.
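For example, here is a minimal sketch (assuming the standalone Keras API of that era) showing the split: everything you type into the model definition is a hyperparameter, and the weights and biases Keras builds from it are the parameters.

```python
from keras.models import Sequential
from keras.layers import Dense

# Everything we type here (layer width, activation, optimizer) is a hyperparameter;
# the weights and biases Keras creates from it are the parameters.
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=10))
model.add(Dense(1))

model.compile(optimizer='rmsprop', loss='mse')  # 'rmsprop' uses the Keras default learning rate
print(model.count_params())  # 10*32 + 32 + 32*1 + 1 = 385 parameters
```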

First, let’s remember from previous articles that our deep learning models are trying to approximate a function f that maps input features X to output decisions Y. Put another way, we are trying to find a function that fits Y = f(X) with low error, but without memorizing the training data (overfitting).

Parameters in our model like weights and biases are adjusted during the parameter optimization process (i.e. backpropagation) to get a better and better version of our mapping function f. Based on super exciting recent work, it turns out that neural networks are in fact doing a sort of information compression as they learn to approximate f. This ties our old friend information theory into our new friend deep neural networks.
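As a toy illustration of what “adjusting parameters” means, here is a single-weight linear model trained with plain gradient descent in NumPy (not the information-theory work itself, just the mechanics of a parameter update):

```python
import numpy as np

# Toy data: y = 3x plus a little noise
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 3.0 * X + np.random.normal(0, 0.1, size=X.shape)

w, b = 0.0, 0.0          # parameters (learned)
learning_rate = 0.01     # hyperparameter (chosen by us)

for _ in range(1000):
    Y_hat = w * X + b                      # current approximation of f
    grad_w = 2 * np.mean((Y_hat - Y) * X)  # dLoss/dw for mean squared error
    grad_b = 2 * np.mean(Y_hat - Y)        # dLoss/db
    w -= learning_rate * grad_w            # parameter update
    b -= learning_rate * grad_b

print(w, b)  # w should end up close to 3.0
```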

Hyperparameters are the meta-settings that we can pick (hopefully in some smart way) to tune how the model forms f. Put another way, we set hyperparameters in order to pick the type of model we want. For example, t-SNE has hyperparameter settings called perplexity, epsilon (learning rate), and a few others, like the number of iterations.
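To make that concrete, scikit-learn’s t-SNE exposes these knobs directly (a sketch assuming a version that still calls the iteration count n_iter):

```python
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

# perplexity, learning_rate, and n_iter are hyperparameters of the embedding,
# not parameters learned from the data.
embedding = TSNE(n_components=2, perplexity=30.0,
                 learning_rate=200.0, n_iter=1000).fit_transform(X)
print(embedding.shape)  # (1797, 2)
```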

Think of configuring and building deep learning models like ordering sushi: sushi comes in rolls. Some rolls have 8 pieces, and others have 6 or 4. The hyperparameter that controls the taste, and how many pieces you get when you order a roll from the menu, is the roll type. You can opt for a spicy roll, a veggie roll, a fried roll, and so on. In all cases you still get sushi; what changes is the configuration the sushi chef uses to make it, and each roll tastes different.

Bringing this back to the machine learning world, we need to make some very big decisions when we pick hyperparameters (e.g., regressor or classifier, CNN or LSTM or DNN or GAN) and also a lot of small ones (e.g., batch size, test/train split, regularizers, dropout, noise, etc.). In some cases a pre-trained neural network (à la VGG-19) or a predefined network shape (à la autoencoder) will bring you much closer to the solution than starting from scratch. For fully custom neural network configurations, Keras gives us lots of cool hyperparameter options: L1 and L2 regularizers, DNN layer width, network shape (autoencoder, fixed width, …), learning rate, and a LOT more.
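Here is a hedged sketch of what some of those knobs look like in Keras code (the specific values are illustrative, not recommendations):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l1_l2
from keras.optimizers import Adam

# Hyperparameters: layer width, dropout rate, L1/L2 penalties, learning rate.
width = 128
dropout_rate = 0.5
reg = l1_l2(l1=1e-5, l2=1e-4)
lr = 0.001

model = Sequential()
model.add(Dense(width, activation='relu', kernel_regularizer=reg, input_dim=100))
model.add(Dropout(dropout_rate))
model.add(Dense(width, activation='relu', kernel_regularizer=reg))
model.add(Dropout(dropout_rate))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=Adam(lr=lr), loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, Y_train, batch_size=32, validation_split=0.2)
# ...batch size and the train/validation split are hyperparameters too.
```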

As you go through the design space exploration, you quickly find that many of the possible hyperparameter settings are simply not useful.

Programmers like to use configuration parameters (e.g., prod/dev settings); we use ConfigParser for this. Hyperparameters in deep learning, however, are more akin to a series of nested for loops that search for a “good” configuration before breaking out. The search has to scan through the available machine learning models to find one with low error (or whatever the objective function is). You can think of these model hyperparameters as configurations, but it is more accurate to think of hyperparameter selection as a Pareto optimization, where the constraints are things like the size of the GPU, and the objectives are loss/accuracy, generality (precision, recall, F-score), and other model performance criteria.

Having lots of model constraints is no problem. It is the part where you have multiple objectives, and some constraints are integers, that really blows. When faced with multiple objectives in an optimization problem, you need to either combine them into a linear combination (a linear model), do some crazy math (see mixed integer linear programming), or cast the whole thing as a meta-level machine learning problem (research!). Because multi-objective Pareto optimization is so ugly and slow (read as expen$ive), the rule is basically to try settings that make sense until you reach an acceptable level of model performance. My master’s degree was on design space exploration, so I know firsthand how tough it is to pick a configuration under multiple constraints.
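The nested-for-loop picture looks something like this sketch, where build_and_evaluate() is a hypothetical stand-in for “train the model and return the validation loss”:

```python
import itertools

# A small grid of hyperparameter settings to scan.
widths = [32, 64, 128]
dropouts = [0.2, 0.5]
learning_rates = [1e-2, 1e-3]

best = (None, float('inf'))
for width, dropout, lr in itertools.product(widths, dropouts, learning_rates):
    # Hypothetical helper: trains a model with these settings and returns its loss.
    loss = build_and_evaluate(width=width, dropout=dropout, lr=lr)
    if loss < best[1]:
        best = ((width, dropout, lr), loss)
    if best[1] < 0.05:  # "good enough" threshold: break out of the search early
        break

print("best hyperparameters:", best[0], "loss:", best[1])
```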

Before I sign off, I got some really interesting news today. API.AI has changed its name to Dialogflow. They redirected the domain name and everything. I think at some point Google is going to set it up as dialogflow.google.com, as they did with AdWords and other Google products like Inbox. Alternatively, it may get swallowed into Google Cloud Platform, the way Amazon folds services into AWS.

OK. Back to work! If you enjoyed this article on artificial intelligence, then please try out the clap tool. Tap that. Follow us on Medium. Go for it. I’m also happy to hear your feedback in the comments. What do you think? Do I use too many parentheses? What should I be writing about? I wrote a bunch of articles on the business side, and recently the interest has been more on the technical side. How about this: send me your data science use cases or problems, and I’ll pick an entry to write an article about. Go for it: daniel@lsci.io

Happy Coding!

-Daniel
daniel@lemay.ai ← Say hi.
Lemay.ai
1(855)LEMAY-AI


