
Parameters and Hyperparameters in Machine Learning and Deep Learning

What exactly are they and how do they interact?

When you begin learning anything new, one of the first things you grapple with is the lingo of the field you’re getting into. Clearly understanding the terms (and, in some cases, the symbols and acronyms) used in a field is the first and most fundamental step to understanding the subject matter itself. When I started out in machine learning, the concepts of parameters and hyperparameters confused me a lot. If you are here, I suspect they confuse you too. So I wrote this article to dispel whatever confusion you might have and set you on a path to absolute clarity.

In ML/DL, a model is defined or represented by its parameters. Training a model, in turn, involves choosing the hyperparameters that the learning algorithm will use to learn the optimal parameters — the ones that correctly map the input features (independent variables) to the labels or targets (dependent variable) — so that you achieve some form of intelligence.

So what exactly are parameters and hyperparameters and how do they relate?

Hyperparameters

Hyperparameters are parameters whose values control the learning process and determine the values of model parameters that a learning algorithm ends up learning. The prefix ‘hyper-’ suggests that they are ‘top-level’ parameters that control the learning process and the model parameters that result from it.

As a machine learning engineer designing a model, you choose and set the hyperparameter values that your learning algorithm will use before training even begins. In this light, hyperparameters are said to be external to the model, because the model cannot change their values during learning/training.

Hyperparameters are used by the learning algorithm while it is learning, but they are not part of the resulting model. At the end of the learning process, we have the trained model parameters, which are effectively what we refer to as the model. The hyperparameters used during training are not part of this model: we cannot, for instance, recover the hyperparameter values used to train a model from the model itself; we only know the model parameters that were learned.

Basically, anything in machine learning and deep learning whose value or configuration you decide before training begins, and which remains unchanged when training ends, is a hyperparameter.

Here are some common examples (see the sketch after this list):

  • Train-test split ratio
  • Learning rate in optimization algorithms (e.g., gradient descent)
  • Choice of optimization algorithm (e.g., gradient descent, stochastic gradient descent, or the Adam optimizer)
  • Choice of activation function in a neural network (NN) layer (e.g., sigmoid, ReLU, tanh)
  • Choice of cost or loss function the model will use
  • Number of hidden layers in an NN
  • Number of activation units in each layer
  • The dropout rate (dropout probability) in an NN
  • Number of training epochs for an NN
  • Number of clusters in a clustering task
  • Kernel or filter size in convolutional layers
  • Pooling size
  • Batch size
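
To make this concrete, here is a minimal sketch, using scikit-learn, of several of these hyperparameters being fixed before training starts. The dataset is synthetic, and the specific values (test_size=0.2, eta0=0.01, max_iter=100) are illustrative choices, not recommendations; note that loss="log_loss" assumes a recent scikit-learn release (older versions called it "log").

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic data, just for illustration
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    # Hyperparameters, all chosen before training begins:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)  # train-test split ratio
    clf = SGDClassifier(
        loss="log_loss",           # choice of loss function
        learning_rate="constant",  # learning-rate schedule
        eta0=0.01,                 # learning rate
        max_iter=100,              # number of passes over the data
    )

    # The learning algorithm now uses these hyperparameters
    # to learn the model parameters (weights and bias).
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))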

Parameters

Parameters, on the other hand, are internal to the model. That is, they are learned or estimated purely from the data during training, as the algorithm tries to learn the mapping between the input features and the labels or targets.

Model training typically starts with the parameters initialized to some values (random values, or zeros). As training/learning progresses, these initial values are updated by an optimization algorithm (e.g., gradient descent). The learning algorithm continuously updates the parameter values as learning progresses, but the hyperparameter values set by the model designer remain unchanged.
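
As a sketch of this dynamic, here is a minimal NumPy implementation of linear regression trained with gradient descent. The hyperparameters (named learning_rate and n_epochs here purely for illustration) are fixed up front, while the parameters (w and b) start at zero and are updated on every step:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 4.0  # true weights and bias

    # Hyperparameters: set by the designer, unchanged during training
    learning_rate = 0.1
    n_epochs = 200

    # Parameters: initialized, then updated by the learning algorithm
    w = np.zeros(3)
    b = 0.0

    for _ in range(n_epochs):
        y_pred = X @ w + b
        error = y_pred - y
        grad_w = X.T @ error / len(y)  # gradient of half the mean squared error w.r.t. w
        grad_b = error.mean()          # gradient w.r.t. b
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    print(w, b)  # should approach [2.0, -1.0, 0.5] and 4.0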

At the end of the learning process, model parameters are what constitute the model itself.

Examples of parameters (see the sketch after this list):

  • The coefficients (or weights) of linear and logistic regression models
  • The weights and biases of an NN
  • The cluster centroids in clustering
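
As a concrete illustration of the last example, here is a minimal sketch (with synthetic data) using scikit-learn’s KMeans, contrasting a hyperparameter you choose, n_clusters, with the parameters k-means learns, the cluster centroids:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic data with three well-separated blobs
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # n_clusters is a hyperparameter, chosen before training
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # cluster_centers_ holds the learned parameters: the centroids
    print(kmeans.cluster_centers_)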

Simply put, parameters in machine learning and deep learning are the values your learning algorithm can change on its own as it learns, and these values are shaped by the hyperparameters you provide. You set the hyperparameters before training begins, and the learning algorithm uses them to learn the parameters. Behind the scenes, the parameters are continuously updated, and the final values at the end of training constitute your model.

Therefore, setting the right hyperparameter values is very important, because it directly impacts the performance of the resulting model. The process of choosing the best hyperparameters for your model is called hyperparameter tuning, and in the next article we will explore a systematic way of doing it.
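
As a small preview (the next article covers this systematically), here is a minimal sketch of hyperparameter tuning with scikit-learn’s GridSearchCV, which evaluates each candidate value of the regularization hyperparameter C with cross-validation and keeps the best one; the candidate values here are arbitrary illustrations:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=300, random_state=0)

    # Each value of C is scored with 5-fold cross-validation
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_)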

Conclusion

I trust that you now have a clear understanding of what hyperparameters and parameters are, and that hyperparameters affect the parameters your model learns. I will follow this up with a detailed, practical article on hyperparameter tuning.

This article draws on knowledge from:

  • The Deep Learning Specialization on Coursera by Andrew Ng
  • The Machine Learning course on Coursera by Andrew Ng

If you liked this article, please follow me.

