
Generating a first model for a given data science problem can be pretty straightforward. Building an efficient model, one that delivers a high level of accuracy, is much harder. Data scientists have to clean data, refine features, choose the right metrics and the right validation strategy, build training and test sets correctly, and fine-tune the chosen model's parameters.
Most of these steps benefit from the experience of the data scientist and can hardly be automated. Fortunately, this is not the case for hyperparameter tuning, which can take advantage of Automated Machine Learning: AutoML.
In this post, we will explain mathematically why Hyper Parameter Tuning is a complex task and show how SMAC can help build better models.
Hyperparameters? What are you talking about?
When trying to fit a model for a given dataset, two kinds of parameters must be defined:
- The parameters used to configure the model: the depth of a decision tree, the kernel of an SVM, the degree of a polynomial, the number of layers of a neural network, …
- The parameters of the model itself: the weights attached to the leaves of a decision tree, the parameters of an SVM kernel, the coefficients of a polynomial, the weight of each neuron, …
The first kind is referred to as the model's Hyper Parameters, as they define the structure of the model, whereas the second kind is learned during training and is attached to the model itself.
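To make the distinction concrete, here is a small scikit-learn illustration (the dataset and the estimator are arbitrary choices made for this example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameter: chosen by us before training, it defines the model's structure.
model = DecisionTreeClassifier(max_depth=3)

# Model parameters: learned from the data during fit().
model.fit(X, y)
print(model.get_params()["max_depth"])  # the hyperparameter we set: 3
print(model.tree_.node_count)           # structure actually learned from the data
print(model.tree_.value[0])             # class counts attached to the root node
```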
Why do you need AutoML for Hyper Parameter Tuning?
Let’s remember that Hyper Parameter Tuning, sometimes referred to as Hyper Parameter Optimisation (HPO), is the process that tries to make the most of your model by identifying its best configuration with respect to the metric that you have chosen.
There are mainly two reasons for using AutoML:
- Finding the best parameters by hand is a pretty tedious task, and the configuration space can be quite large. For Random Forest, there are no fewer than 10 parameters to consider; if each one can take 10 different values, exhaustively exploring the configuration space means evaluating 10¹⁰ configurations!
- There is absolutely no guarantee that the configuration optimized for a given dataset will perform as well on another dataset. Each time you apply your model to a new dataset, it’s crucial to refine hyperparameters.
Using AutoML technologies tackles these two limitations by automating the exploration of the configuration space.
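As a quick sanity check of the numbers above, scikit-learn's ParameterGrid can count the configurations of a grid without enumerating them (the grid below is arbitrary, just to show the explosion):

```python
from sklearn.model_selection import ParameterGrid

# An arbitrary grid: 10 hyperparameters with 10 candidate values each.
grid = {f"param_{i}": list(range(10)) for i in range(10)}

print(len(ParameterGrid(grid)))  # 10000000000 configurations to evaluate
```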
Hyper Parameter Tuning with AutoML: A difficult task
Before going further and showing you how to efficiently autotune hyperparameters, let’s explain why this is a complex task.
Let’s put some math to formalize what Hyper Parameter optimization is.
Mathematically speaking, HPO tries to minimize an evaluation metric on one or many testing sets, generally generated using cross-validation. This can be formalised as:
\theta^{*} = \underset{\theta \in \Theta}{\arg\min}\; M\big(\hat{f}_{\theta},\, D_{\mathrm{train}},\, D_{\mathrm{test}}\big)
where M is the evaluation of the metric for which we want to optimise the model f_hat, on the given train and test sets. theta is the set of parameters used for configuring the model, and its values are taken from the configuration space Theta.
Usually, when facing such an optimisation problem, methods based on numerical differentiation are used. Basically, the function to optimise, i.e. the composition of M with f_hat, is differentiated with respect to parameters theta. Then, Newton-Raphson, gradient descent, or any similar method is used to iteratively converge toward an optimum.
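In our notation, one gradient descent step on this composed objective would read (with η the learning rate):

\theta_{k+1} = \theta_k - \eta\, \nabla_{\theta} \big(M \circ \hat{f}\big)(\theta_k)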
However, in the case of Hyperparameter Tuning, it’s generally not possible to compute those gradients:
- The metric can be non-smooth, like MAE for instance (see my other article on the subject). Hence, differentiation is not possible everywhere.
- The model itself can be difficult to differentiate, not only symbolically but also numerically. Think of all the tree-based methods, which are piecewise constant.
- The gradient can vanish. This is the case for neural networks but also tree-based methods: Random Forest, XGBoost, CatBoost, LightGBM, …
- Numerically evaluating gradients is highly time-consuming, as for each variation in each parameter direction we need to train a full model.
- Model hyperparameters may not be continuous. Think of XGBoost’s n_estimators parameter: it’s an integer value. The same goes for categorical parameters.
Note that Automatic Differentiation can help in some cases, but in most cases you cannot perform optimisation using gradient-directed methods.
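To make the last two points concrete, here is a small illustrative check (not taken from the original post): the score of a Random Forest can only be tabulated at integer values of max_depth, each point costs a full cross-validated training, and no derivative with respect to max_depth exists:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# max_depth only takes integer values: the score can be tabulated,
# but a derivative of the score with respect to max_depth is not defined,
# and each point of the table costs a full cross-validated training.
for depth in range(1, 8):
    model = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    print(depth, round(cross_val_score(model, X, y, cv=3).mean(), 4))
```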
As you can see, many reasons forbid the use of standard optimisation methods in the case of Hyper Parameter Tuning. Are we forced to rely only on our experience as data scientists to pick the best parameters?
Why not use Grid Search?
One option could be to use brute force. After all, finding the best hyperparameters for XGBoost, Random Forest, or any other model simply requires evaluating your metrics for each possible configuration.
But as stated above, the configuration space can be huge, and even though computers are more and more powerful, exploring 10¹⁰ configurations is still (far) out of their reach.
So this can only be an option when your configuration space is very limited.
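For such a tiny space, an exhaustive search with scikit-learn's GridSearchCV is perfectly fine (the grid and dataset below are just an illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# A deliberately tiny grid: 3 x 2 = 6 configurations, easy to exhaust.
param_grid = {
    "max_depth": [3, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```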
And what about Random Search?
This is an option. But there is random in it 😉 We’ll see below that it works, but there is no guarantee that it converges to the best configuration, at least not within a given amount of time.
Using SMAC
An alternative to brute force and random exploration is proposed by the library SMAC: Sequential Model-based Algorithm Configuration.
The idea behind this library is to build a model (hence the "Model-based" in the SMAC acronym) that tries to estimate the metric value for a given set of hyperparameters. Using this internal model, you can generate random configurations and use the estimator to guess which one would be the best.
See this very nice seminal paper for more detail on the subject. I’ve also written a complete article on how to create your own HPO engine using a model:
Tuning XGBoost with XGBoost: Writing your own Hyper Parameters Optimization engine
SMAC uses a Random Forest model to capture the behaviour of the algorithm/model to optimize with respect to the metric.
The overall algorithm is pretty simple. A first training is run and the metric is computed; the internal Random Forest model is trained on this first result.
Random configurations are then generated, and the one with the best estimated score is used for the next training. The internal model is then retrained with this new outcome, and the process starts again.
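Schematically, the loop looks like the toy sketch below; this is only an illustration of the idea, not SMAC's actual implementation, and the fake objective and parameter ranges are made up:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def random_config():
    # Hypothetical 2-parameter space: (max_depth, n_estimators).
    return [rng.integers(2, 20), rng.integers(10, 300)]

def real_evaluation(config):
    # Stand-in for an expensive training + cross-validation run.
    depth, n_trees = config
    return -(depth - 8) ** 2 - 0.01 * (n_trees - 150) ** 2

# First real run: the surrogate needs at least one observation.
history_X = [random_config()]
history_y = [real_evaluation(history_X[0])]

surrogate = RandomForestRegressor(random_state=0)
for _ in range(20):
    surrogate.fit(history_X, history_y)            # learn from past runs
    candidates = [random_config() for _ in range(100)]
    predicted = surrogate.predict(candidates)      # cheap score estimates
    best = candidates[int(np.argmax(predicted))]   # most promising candidate
    history_X.append(best)                         # ...is the only one we
    history_y.append(real_evaluation(best))        # actually evaluate for real

print("Best (toy) score found:", max(history_y))
```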
The configuration space exploration is guided by the internal performance model. As demonstrated in the code below, inspired by this SMAC example, SMAC learns from previous runs and improves its knowledge at each step:
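The snippet below is a minimal sketch in the same spirit, written against the SMAC3 v1.x API (SMAC4HPO facade); the dataset, parameter ranges, and the evaluate helper are illustrative choices rather than the exact original code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    UniformIntegerHyperparameter,
)
from smac.facade.smac_hpo_facade import SMAC4HPO
from smac.scenario.scenario import Scenario

X, y = load_breast_cancer(return_X_y=True)

# Configuration space: SMAC handles integer and categorical parameters natively.
cs = ConfigurationSpace()
cs.add_hyperparameters([
    UniformIntegerHyperparameter("n_estimators", 10, 300, default_value=100),
    UniformIntegerHyperparameter("max_depth", 2, 20, default_value=10),
    CategoricalHyperparameter("criterion", ["gini", "entropy"], default_value="gini"),
])

def evaluate(config):
    """Train a Random Forest with the proposed configuration and return
    the value SMAC minimises (here 1 - mean cross-validated accuracy)."""
    model = RandomForestClassifier(
        n_estimators=config["n_estimators"],
        max_depth=config["max_depth"],
        criterion=config["criterion"],
        random_state=0,
    )
    return 1.0 - cross_val_score(model, X, y, cv=5).mean()

scenario = Scenario({
    "run_obj": "quality",     # optimise the value returned by evaluate()
    "runcount-limit": 50,     # number of configurations to try
    "cs": cs,
    "deterministic": True,
})
smac = SMAC4HPO(scenario=scenario, tae_runner=evaluate)
incumbent = smac.optimize()
print("SMAC best configuration:", incumbent)

# Baseline: the same budget spent with plain random search.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": range(10, 301),
        "max_depth": range(2, 21),
        "criterion": ["gini", "entropy"],
    },
    n_iter=50, cv=5, random_state=0,
)
random_search.fit(X, y)
print("Random search best configuration:", random_search.best_params_)
```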
The code above uses SMAC and RandomizedSearchCV to tune Hyper Parameters. Please note that SMAC supports continuous real parameters as well as integer and categorical ones. Supporting categorical parameters was one of the reasons for using a Random Forest as the internal model guiding the exploration.
This code illustrates the use of SMAC with Random Forest as the model to fine-tune, but I’ve been using it for XGBoost as well as SVM or SARIMA models, and it works like a charm.
The graphs below compare the configuration space exploration in both cases, i.e. using SMAC and random search. They plot the score with respect to the number of Hyper Parameter Tuning iterations:


On the left side, we see that the Random Search explores the configuration space erratically. The optimization does not benefit from previous trainings. On the other hand, on the right plot, it clearly appears that SMAC learns from previous runs, and tries configurations that are good candidates for improvement.
The result is that SMAC converges more quickly to a better solution.
Conclusion
Finding optimal Hyper Parameters for a model is a tedious but crucial task. SMAC is a very efficient library that brings AutoML and really accelerates the building of accurate models. Any kind of model can benefit from this fine-tuning: XGBoost, Random Forest, SVM, SARIMA, …
It is very interesting to see that ML-based methods can be used to help better train ML models. This however raises a question: How are the Hyper Parameters of the internal Random Forest model of SMAC optimized?
You’ll find the answer in the SMAC code or the academic paper: they are hardcoded 😉