
If you have used any form of grid search to tune the hyperparameters of your models, chances are you have encountered the log-uniform distribution.
In the most basic scenario of the grid search, we define the possible values of the hyperparameters as a list. However, when using randomized search or more advanced Bayesian methods, we can also define the search space using statistical distributions.
Have you ever wondered why the range of some hyperparameters is defined using a uniform distribution, while others use the log-uniform distribution? If so, this article will answer that question.
Uniform and log-uniform distributions
Before we move on, let’s do a very quick recap of what those distributions actually represent.
The uniform distribution assumes equal probability for all values within a certain range. It is common to denote the lower and upper thresholds of the range with a and b. Values outside of that range simply cannot happen. The probability density function (PDF) of the distribution is as follows:
f(x) = 1 / (b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
In the log-uniform distribution, points are sampled uniformly between log(a) and log(b), where log is most frequently the logarithm with base 10.
Theoretical answer
The tl;dr answer to the titular question is that the log-uniform distribution is very useful for exploring values that vary over several orders of magnitude.
Having said that, it might be easier to understand that statement using a specific example. Let’s imagine we are tuning a Lasso model (the same reasoning applies to other regularized linear models). It has a hyperparameter alpha, which determines the strength of regularization: the larger the value, the stronger the regularization. We know that the value should be ≥ 0 and has no upper limit. Additionally, using alpha = 0 makes the model equivalent to plain, unregularized linear regression; however, due to numerical reasons, using that value is not recommended in scikit-learn.
With the information above, we know the constraints on the values of alpha. But we do not know even approximately which values would work well for the model and our data. That is why we would like to efficiently explore a very wide range of possible values. For our problem, let’s assume that we explore the values in the range from 0.0001 to 100.
The main benefit of using the log scale and the log-uniform distribution is that it allows us to create an evenly distributed search space over several orders of magnitude. That is because, on the log scale, there are as many values from 1 to 10 as there are from 10 to 100. Let’s allow this to sink in and make sure that it becomes clear. To reiterate:
- In linear space, there are obviously more possible values in the second range, simply because it spans a larger distance: 90 units from 10 to 100, which is 10 times the 9 units from 1 to 10.
- In logarithmic space, each of those two ranges has the same width, as both are bounded by consecutive powers of 10 (see the arithmetic below).
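Concretely: log₁₀(10) − log₁₀(1) = 1 − 0 = 1, and log₁₀(100) − log₁₀(10) = 2 − 1 = 1, so both ranges have a width of exactly 1 on the log scale.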
So how do we use the log-uniform distribution in practice? We start without knowing which values of the hyperparameter result in a good fit, so we run a search over a wide space spanning several orders of magnitude to figure out which values work. Then, we can repeat the process, this time narrowing down the range of possible values to the area around the values that resulted in a good fit.
Hands-on coding example
In this section, we will have a quick look at the two distributions to solidify our understanding of how they are connected. First, we import the libraries.
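A minimal set of imports for this walkthrough could look as follows (the original Notebook may organize them differently):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import loguniform
```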
Then, let’s define the range for the distribution. As in the example above, we use 0.0001 and 100 as the lower and upper boundaries. Additionally, we specify the number of randomly generated values we would like to get from each of the distributions.
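For example (the variable names a, b, and n_samples are my choices for this sketch):

```python
# lower and upper boundaries of the explored range
a, b = 0.0001, 100

# number of random values to draw from each distribution
n_samples = 10_000
```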
In the next step, we generate random values from the uniform and log-uniform distributions.
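A sketch of the sampling step, using numpy for the uniform draw and scipy’s loguniform for the log-uniform one:

```python
# uniform: every value between a and b is equally likely
uniform_samples = np.random.uniform(low=a, high=b, size=n_samples)

# log-uniform: sampled uniformly between log(a) and log(b)
log_uniform_samples = loguniform.rvs(a, b, size=n_samples)
```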
We used the numpy implementation of the uniform distribution, as it is easier to work with than the one in scipy, which requires passing the loc and scale parameters instead of the lower and upper bounds. You can see an example of the scipy implementation in the Notebook (link at the end). Then, we plot histograms of the randomly generated values to see the shape of the distributions.
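The histograms can be drawn along these lines (bin counts and figure size are arbitrary choices):

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.hist(uniform_samples, bins=50)
ax1.set_title("Uniform distribution")

ax2.hist(log_uniform_samples, bins=50)
ax2.set_title("Log-uniform distribution")

plt.show()
```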
Immediately, we can see the characteristic shapes of the distributions. In the first case, each value between 0.0001 and 100 is equally likely. In the second, the values between 10 and 100 account for approximately 1/6 of all the generated values, which follows the logic of the log scale: the full range spans six orders of magnitude, and 10 to 100 is one of them.

We can also dive deeper into the relationship between the uniform and log-uniform distributions. We have already mentioned that the log-uniform distribution samples uniformly between the logs of the boundary values. That is exactly what we do in the following snippet. Then, we put those values back on the linear scale by raising 10 to the power of the randomly generated values.
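A sketch of that snippet, reusing the a, b, and n_samples defined earlier:

```python
# sample uniformly between log10(a) = -4 and log10(b) = 2
log_samples = np.random.uniform(
    low=np.log10(a), high=np.log10(b), size=n_samples
)

# put the values back on the linear scale
linear_samples = 10 ** log_samples
```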
In the following plot, we can see that the values are uniformly distributed on the log scale (log of 0.0001 is -4, while log of 100 is 2) and log-uniformly distributed on the linear scale.
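The plot could be recreated with something like this:

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# uniform on the log scale
ax1.hist(log_samples, bins=50)
ax1.set_title("Sampled values on the log scale")

# log-uniform on the linear scale
ax2.hist(linear_samples, bins=50)
ax2.set_title("Sampled values on the linear scale")

plt.show()
```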

Lastly, we can also verify that the log-uniform distribution generates an evenly spaced search space.
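One way to sketch that check is to count how many of the log-uniform samples fall into each decade-wide bin:

```python
# decade boundaries: 0.0001, 0.001, ..., 100
bin_edges = 10.0 ** np.arange(-4, 3)

for lower, upper in zip(bin_edges[:-1], bin_edges[1:]):
    count = np.sum((log_uniform_samples >= lower) & (log_uniform_samples < upper))
    print(f"Number of values between {lower:g} and {upper:g}: {count}")
```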
Running the code produces the following summary:
Number of values between 0.0001 and 0.001: 1659
Number of values between 0.001 and 0.01: 1620
Number of values between 10 and 100: 1614
The number of observations in each bin is approximately the same, and the proportions will converge to equal shares as we increase the number of randomly generated values. Thus, we have confirmed that the search space is evenly distributed across the orders of magnitude.
One thing to keep in mind is that we have not set the random seed, which means that those numbers will be slightly different every time we run the code.
Takeaways
- in the log-uniform distribution, points are sampled uniformly between log(a) and log(b),
- the log-uniform distribution is useful for exploring values that vary over several orders of magnitude,
- we can use this distribution for the initial sweep over a large range of values and then narrow down the range after we determine which values work better for our model and data.
You can find the code used for this article on my GitHub. Also, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.
You might also be interested in one of the following:
8 More Useful Pandas Functionalities For Your Analyses
pur – the easiest way to keep your requirements file up to date