Log-normal Distribution - A simple explanation

How to calculate μ & σ, the mode, mean, median & variance

Maja Pavlovic
Towards Data Science


About

We will briefly look at the definition of the log-normal distribution and then go on to calculate its parameters μ and σ from simple data. We will then have a look at how to calculate the mean, mode, median and variance of this probability distribution.

Informal Definition

The log-normal distribution is a right-skewed continuous probability distribution, meaning it has a long tail towards the right. It is used to model various natural phenomena, such as income distributions, the length of chess games or the time to repair a maintainable system.

Log-normal probability density function | image by author

The probability density function for the log-normal is defined by the two parameters μ and σ, where x > 0:
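In standard notation (as in the Wikipedia entry cited below):

```latex
f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right), \qquad x > 0
```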

μ is the location parameter and σ the scale parameter of the distribution. Caution here! These two parameters should not be mistaken for the more familiar mean and standard deviation of a normal distribution. Once our log-normal data has been log-transformed, μ can be viewed as the mean and σ as the standard deviation of the transformed data. Without that transformation, however, μ and σ are simply the two parameters that define our log-normal distribution, not its mean and standard deviation! Okay, we just went from “let’s keep it easy” to “a little too much information”. Let’s dial back and look at this relationship between the log-normal and the normal distribution a bit more.

The name “log-normal” reveals that the distribution relates to logarithms as well as to the normal distribution. How? Let’s say your data fits a log-normal distribution. If you then take the logarithm of all your data points, the transformed points will fit a normal distribution. In other words: take the log of your log-normal data and you end up with a normal distribution. See the figure below.

Relationship between the normal and log-normal function | image by author, inspired by figure from Wikipedia

The data points for our log-normal distribution are given by the X variable. When we log-transform that X variable (Y=ln(X)) we get a Y variable which is normally distributed.

We can reverse this thinking and look at Y instead. If Y has a normal distribution and we take the exponential of Y (X=exp(Y)), then we get back to our X variable, which has a log-normal distribution. This visual is helpful to keep in mind when analysing important properties of the log-normal distribution:
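This back-and-forth can be checked numerically. The sketch below (sample size, seed and parameters are arbitrary choices for illustration) draws log-normal samples with NumPy and confirms that their logarithms behave like normal data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Draw 10,000 samples X from a log-normal with mu=0, sigma=0.5
x = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

# Y = ln(X) should be (approximately) normally distributed
y = np.log(x)

# Sample mean and std of ln(X) should sit close to mu=0 and sigma=0.5,
# and a normality test should find no evidence against normality
stat, p_value = stats.normaltest(y)
print(f"mean of ln(X): {y.mean():.3f}, std of ln(X): {y.std():.3f}")
print(f"normality test p-value: {p_value:.3f}")
```

Conversely, `np.exp(y)` recovers the original log-normal samples exactly.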

“The most efficient way to analyse log-normally distributed data consists of applying the well-known methods based on the normal distribution to logarithmically transformed data and then to back-transform results if appropriate.” — Wikipedia, Log-normal distribution

Estimate μ & σ from data

We can estimate our log-normal parameters μ and σ using maximum likelihood estimation (MLE). This is a popular approach for approximating distribution parameters as it finds parameters that make our assumed probability distribution ‘most likely’ for our observed data.

If you want to understand how MLE works in more detail, StatQuest explains the approach in a fun intuitive way and also derives the estimators for the normal distribution.

The maximum likelihood estimators for the normal distribution are:
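In symbols, given data points x₁, …, xₙ:

```latex
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
```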

We, however, want the maximum likelihood estimators of μ and σ for the log-normal distribution, which are:
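The only change is the log-transform inside the sums:

```latex
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \ln x_i, \qquad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (\ln x_i - \hat{\mu})^2
```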

These formulas are nearly identical: we can use the same approach as for the normal distribution and simply log-transform our data first. If you are curious about how we get the log-normal estimators, here is a link to the derivation.
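As a sanity check, the log-transform approach can be compared against SciPy’s built-in maximum likelihood fit. The seed, sample size and true parameters below are arbitrary; `floc=0` pins SciPy’s extra location parameter to zero so that its parametrisation matches ours:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
data = rng.lognormal(mean=3.0, sigma=0.4, size=5_000)

# "By hand": log-transform, then apply the normal-distribution estimators
log_data = np.log(data)
mu_hat = log_data.mean()           # estimate of mu
sigma_hat = log_data.std(ddof=0)   # estimate of sigma (1/n denominator = MLE)

# SciPy's maximum likelihood fit; it reports s = sigma and scale = exp(mu)
s, loc, scale = stats.lognorm.fit(data, floc=0)

print(mu_hat, np.log(scale))   # the two mu estimates agree
print(sigma_hat, s)            # the two sigma estimates agree
```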

Where is the simple example?!

Let’s take a look at 5 values of income that follow a log-normal distribution. Our fictitious person 1 earns 20k, person 2 earns 22k and so on:

We can now estimate μ with the logic from above. First, we take the log of each of our income data points and then calculate the average value for the 5 transformed data points, see below:

Table 1 | image by author

This gives us a value of 3.36 for our location parameter μ.

We can then use our estimated μ to approximate our σ with the following formula.

Rather than calculating σ², we take the square root of the formula above to approximate σ. The formula also uses n−1 instead of n to obtain a less biased estimator. If you want to understand more about this change, have a look at the corrected sample variance (also known as Bessel’s correction).
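In symbols, the corrected estimator is:

```latex
\hat{\sigma} = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (\ln x_i - \hat{\mu})^2 }
```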

Table 2 | image by author

As above, the first step is to take the logarithm of each individual income data point. We then subtract the estimated μ from each log-transformed data point and square the result, see table above. These values are then inserted into the formula:

This gives us a value of 0.4376 for our scale parameter σ.
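The two tables above can be reproduced in a few lines. Only the first two incomes (20 and 22) are given in the text, so the remaining three values below are invented for this sketch; the resulting numbers will therefore differ slightly from the 3.36 and 0.4376 obtained above:

```python
import numpy as np

# Incomes in thousands; only 20 and 22 appear in the text, the
# remaining three values are made up for illustration
incomes = np.array([20.0, 22.0, 26.0, 33.0, 50.0])

log_incomes = np.log(incomes)        # step 1: log-transform each data point
mu_hat = log_incomes.mean()          # step 2: average -> location parameter mu

# step 3: corrected sample standard deviation (ddof=1 gives the n-1
# denominator) -> scale parameter sigma
sigma_hat = log_incomes.std(ddof=1)

print(f"mu_hat    = {mu_hat:.4f}")
print(f"sigma_hat = {sigma_hat:.4f}")
```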

Note: These calculations are just an example of how these values can be obtained. In practice you need far more than five data points to estimate the parameters reliably.

Calculate median, mean, mode & variance

Extracting some of the important properties of the log-normal distribution is straightforward once we have our parameters μ and σ. See key properties, their formula, and the calculation for our example data in the table and figure below.

Table 3 | image by author

How do we arrive at the different formulas in the table above?

  • The median is derived by taking the log-normal cumulative distribution function, setting it to 0.5 and then solving this equation (see here).
  • The mode is the global maximum of the distribution and can therefore be derived by taking the derivative of the log-normal probability density function, setting it to zero and solving (see here).
  • The mean (also known as the expected value) of the log-normal distribution is the probability-weighted average over all possible values (see here).
  • The variance of the log-normal distribution is the probability-weighted average of the squared deviation from the mean (see here).
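Plugging our estimated parameters into the standard closed-form expressions for these properties (see the Wikipedia reference below) gives:

```python
import numpy as np

mu, sigma = 3.36, 0.4376   # parameters estimated from the income example

median = np.exp(mu)                              # exp(mu)
mode = np.exp(mu - sigma**2)                     # exp(mu - sigma^2)
mean = np.exp(mu + sigma**2 / 2)                 # exp(mu + sigma^2 / 2)
variance = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

print(f"median   = {median:.2f}")
print(f"mode     = {mode:.2f}")
print(f"mean     = {mean:.2f}")
print(f"variance = {variance:.2f}")
```

Note that mode < median < mean, the characteristic ordering for a right-skewed distribution.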
Log-normal probability density function of simple data example | image by author

References

[1] Wikipedia, Log-Normal Distribution (2022), retrieved on 2022–02–06

[2] M. Taboga, “Log-normal distribution”, Lectures on probability theory and mathematical statistics (2021), Kindle Direct Publishing. Online appendix.

[3] A. Katz, C. Williams, and J. Khim, Brilliant: Log-normal Distribution (2022), retrieved on 2022–02–06

[4] J. Soch, K. Petrykowski, T. Faulkenberry, The Book of Statistical Proofs (2021), github.io

[5] Wikipedia, Bessel’s Correction (2022), retrieved on 2022–02–06
