
Benford’s law, also known as the first-digit law or the leading-digit phenomenon, is a phenomenological law. It states that in many real-world data sets, such as worldwide censuses, 1 is the most common leading digit, occurring about 30% of the time. It is almost as if we had a meticulously rigged die that always favors 1 over 2, 2 over 3, and so on. This is very different from the 1-in-9 probability (about 11%) we would intuitively assign to each digit from 1 to 9.
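For reference, the frequencies predicted by Benford’s law follow the standard formula P(d) = log10(1 + 1/d) for d = 1, ..., 9. The formula is not spelled out in the text above, and the function name `benford_probabilities` is purely illustrative. The short sketch below tabulates it next to the uniform 1-in-9 guess.

```python
import math

def benford_probabilities():
    """Return {digit: probability} for leading digits 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

benford = benford_probabilities()
for digit, p in benford.items():
    # Compare each Benford probability with the naive uniform guess of 1/9.
    print(f"{digit}: Benford {p:.3f} vs uniform {1 / 9:.3f}")
```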

Benford’s law has long been regarded as a fascinating and enigmatic natural law.
Its explanations range from the supernatural to the measure-theoretic. Its use cases span from fraud detection to computer disk space allocation.
Publications on the subject have increased in recent years, most of them examining the law on various data sources, with applications in fraud detection and computer science. However, the root cause of Benford’s law remains something of a mystery.
This article does not aim to provide theoretical evidence for the origin of the law. Instead, it focuses on simulation, because the law is surprisingly tricky to reproduce from a probabilistic point of view.
Pick a familiar base distribution, take the first digits of its samples, and the result is usually far from Benford’s distribution.
As a first step, we put forward three probabilistic approaches that yield a Benford-like distribution. In the second part, we deal with two more realistic cases in which Benford’s distribution is likely to appear in real life.
So let us get into practice.
Three ways to simulate Benford’s Law:
1. Raising a uniform sample to the power of n
The first way to do it is to sample values from a uniform distribution between 0 and 1 and raise them to the power of n, n being a relatively large integer.

Here’s how this works for a single sampled value: for instance, with u = 0.9 and n = 50, we get 0.9^50 ≈ 0.00515, whose first digit is 5.

We write a sampling function that takes the boundaries of the distribution, the size of the sample, and the power to which each element of the sample is raised, all as parameters.
We also write an extract_first_digit function that returns the leftmost non-zero digit of each number in the sample.
Let us run our first simulation.
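The two helpers described above and a first run might be sketched as follows. Only the name `extract_first_digit` comes from the text; `sample_powered_uniform`, the sample size, the exponent of 50, and the seed are assumptions made for illustration.

```python
import random
from collections import Counter

def sample_powered_uniform(low, high, size, power, seed=0):
    """Draw `size` values from U(low, high) and raise each one to `power`."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) ** power for _ in range(size)]

def extract_first_digit(sample):
    """Return the leading non-zero digit of every non-zero value in `sample`."""
    # In scientific notation the first significant digit sits at position 0.
    # Values that underflowed to exactly 0.0 carry no digit and are skipped.
    return [int(f"{abs(x):.16e}"[0]) for x in sample if x != 0]

sample = sample_powered_uniform(low=0.0, high=1.0, size=100_000, power=50)
digits = extract_first_digit(sample)

counts = Counter(digits)
frequencies = {d: counts[d] / len(digits) for d in range(1, 10)}
for d, f in frequencies.items():
    print(f"{d}: {f:.3f}")
```

Reading the digit off the scientific-notation string avoids the rounding pitfalls of dividing by powers of ten.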




The higher the exponent, the more closely the distribution resembles Benford’s law. You may try replacing the uniform distribution with another one yourself. I am not sure you would get the same results in the end, interesting as the experiment might be.
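One way to check the effect of the exponent is to measure how far the simulated digit frequencies are from Benford’s probabilities for several values of n. The sketch below is an illustration under assumptions: the distance (sum of absolute deviations), the list of exponents, and the helper names are all our own choices. It also works with n·log10(u) instead of u**n, which gives the same leading digit but avoids underflow for large n.

```python
import math
import random

# Benford probabilities for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit_frequencies(power, size=100_000, seed=0):
    """Frequencies of the leading digit of u**power for u uniform on (0, 1]."""
    rng = random.Random(seed)
    counts = [0] * 9
    for _ in range(size):
        u = 1.0 - rng.random()                # in (0, 1], so log10 is defined
        frac = (power * math.log10(u)) % 1.0  # fractional part of log10(u**power)
        digit = min(9, int(10 ** frac))       # guard against rounding up to 10
        counts[digit - 1] += 1
    return [c / size for c in counts]

def distance_to_benford(freqs):
    """Sum of absolute deviations from Benford's probabilities."""
    return sum(abs(f - b) for f, b in zip(freqs, BENFORD))

distances = {n: distance_to_benford(first_digit_frequencies(n))
             for n in (1, 2, 5, 20, 100)}
for n, dist in distances.items():
    print(f"power={n:>3}: distance to Benford = {dist:.3f}")
```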
2. Dividing a uniform distribution by another one:

The second way to do it is to divide a sample from a uniform distribution between 0 and 1 by another sample drawn with the same parameters, element by element.

Let us recreate that.
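A minimal sketch of this ratio experiment could look like the following. The list name `history` is borrowed from the plotting call in the text. The sample size, the seed, and the `first_digit` helper are assumptions.

```python
import random
from collections import Counter

def first_digit(x):
    """Leading non-zero digit of a positive number, read off its scientific notation."""
    return int(f"{x:.16e}"[0])

rng = random.Random(0)
size = 100_000

# 1 - random() lies in (0, 1], which keeps numerators and denominators non-zero.
numerators = [1.0 - rng.random() for _ in range(size)]
denominators = [1.0 - rng.random() for _ in range(size)]

# `history` holds one leading digit per ratio, ready to be shown as a histogram.
history = [first_digit(a / b) for a, b in zip(numerators, denominators)]

counts = Counter(history)
for d in range(1, 10):
    print(f"{d}: {counts[d] / size:.3f}")
```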
A quick plot:
sns.histplot(history, stat="probability", discrete = True, color = 'b')

Unfortunately, just as with the first simulation trick, you should not expect the same Benford-like distribution if you use anything other than uniform distributions.
3. Constructing a Markov chain:
The third way consists in constructing a Markov chain initialized with a random value sampled from a uniform distribution between 0 and 10 (preferably with 10 excluded).
Then, each new state is drawn from a new uniform distribution bounded by 0 and the mantissa of the value sampled in the previous state.

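In layman’s terms, the mantissa of a number is what is left once its power of ten is stripped away: a value between 1 and 10 whose integer part is the number’s first digit. It is written between curly brackets. Here is a more concrete example: {314.15} = 3.1415 and {0.027} = 2.7.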

Let us get into action and write some code.
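The chain described above might be sketched as follows. This is one possible reading, not necessarily the original implementation: it assumes the mantissa is the significand rescaled into [1, 10), and the helper names, the number of steps, and the seed are hypothetical. Using only the first digit as the upper bound would give a different chain.

```python
import random
from collections import Counter

def mantissa(x):
    """Significand of a positive number: x rescaled into [1, 10)."""
    return float(f"{x:.16e}".split("e")[0])

def draw_positive(rng, upper):
    """Sample from U(0, upper), redrawing in the very unlikely event of an exact 0."""
    x = rng.uniform(0.0, upper)
    while x == 0.0:
        x = rng.uniform(0.0, upper)
    return x

def simulate_chain(n_steps, seed=0):
    """Record the first digit of every state visited by the chain."""
    rng = random.Random(seed)
    state = draw_positive(rng, 10.0)      # initial state drawn between 0 and 10
    digits = []
    for _ in range(n_steps):
        m = mantissa(state)
        digits.append(int(m))             # integer part of the mantissa = first digit
        state = draw_positive(rng, m)     # next state ~ U(0, mantissa of current state)
    return digits

digits = simulate_chain(100_000)
counts = Counter(digits)
for d in range(1, 10):
    print(f"{d}: {counts[d] / len(digits):.3f}")
```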

Clearly, the digits’ probabilities differ somewhat from the official Benford values. The overall shape remains the same, though: 1 tends to be more frequent than the other digits.
Two real cases where Benford’s law might appear:
Now that we have finished warming up, we will tackle two concrete use cases where Benford’s Law works its magic.
1. Price distributions in supermarkets
The first case involves prices in supermarkets, where it was found that, on the whole, the digit 1 appears as the leading digit of prices more often than the digit 2, which itself appears more often than the digit 3, and so on.
Our friend Alice comes in again to take us on a tour of a supermarket near where she lives. She often runs errands there, as she finds whatever she needs. Once there, she browses the aisles and walks past a stand where some items are offered at a discount.

Little does she know that the prices she encounters come from different distributions which, together, give rise to a very special probabilistic law. Each distribution can be thought of as the range of prices for the variants of a single brand.

Let us help Alice trace it back.
First, let us define custom functions that build the distributions from which values are sampled.
For instance, we define a family of normal distributions through the ranges of means and variances we pass as input parameters. In a later step, we sample a mean and a variance, create a Gaussian instance, and sample a value from it.
We use the same mechanism to create gamma and uniform distributions.
In the next step, we run a series of sampling operations, each consisting of randomly choosing a uniform, Gaussian, or gamma distribution, taking a sample from it, extracting its first digit, and storing it in a list.
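The procedure described above could be sketched as follows. Everything numeric here is an assumption: the parameter ranges are hard-coded rather than passed in, and the typical price of each product line is drawn log-uniformly between 1 and 1000 so that prices span several orders of magnitude. How closely the output follows Benford’s law depends heavily on these choices.

```python
import random
from collections import Counter

rng = random.Random(0)

def draw_mean():
    """Hypothetical typical price of a product line, between 1 and 1000."""
    return 10 ** rng.uniform(0.0, 3.0)

def sample_gaussian_price():
    mean = draw_mean()
    std = mean * rng.uniform(0.05, 0.30)          # spread relative to the mean
    return rng.gauss(mean, std)

def sample_gamma_price():
    mean = draw_mean()
    shape = rng.uniform(1.0, 5.0)
    return rng.gammavariate(shape, mean / shape)  # gamma with expected value `mean`

def sample_uniform_price():
    mean = draw_mean()
    half_width = mean * rng.uniform(0.1, 0.5)
    return rng.uniform(mean - half_width, mean + half_width)

SAMPLERS = [sample_gaussian_price, sample_gamma_price, sample_uniform_price]

def first_digit(x):
    """Leading non-zero digit of a positive number."""
    return int(f"{x:.16e}"[0])

digits = []
while len(digits) < 50_000:
    price = rng.choice(SAMPLERS)()  # pick a family at random, then one price from it
    if price > 0:                   # a Gaussian draw can rarely be non-positive; skip it
        digits.append(first_digit(price))

counts = Counter(digits)
for d in range(1, 10):
    print(f"{d}: {counts[d] / len(digits):.3f}")
```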
Once again, we are looking at a Benford distribution.

In 1998, Theodore P. Hill gave a rigorous demonstration that a sample taken from a mix of distributions follows Benford’s law. So this comes as no surprise.
2. Multiplicative fluctuation of a stock price
Let us look closely at the evolution of a stock price.
Assuming it is multiplied at each step by a random sample drawn from a different normal distribution, we trace the first digit of every new price and see whether a pattern emerges.

Here’s a quick demo.
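A minimal sketch of such a multiplicative process is given below. The starting price, the ranges from which each step’s mean and standard deviation are drawn, the number of steps, and the seed are all hypothetical. Over such a long run the simulated price wanders across many orders of magnitude, which is what lets the leading digits mix.

```python
import random
from collections import Counter

def first_digit(x):
    """Leading non-zero digit of a positive number."""
    return int(f"{x:.16e}"[0])

def simulate_price_digits(n_steps, start_price=100.0, seed=0):
    """Multiply the price by a fresh Gaussian factor at every step and log its first digit."""
    rng = random.Random(seed)
    price = start_price
    digits = []
    for _ in range(n_steps):
        # Each step uses a different normal distribution: its mean and
        # standard deviation are redrawn from hypothetical ranges.
        mu = rng.uniform(0.95, 1.05)
        sigma = rng.uniform(0.05, 0.15)
        factor = rng.gauss(mu, sigma)
        while factor <= 0:              # keep the price strictly positive
            factor = rng.gauss(mu, sigma)
        price *= factor
        digits.append(first_digit(price))
    return digits

digits = simulate_price_digits(n_steps=50_000)
counts = Counter(digits)
for d in range(1, 10):
    print(f"{d}: {counts[d] / len(digits):.3f}")
```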
Once we have read through all the price records and stored all the digits, we plot the distribution (keep your fingers crossed).

Benford’s law appears before us once again.
In 2001, L. Pietronero, E. Tosatti, V. Tosatti, and A. Vespignani tackled this problem in their paper entitled “Explaining the uneven distribution of numbers in nature”, in which the authors start with the study of multiplicative processes and draw an analogy with the central limit theorem, which deals with sums of random variables rather than products. They claim:
This exercise shows that the numbers N characterizing some physical quantities or objects naturally will follow Benford’s law if their time evolution is ruled by multiplicative fluctuations.
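The analogy with the central limit theorem can be made explicit with a short calculation. This is a sketch in our own notation, where N_t is the quantity at step t and ξ_t the random factor applied at that step; it is not a transcription of the paper’s derivation. Taking logarithms turns the product into a sum:

```latex
N_{t+1} = \xi_t \, N_t
\quad\Longrightarrow\quad
\log_{10} N_t = \log_{10} N_0 + \sum_{s=0}^{t-1} \log_{10} \xi_s .
```

If this sum spreads $\log_{10} N_t$ evenly over many orders of magnitude, its fractional part is close to uniform on $[0, 1)$, and the leading digit equals $d$ with probability $\log_{10}(d+1) - \log_{10}(d) = \log_{10}(1 + 1/d)$, which is Benford’s law.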
Closing thoughts:
Many mathematicians have succeeded in explaining the natural appearance of Benford’s law in everyday numbers.
To this day, the subject still arouses curiosity, however thorough the explanations may be. From a personal point of view, it is always nice to check the specialists’ findings with small and fun simulations that make such concepts easier to absorb.