
Benford’s law in the Gaia universe


Benford’s law and distances to stars

Mysterious digit-law embedded in the universe?

One day, while leafing through a book of logarithmic tables, Simon Newcomb (1835–1909) noticed that the early pages were more worn than those towards the end. Because logarithmic tables are ordered by the leading digits of the numbers being looked up, this simple observation implied that numbers starting with the digit 1 occur more often than numbers starting with 2, those starting with 2 more often than those starting with 3, and so on. This pattern in first-digit frequencies may sound counterintuitive at first, but it turned out to hold for many numerical datasets.

To clarify what is meant by the first digit: for example, the numbers 1, 1213123, and 0.00153 all start with the digit 1, whereas 312, 0.3, and π all begin with the digit 3.
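The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or from any package; the function name `first_digit` is made up here, and going through `Decimal` is just one way to avoid rounding surprises with floats.

```python
import math
from decimal import Decimal


def first_digit(x):
    """Return the first significant (non-zero) digit of a non-zero number."""
    if x == 0:
        raise ValueError("zero has no first significant digit")
    # In scientific notation the leading digit is the first character,
    # e.g. 0.00153 -> "1.53e-3" and 1213123 -> "1.213123e+6".
    return int(f"{abs(Decimal(str(x))):e}"[0])


# The examples from the text: the first three start with 1, the last three with 3.
print([first_digit(v) for v in (1, 1213123, 0.00153)])
print([first_digit(v) for v in (312, 0.3, math.pi)])
```

The sign and the position of the decimal point play no role, only the leading non-zero digit does.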

Newcomb published a paper about his discovery in 1881, but it was not really picked up by the scientific community. So, when Frank Benford (1883–1948) rediscovered the phenomenon in 1938, he did not know about Newcomb’s work. Benford looked at the first digits of many different datasets, such as lengths of rivers, street addresses, atomic weights, numbers picked at random from newspapers, and so on. Each time he saw a similar pattern, which supported the idea that first digits follow a common distribution. This digit law is now called Benford’s law, but it could just as well have been called Newcomb’s law.

Math

Benford’s law can be captured in a probability function:

P(d) = log₁₀(1 + 1/d), for d = 1, 2, …, 9

where P(d) is the probability that d occurs as the first digit. The frequencies in percentages can also be displayed in a figure, such as in this bar chart:

Benford’s law, with the first digit on the x-axis and the frequency of occurrence in percent on the y-axis. Image by the author.
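For readers who prefer numbers over a chart, the percentages can be reproduced with the standard form of Benford’s law, P(d) = log10(1 + 1/d). The snippet below is a small sketch using only the Python standard library; the function name `benford_probability` is chosen here for illustration.

```python
import math


def benford_probability(d):
    """Probability that d (1 to 9) is the first digit under Benford's law."""
    if d not in range(1, 10):
        raise ValueError("first digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)


# Print the expected frequency of each first digit in percent.
for d in range(1, 10):
    print(d, f"{100 * benford_probability(d):.1f}%")

# The nine probabilities add up to log10(10) = 1.
total = sum(benford_probability(d) for d in range(1, 10))
```

Digit 1 comes out at about 30.1% and digit 9 at about 4.6%, which is the steep drop-off visible in the bar chart.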

Benford’s law applied

What does it bring us today? Benford’s law can be used to detect (tax) fraud and it has, for example, been used to evaluate the reliability of Covid-19 data (numbers of cases and deaths).

If you expect Benford’s law to hold but your first-digit distribution deviates too much from it, this is a strong hint to inspect your data in more detail. Hence, you can see Benford’s law as a quick validation tool for numerical datasets. However, Benford’s law cannot be followed blindly, as we will see…
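As a rough sketch of such a quick check, the snippet below compares observed first-digit counts with Benford’s law using a chi-square statistic. This is only one possible measure and an assumption of this example, not necessarily the test used in the paper; the names `benford_chi2` and `CHI2_CRITICAL_5PCT` are made up here. With very large samples a chi-square test flags even tiny deviations, so the threshold should be read as a hint rather than a verdict.

```python
import math
from collections import Counter
from decimal import Decimal

# 5% critical value of the chi-square distribution with 8 degrees of freedom
# (9 digit classes minus 1).
CHI2_CRITICAL_5PCT = 15.507


def first_digit(x):
    """First significant digit of a non-zero number."""
    return int(f"{abs(Decimal(str(x))):e}"[0])


def benford_chi2(values):
    """Chi-square statistic of the observed first digits against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2


# Powers of two are a classic Benford-like sequence; the integers 1..999
# have evenly spread first digits and are not.
powers_of_two = [2 ** k for k in range(1, 1001)]
evenly_spread = list(range(1, 1000))

print(benford_chi2(powers_of_two) < CHI2_CRITICAL_5PCT)  # True
print(benford_chi2(evenly_spread) < CHI2_CRITICAL_5PCT)  # False
```

A statistic far above the critical value, as for the evenly spread integers, is the kind of deviation that would justify a closer look at the data.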

Benford’s law and distances to stars

The Netflix show Connected: The Hidden Science of Everything explores Benford’s law in one of its episodes. The episode also states that the distances to other stars and galaxies follow Benford’s law. This is probably based on a scientific paper that used data from ESA’s Hipparcos space telescope together with some other observational data (see the HYG database). With this data, you can estimate distances indirectly by converting the measured parallaxes. (Note that we can only measure parallaxes, not distances directly.)
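The simplest form of that conversion is the textbook relation: distance in parsecs equals one divided by the parallax in arcseconds, or 1000 divided by the parallax in milliarcseconds. The sketch below shows only this naive inversion, with a made-up function name. It is not the statistical distance estimation used for the Gaia analysis, and it breaks down for noisy, zero, or negative parallaxes.

```python
def naive_distance_pc(parallax_mas):
    """Naive distance in parsecs from a parallax in milliarcseconds.

    Only meaningful for positive parallaxes with small relative uncertainty.
    """
    if parallax_mas <= 0:
        raise ValueError("naive inversion needs a positive parallax")
    return 1000.0 / parallax_mas


# A parallax of 10 mas corresponds to 100 pc, and 0.5 mas to 2000 pc.
print(naive_distance_pc(10.0))
print(naive_distance_pc(0.5))
```

Because the inversion is non-linear, a symmetric error on the parallax becomes an asymmetric error on the distance, which is one reason to derive distances statistically instead.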

With ESA’s space telescope Gaia, we now have more precise measurements and a thousand times more parallaxes of stars in our own galaxy (the Milky Way). So, to verify the earlier claims, we explored whether distances to stars really follow Benford’s law and what that means. For this, we used Gaia’s second data release.

You can find our full results in a scientific paper in Astronomy & Astrophysics. Its second section also explains when Benford’s law is expected to occur in general (in case you were wondering by now).

The most important result of our research is that the parallaxes follow Benford’s law, as seen before with the Hipparcos and HYG data. The statistically derived distances, however, lead to a different conclusion: they do not seem to follow Benford’s law.

Gaia parallaxes with the digit number on the x-axis, the probability on the y-axis for the Gaia parallax data in red squares, and Benford’s law displayed with black bars. The parallax first digit pattern is very similar to Benford’s law. Image by the author.
Derived Gaia distances with the digit number on the x-axis, the probability on the y-axis for the derived Gaia distance data in red squares, and Benford’s law displayed with black bars. The first-digit pattern of the derived distances deviates strongly from Benford’s law. Image by the author.

As you can see in the figures above, values starting with the digit 2 are the most common among the distances. We also saw that precise simulations of the Milky Way show a non-Benford distribution of first digits. This result contradicts the earlier claim that distances to other stars in the Milky Way follow Benford’s law.

We explored why this could be the case. It turned out that when you increase the uncertainty on the parallax measurements, the first-digit frequencies agree better with Benford’s law. Hence, it might be the measurement uncertainty that causes Benford’s law to appear in the parallax data. However, we cannot draw final conclusions about the nature of distances to other stars.
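To get a feel for this kind of experiment, you could play with a toy setup like the one below. Everything in it is an assumption made for illustration: the uniform distance distribution, the noise levels, the sample size, and the deviation measure (sum of absolute differences between observed and Benford frequencies). It is not the analysis from the paper and makes no claim about what the outcome should be; it only shows how noise can be added to parallaxes and how the first-digit deviation can then be tracked.

```python
import math
import random
from collections import Counter
from decimal import Decimal


def first_digit(x):
    """First significant digit of a non-zero number."""
    return int(f"{abs(Decimal(str(x))):e}"[0])


def benford_deviation(values):
    """Sum of absolute differences between observed and Benford frequencies."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Toy assumption: 5000 stars with distances uniform between 100 and 3000 pc,
# converted to "true" parallaxes in milliarcseconds.
true_parallaxes = [1000.0 / rng.uniform(100.0, 3000.0) for _ in range(5000)]

# Add Gaussian noise of increasing width (in mas) and record the deviation.
deviation_by_sigma = {}
for sigma in (0.0, 0.1, 0.5, 2.0):
    noisy = [p + rng.gauss(0.0, sigma) for p in true_parallaxes]
    deviation_by_sigma[sigma] = benford_deviation(noisy)

for sigma, deviation in deviation_by_sigma.items():
    print(f"sigma = {sigma} mas -> deviation = {deviation:.3f}")
```

Swapping the toy distance distribution for a more realistic one is the obvious next step if you want to explore the effect further.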

An important takeaway is that you cannot simply assume that everything in nature and our universe follows Benford’s law.

Data science

If you are interested in applying Benford’s law to your own data in Python, you can use my package to inspect whether your numerical data follows Benford’s law.

Source: Jurjen de Jong, Jos de Bruijne, Joris De Ridder (2020). "Benford’s law in the Gaia universe", A&A 642, A205, DOI: 10.1051/0004-6361/201937256

