How to Predict Something With No Data – and Bonsai Trees

In day-to-day life, we often have to make predictions with no data. Here are some ways to make better guesses.

Photo by Todd Trapani on Unsplash

Often in life, you’ll have to predict things with little or no data. Or you’ll know the distribution of a population and nothing more. For example, what is the probability that a Bonsai tree you’ve been gifted for Christmas will make it to the next awkward family gathering?

In this article, I’ll briefly discuss some great insights from statistics that will help you answer such questions. I’m not promising to predict the future; I’m simply going to show you the best techniques we have for doing so, which often produce surprisingly good results despite the lack of data.

The following was inspired by a chapter in "Algorithms To Live By" by Brian Christian and Tom Griffiths.

Predicting lifetimes of Bonsai trees – the Copernican principle

You’ve been gifted a Bonsai tree by your strange uncle, in what was clearly a panic-bought Christmas present. It didn’t even come with a care manual or an instruction booklet. He clearly knows nothing about Bonsai trees. Neither do you, but you do know that they are notorious for one thing: dying.

Therefore, like any good (and slightly sadistic) data scientist, you instantly start trying to predict its death. The thing is, all you know is that this tree is 4 years old. You have no idea how long Bonsai trees can live, and no idea why they decide to give up the will to live. So, how on earth can you predict this?

In an ideal, big-data world, you’d have a massive data set on millions of Bonsai trees and enough knowledge about Bonsai to extract features from your own little, potentially suicidal, friend. You could then run some machine learning models and predict pretty well how long your little guy is going to live. But you don’t have this luxury of plentiful data.

So what can you do?

Enter Copernicus and John Richard Gott III.

Gott, an astrophysicist, first thought of his "Copernicus method" of lifetime estimation in 1969, while standing at the Berlin Wall and wondering how long it would last. He reasoned that the Copernican principle applies when nothing else is known: unless there was something special about the timing of his visit (which he didn’t think there was), there was a 50% chance that he was seeing the wall after the first half of its life and a 75% chance that he was seeing it after the first quarter.

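Based on its age in 1969 (8 years), Gott’s 50% interval puts the wall’s remaining life between a third of its age and three times its age: from 8·(0.5/1.5) ≈ 2.7 years to 8·(1.5/0.5) = 24 years. In other words, he left the wall with 50% confidence that it would come down between about 1972 and 1993 (1969 + 24), and with 75% confidence that it wouldn’t be there in 1993.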

We can apply the same logic to our Bonsai tree. Based on its age (4 years old in 2021), the 50% interval for its remaining life runs from 4·(0.5/1.5) ≈ 1.3 years to 4·(1.5/0.5) = 12 years.

We can therefore expect, with 50% confidence, that our little friend will be back in the ground, where it probably belongs, sometime between 2022 and 2033 (2021 + 12).

However, 50% confidence intervals aren’t very useful, are they? So what happens if we step it up to the standard 95% confidence? The multipliers become 1/39 and 39, so we get a result that says, with 95% confidence, our Bonsai will live for between 0.1 and 156 more years.
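The arithmetic behind these intervals can be sketched in a few lines of Python. This is only an illustration of Gott’s rule as described above; the function name `gott_interval` and its parameters are my own, not something from Gott or the book.

```python
def gott_interval(age, confidence=0.5):
    """Return (low, high) bounds on the remaining lifetime, in the units of `age`."""
    if age <= 0:
        raise ValueError("age must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Assumption: we observe at a uniformly random point in the total lifetime,
    # so with probability `confidence` the elapsed fraction lies in the central
    # band [(1 - c) / 2, (1 + c) / 2]. Converting that band into remaining
    # lifetime gives the two multipliers below.
    low = age * (1 - confidence) / (1 + confidence)
    high = age * (1 + confidence) / (1 - confidence)
    return low, high


if __name__ == "__main__":
    for label, age, start_year in [("Berlin Wall", 8, 1969), ("Bonsai", 4, 2021)]:
        for confidence in (0.5, 0.95):
            low, high = gott_interval(age, confidence)
            print(f"{label} at {confidence:.0%}: "
                  f"{low:.1f} to {high:.1f} more years "
                  f"(until {start_year + low:.0f} to {start_year + high:.0f})")
```

At 50% confidence the multipliers are 1/3 and 3; at 95% they are 1/39 and 39, which is why the interval becomes so wide that it is barely informative.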

How to improve your guesses

The Copernican principle is actually just an adaptation of Bayes’ rule with what’s known as an uninformative prior (we know absolutely nothing about the underlying distribution of Bonsai lifespans).
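For readers who want to see where this comes from, here is a short sketch of the derivation. The notation is my own: t is the tree’s current age and T is its total lifespan. It assumes we are equally likely to meet the tree at any point in its life, and it uses the scale-free prior p(T) ∝ 1/T as one common choice of uninformative prior.

```latex
\begin{align*}
p(t \mid T) &= \frac{1}{T}, \quad 0 < t < T
  && \text{(equally likely to observe at any age)} \\
p(T) &\propto \frac{1}{T}
  && \text{(uninformative prior)} \\
p(T \mid t) &\propto p(t \mid T)\, p(T) \propto \frac{1}{T^{2}}
  \;\Rightarrow\; p(T \mid t) = \frac{t}{T^{2}}, \quad T \ge t \\
P(T > x \mid t) &= \int_{x}^{\infty} \frac{t}{T^{2}}\, dT = \frac{t}{x}, \quad x \ge t
\end{align*}
```

Setting t/x = 1/2 gives a median total lifespan of x = 2t, and setting t/x = 1/4 gives a 75% chance that the total lifespan is below 4t, which is the same as saying there is a 75% chance we are seeing the tree after the first quarter of its life.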

Obviously, if we knew the underlying distribution of Bonsai lifespans, we could make much better guesses.

Let’s say a Bonsai’s lifespan follows what’s called a _power-law distribution._ A power-law distribution is one that spans many scales: a Bonsai could live a month, a year, a decade, a century or even a millennium. When you apply Bayes’ rule to a power-law distribution, the appropriate prediction strategy is a multiplicative rule, in which you multiply the elapsed time by a constant factor. In the Copernican principle example, this constant is 2. Therefore, if you have an uninformative prior and no knowledge of the distribution, you should guess that your Bonsai will go on living exactly as long as it already has.
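Here is a minimal sketch of the multiplicative rule. It assumes a power-law prior of the form p(T) ∝ T^(-gamma) on the total lifespan T; the exponent `gamma` and both function names are hypothetical choices for illustration. Under that assumption the posterior median is the elapsed time multiplied by 2^(1/gamma), so gamma = 1 reproduces the Copernican factor of 2. The second function checks the closed form by sampling from the posterior.

```python
import random
import statistics


def multiplicative_prediction(elapsed, gamma=1.0):
    """Posterior-median total lifespan under an assumed prior p(T) ∝ T**(-gamma)."""
    if elapsed <= 0 or gamma <= 0:
        raise ValueError("elapsed and gamma must be positive")
    # Posterior survival function: P(T > x | t) = (t / x)**gamma for x >= t.
    # Setting it to 0.5 and solving for x gives the constant multiplier.
    return elapsed * 2 ** (1 / gamma)


def sample_posterior_total(elapsed, gamma=1.0, n=100_000, seed=0):
    """Draw total lifespans from the same posterior by inverting its survival function."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [elapsed * (1.0 - rng.random()) ** (-1 / gamma) for _ in range(n)]


if __name__ == "__main__":
    age = 4  # the Bonsai's age in years
    print("Rule of thumb:", multiplicative_prediction(age))
    print("Simulated median:", statistics.median(sample_posterior_total(age)))
```

A larger `gamma` means long lifespans are rarer in the prior, so the multiplier shrinks towards 1 and the prediction sits closer to the current age.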

Other distributions have different optimal prediction strategies when Bayes’ rule is applied. For example, the normal distribution calls for an average rule: predict the average lifespan if the Bonsai is younger than the average, and predict that it will live a little bit longer if it has already passed the average.
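The average rule can be sketched numerically as well. The code below assumes lifespans follow a normal distribution and that we are equally likely to meet the tree at any point in its life; the mean of 40 years and standard deviation of 10 years are made-up numbers for illustration, not real Bonsai statistics, and `average_rule_prediction` is a hypothetical helper.

```python
import math


def average_rule_prediction(elapsed, mean, sd, grid_size=20_000):
    """Posterior-median total lifespan for an assumed Normal(mean, sd) prior."""
    if elapsed <= 0 or sd <= 0:
        raise ValueError("elapsed and sd must be positive")
    # Only total lifespans longer than the current age are possible, so the
    # grid starts at `elapsed` and runs far into the upper tail of the prior.
    upper = max(mean, elapsed) + 10 * sd
    step = (upper - elapsed) / grid_size
    totals = [elapsed + (i + 0.5) * step for i in range(grid_size)]
    # Unnormalised posterior: normal prior density times the 1/T likelihood
    # of happening to observe the tree at this particular age.
    weights = [math.exp(-0.5 * ((total - mean) / sd) ** 2) / total for total in totals]
    half = sum(weights) / 2
    running = 0.0
    for total, weight in zip(totals, weights):
        running += weight
        if running >= half:
            return total
    return totals[-1]


if __name__ == "__main__":
    for age in (4, 60):
        guess = average_rule_prediction(age, mean=40, sd=10)
        print(f"Age {age}: predicted total lifespan of about {guess:.1f} years")
```

With these illustrative numbers, a 4-year-old tree gets a prediction close to the average (slightly below it, because shorter lifespans make any given age a little more likely to be observed), while a 60-year-old tree gets a prediction only a few years beyond its current age.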

The implications for day-to-day life

So, what does this mean for day-to-day life? It turns out that humans are generally pretty good at using the correct prediction rule. This was highlighted in an experiment by Griffiths and Tenenbaum, who compared people’s intuitive predictions with the results of applying Bayes’ rule to real-world data and found that the two were extremely close.

Therefore, it makes sense to trust your intuition as a last resort – if there really is no data. There might not be any data, but your mind has developed its own understanding of the distribution by osmosis.

"Small data is big data in disguise".

  • Brian Christian & Tom Griffiths

However, your inherent priors are a function of the information provided to you. So to make better predictions you simply have to be well informed and unbiased in your understanding of the world.

In the modern day of algorithms feeding you the sensationalist news they know you want to see, it makes sense to diversify your news inputs. As Christian and Griffiths state, it might even be a good idea to turn off the news.

And if you do get a Bonsai for Christmas – just google how to look after one.


Thanks for reading and I hope you enjoyed it.

Cheers,

James.

