MACHINE LEARNING / DATA SCIENCE

When shouldn’t you use machine learning to make predictions?

Machine Learning in a world of Radical Uncertainty

Eddie Pease
Towards Data Science
8 min read · May 4, 2020


Machine learning is getting better and better. More and more companies are building it into their products, so an increasing number of decisions that affect us are influenced by an algorithm. Wondering what to watch on Netflix? The suggestions are served up by a very powerful recommendation engine. Instinctively, we feel comfortable with decisions like this being taken by an algorithm. But consider the following scenarios:

  1. Goldman Sachs in 2007. Days after the French bank BNP Paribas had suspended redemptions from three of its funds (widely acknowledged as the start of the Great Financial Crisis), the Goldman Sachs CFO, David Viniar, told the Financial Times “We were seeing things that were 25-standard deviation moves several days in a row”. Taken literally, the probability of a single 25-standard-deviation move is so low that the universe has not existed long enough for even one such event to be expected, let alone several on consecutive days.
  2. Before it was outlawed in the United States in 1977, it was common for companies to charge more for services such as credit to people who lived in a particular area, without regard to their own credit history. This was known as ‘redlining’ and had a disproportionate effect on African-Americans. While it might be true that people who live in particular areas generally have poorer credit histories, many people in those areas could certainly have paid their debts. Most of us would think that this form of ‘statistical discrimination’ was morally dubious.

In the above situations, algorithmic decision making was either performed poorly or was highly inappropriate. I recently read a fantastic book called “Radical Uncertainty — decision making for an unknowable future” by Mervyn King and John Kay, and it got me thinking about the role of machine learning in making predictions. So, given a problem with a representative sample of data, in what situations should you be wary of using machine learning? Read on for a broad, high-level framework…

Different sorts of reasoning


Before we get to this framework, it helps to set out the different ways that humans reason about the world:

  1. Deductive Reasoning — such as “I live in London. London is in the United Kingdom. Therefore I live in the United Kingdom”.
  2. Inductive Reasoning — such as “Analysis of past election results indicates that voters favour the incumbent party in favourable economic circumstances. In the run-up to the 2016 US Presidential election, economic conditions were neither clearly favourable nor unfavourable. Therefore the election will be close”. This reasoning uses events that have happened in the past to infer likely future outcomes.
  3. Abductive Reasoning — such as “Donald Trump won the 2016 presidential election because of concerns in particular swing states over economic conditions and identity, and because his opponent was widely disliked”. This reasoning provides the best explanation of a unique event. Humans are good at this because we are adept at filtering disparate and often conflicting evidence in search of the best explanation.

These types of reasoning correspond to the following levels of machine sophistication:

  1. Deductive Reasoning — traditional software is very good at this. Indeed, deductive reasoning is the basis of all computer code. For example, if ‘a’ equals ‘b’ and ‘b’ equals ‘c’, then logically ‘a’ equals ‘c’.
  2. Inductive Reasoning — machine learning reasons in this way, using past data to make inferences about the future. Perhaps it is no coincidence that Andrej Karpathy has called this ‘Software 2.0’ (a minimal sketch contrasting these first two categories follows this list).
  3. Abductive Reasoning — if computers could reason in this way, would this be true Artificial General Intelligence (Software 3.0)? Computers are very bad at this because it is often not obvious which data to use, and the data available might well be incomplete.
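To make the first two categories concrete, here is a minimal, purely illustrative Python sketch (it is not from the book, and the data is invented): the deductive step is a hard-coded rule, while the inductive step fits a model to made-up election-style figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Deductive: the conclusion follows from explicit rules with certainty.
def lives_in_uk(city: str) -> bool:
    uk_cities = {"London", "Manchester", "Edinburgh"}  # assumed lookup table
    return city in uk_cities

# Inductive: the conclusion is inferred from (made-up) historical examples
# and is only as good as the data the model was fitted on.
past_economic_growth = np.array([[-2.0], [-0.5], [0.0], [1.0], [2.5]])
past_incumbent_vote_share = np.array([44.0, 47.0, 49.0, 51.0, 54.0])
model = LinearRegression().fit(past_economic_growth, past_incumbent_vote_share)

print(lives_in_uk("London"))   # True, by definition of the rules
print(model.predict([[0.2]]))  # an extrapolation from history, not a certainty
```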

It therefore follows that the more a prediction involves abductive reasoning, the less useful machine learning will be. Almost by definition, historic data is only useful to a degree in informing responses to unique events. So how does this relate to our two scenarios above?

Stationarity


One of the key assumptions of machine learning models is ‘stationarity’ — that is, the underlying probability distribution of what is being modelled does not change. In practical terms, this means that the system being modelled does not change its behaviour in response to predictions about its own future state.

Before the financial crisis, banks calculated their risk using Value at Risk (VaR) models. These models started to be developed in the 1980s and have two key inputs — the daily returns of each asset and the co-variance of returns between different assets. Using these two pieces of information, a probability distribution can be constructed which gives the maximum likely loss on a portfolio of assets on a single day. How are these two values calculated? From historic data, of course!
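As a rough illustration of the mechanics, here is a minimal variance-covariance VaR sketch in Python. The three-asset portfolio, the return series and the 99% confidence level are all invented for the example; real VaR systems are considerably more elaborate.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical historic daily returns for three assets over 500 calm trading days.
returns = rng.multivariate_normal(
    mean=[0.0004, 0.0003, 0.0005],
    cov=[[1.0e-4, 2.0e-5, 1.0e-5],
         [2.0e-5, 8.0e-5, 1.5e-5],
         [1.0e-5, 1.5e-5, 1.2e-4]],
    size=500,
)

weights = np.array([0.4, 0.4, 0.2])  # hypothetical portfolio weights
portfolio_value = 1_000_000          # hypothetical portfolio size

cov = np.cov(returns, rowvar=False)                 # co-variance estimated from history
portfolio_sigma = np.sqrt(weights @ cov @ weights)  # one-day portfolio volatility
var_99 = norm.ppf(0.99) * portfolio_sigma * portfolio_value

print(f"1-day 99% VaR: {var_99:,.0f}")  # the 'maximum likely loss' on 99 days out of 100
```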

However, this historic data had been drawn from a period in which banks had not suffered crippling losses. Similarly, pre-2007, defaults on mortgages were mainly the result of individual misfortunes (e.g. loss of a job) and were therefore not correlated with each other. However, once the loans being made depended on rising house prices, any drop in prices could trigger many defaults at once. Assets which were believed to have a low co-variance therefore ended up having a very high co-variance under certain circumstances. The system was only stationary under certain conditions.
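A small simulation makes the point: a relationship estimated on an invented ‘calm’ sample looks weak, while the same pair of assets move almost in lockstep in an invented ‘crisis’ sample. The numbers below are chosen purely to illustrate the shift.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two invented regimes for the same pair of assets.
calm = rng.multivariate_normal([0, 0], [[1.0, 0.1], [0.1, 1.0]], size=1000)
crisis = rng.multivariate_normal([0, 0], [[4.0, 3.6], [3.6, 4.0]], size=250)

print(np.corrcoef(calm, rowvar=False)[0, 1])    # ~0.1: the assets look nearly independent
print(np.corrcoef(crisis, rowvar=False)[0, 1])  # ~0.9: under stress they move together
# A model calibrated only on the calm sample would badly understate joint losses.
```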

There are, of course, many scenarios in which the assumption of stationarity is valid. For example, consider a machine learning algorithm which views images of tumours and predicts the likelihood that a tumour is cancerous. The tumour certainly does not react to the prediction made by the algorithm and assume a better ‘disguise’ in the future.

This is not to say that machine learning cannot be useful in non-stationary environments — it certainly can be. However, extra care needs to be taken when interpreting the results of such models. Here are a couple of suggested ways to protect against model failure:

  1. Constant monitoring of model performance (a minimal monitoring sketch follows this list)
  2. Good judgement — the authors of ‘Radical Uncertainty’ suggest asking a simple question: “What is going on here?” In the case of Goldman Sachs, this would mean realising that the financial environment of the Great Financial Crisis meant that their assets were far more correlated than their model had assumed, and thus that the model was of limited value. This, of course, is a form of abductive reasoning.
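On the first point, one hedged sketch of what such monitoring could look like is to compare the distribution of a live input feature against the training data and raise an alert when they diverge. The test and threshold below are assumptions rather than a standard recipe.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(train_feature: np.ndarray, live_feature: np.ndarray,
                p_threshold: float = 0.01) -> bool:
    """Return True if the live data looks drawn from a different distribution."""
    _statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < p_threshold

rng = np.random.default_rng(2)
train = rng.normal(0.0, 1.0, size=5000)  # the data the model was fitted on
live = rng.normal(0.8, 1.3, size=500)    # the data it is now being asked about

print(drift_alert(train, live))  # True: time to ask "what is going on here?"
```

Any such alert is only a prompt for judgement, not a verdict; the point is to notice when the world has moved away from the data the model was trained on.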

So, the first part of the framework is as follows: the less stationary the model environment, the more wary you should be of machine learning.

Decision Importance


In the second scenario, we instinctively feel quite uncomfortable about the price of credit differing without any reference to the credit-worthiness of the individual. Taken to extremes, imagine the outcome of a court case being determined entirely by the broad profile of the defendant and not by the specifics of the case. The more important the decision is for the individual, the more uncomfortable we feel about it being taken in this way.

Part of our sense of injustice comes from not taking account of the unique circumstances of each case. Just because people of a similar background are often found guilty, what evidence is there that this particular person committed the crime? If an event is unique, we feel that each situation needs to be considered on its merits. Indeed, as discussed earlier, the more unique an event, the less capable machine learning is of providing a reasonable prediction in any case.

Part of our sense of injustice also arises from the consequences of a wrong decision for the individual. We are all unique, of course, but there are few campaigns against the injustice of Netflix recommendations. The cost of not being recommended a good film is, after all, quite low. However, we might feel rather more uncomfortable about being rejected from a job application by automated algorithmic screening. Although I wasn’t a typical applicant for this role, surely I was a good fit? Why was I rejected?

The more unique an event, the more we explain it to each other in the form of a narrative: ‘Trump won the 2016 Presidential election as a reaction against the identity politics of Hillary Clinton’, for example. These explanations cannot meaningfully be described as ‘optimal’ — indeed, there may be many competing explanations. Machine learning, on the other hand, works in terms of historical correlations and therefore struggles to explain, or build a narrative around, a unique event.

Again, this is not to say that machine learning cannot be useful for important decisions and unique events. However, the more that this is the case, the more importance we should place on explainability and the more judgement we should exercise about the limitations of our model.
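One practical, if partial, way of building in that explainability is to check which inputs a model actually relies on. The sketch below uses scikit-learn’s permutation importance on an entirely synthetic dataset; the setup is illustrative rather than a recommendation for any particular domain.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Entirely synthetic stand-in for a real decision problem.
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# How much does held-out accuracy drop when each feature is shuffled?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```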

So, the second part of the framework is as follows: the more important a decision and the more unique an event, the more wary you should be of machine learning.

Examples

Below are a few machine learning examples to illustrate the framework in action.

  • A cancerous tumour detection algorithm — the problem is stationary but the importance of the decision is high. Explainability is required and the model needs to be used with care.
  • Data centre optimisation (like in this case) — the environment is stationary and the importance of each decision is low — a good use of machine learning.
  • Netflix Recommendation Algorithm — non-stationary environment (people’s preferences change over time) but cost of decisions is low — a good use of machine learning.
  • Clinical Trial Prediction — non-stationary environment (new trial design paradigms, higher standards of care) and cost of decisions is high. Explainability is required and machine learning should be used with great care.

Summary

So, in conclusion, when you are approaching a problem with machine learning, you should be asking yourself two questions:

  1. Is the model environment stationary?
  2. How important and unique is each model prediction?

Finally, if either of these considerations gives you pause, it is always worth continually asking the question posed by the authors of Radical Uncertainty — “What is going on here?” It only seems apt to end with the words of Yogi Berra — “It’s tough to make predictions, especially about the future”.
