STATISTICS

The Cost of Making Statistical Errors

A Guide to Cost Implications of Making Statistical Errors for Data Scientists

Aayush Malik
Towards Data Science
8 min read · Feb 7, 2022


When you were a child, you may have read the story “The Boy Who Cried Wolf”. It is the story of a shepherd who used to raise false alarms about seeing a wolf and call the villagers for help when in fact there was no wolf. He did this repeatedly for his own amusement, but when there actually was a wolf and he cried for help, nobody came, because the villagers thought he was lying again. It is a popular story read to children, especially in moral education and ethics classes, where they are told to be honest and not to lie, because the world stops believing liars.

Photo by alleksana from Pexels: https://www.pexels.com/photo/wood-dirty-writing-abstract-4271933/

Humans don’t always lie; sometimes there are simply errors in judgment and measurement, which may have serious consequences and the potential to change businesses, societies, and circumstances. There are many practical and informative resources online that explain in detail what the different types of statistical errors are, what causes them, and how to prevent them. However, every decision one makes in a business has implications. In this article, I explain how to ascertain the cost implications of making statistical errors.

This article is outlined in the following manner. We will briefly cover the types of errors in statistics with some examples. After that, we will define some terms relevant to error analysis in machine learning: precision, recall, and specificity. Thereafter, we will learn how to decide which error is more harmful to your business by taking its economic costs into consideration. We will end the article by discussing various ways of reducing errors. This is a general article geared toward practitioners and/or beginner data scientists, so I avoid heavy mathematical terminology and statistical symbols. In case you have any questions, I am available for a discussion. So, without further delay, let’s get started.

Types of Error in Statistics

The first thing we need to understand is the term error. An error is the difference between the actual value of a quantity and the value we ascertain or collect for it. The larger the error, the further our estimate is from reality. For example, if the actual temperature outside is 21 C, but my instrument says it’s 20 C, then there is a difference of 1 C, or a deviation of 4.76% from the actual value. Whether we are okay with this error depends on what we are going to do with the measurement and what the business cost of an inaccurate measurement will be. One of the goals of statistical optimization is to reduce the error of our model across various contexts so that it’s robust enough to be used in a number of scenarios.
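
As a minimal sketch of that arithmetic (using the temperatures above), the percentage deviation is just the absolute error divided by the actual value:

```python
# Percentage deviation of a measurement from the actual value
actual = 21.0      # actual temperature in Celsius
measured = 20.0    # value reported by the instrument

absolute_error = abs(actual - measured)          # 1.0 C
percent_error = absolute_error / actual * 100    # ~4.76%

print(f"{percent_error:.2f}%")
```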

Type I: False Positive Error (FALSE ALARM)

Type I Error, also known as False Positive Error, happens when something is actually not true or not present but is ascertained as true or present. For example, if I get a COVID test done, a Type I Error (False Positive Error) would be identifying me as positive when in fact I am not positive. Another example is raising a fire alarm when there is no fire, that is, raising a false alarm. Further examples are convicting someone as a criminal when in fact they are not a criminal, or a pregnancy test declaring someone pregnant when they are not. These are all examples of False Positives.

Type II: False Negative Error (MISS)

Type II Error, also known as False Negative Error, happens when something is actually true or present but is ascertained as not true or not present. For example, if I get a COVID test done, a Type II Error (False Negative Error) would be identifying me as negative when in fact I am not negative. Another example is not raising a fire alarm when there actually is a fire. Further examples are acquitting someone when in fact they are a criminal, or a pregnancy test declaring someone not pregnant when they actually are. These are all examples of False Negatives.

The previous examples show that defining something as a false positive or a false negative depends on how “being positive” is defined in that situation. For example, if seeing a fire is defined as positive, then a false positive is raising an alarm when there is no fire. However, if not seeing a fire is defined as positive, then a false positive is declaring that there is no fire when one is actually burning. For practical purposes and simplicity of understanding, the occurrence of an event is considered positive. So, seeing a fire is considered positive.

Some Important Terms

Now that you have a fair understanding of the terms false positive and false negative, let us understand three more terms before we get to the economic costs of statistical errors. These terms are important from the point of view of applied machine learning for business purposes. As before, we will review these concepts using real-world examples.

Suppose you work for a financial organization where the rate of interest offered to an applicant depends on their past credit profile. A person with a higher probability of default will be offered a higher rate of interest, and a person with a lower probability of default will be offered a lower rate of interest. For the sake of simplicity, let’s say you have been asked to build a model that classifies applicants into high-risk or low-risk borrowers. Because we have a lot of people who don’t default and only some who do, we will have an imbalanced dataset. Therefore, we cannot rate our model’s performance on accuracy alone; we need additional metrics.
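
To see why accuracy alone is misleading on imbalanced data, here is a minimal sketch (the counts are made up): a model that simply labels every applicant low-risk looks 95% accurate while catching no defaulters at all.

```python
# Hypothetical, imbalanced loan book: 950 non-defaulters, 50 defaulters
defaulters = 50
non_defaulters = 950
total = defaulters + non_defaulters

# A useless model that labels every applicant "low-risk" (never predicts default)
correct_predictions = non_defaulters       # it is only right on the non-defaulters
accuracy = correct_predictions / total     # 0.95 -- looks impressive
recall = 0 / defaulters                    # 0.0  -- it misses every defaulter

print(f"accuracy: {accuracy:.2f}, recall: {recall:.2f}")
```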

For this example, we will consider defaulting as positive. Thus a person who defaults will be considered positive. We will define and understand recall or sensitivity, specificity, and precision now.

Recall or Sensitivity or True Positive Rate

Sensitivity, or recall, is a measure of how well a model can identify true positives, that is, high-risk borrowers. In our example, a true positive is the model identifying a borrower who will default and giving them a high-risk label, so that their rate of interest is higher than that of borrowers with a lower risk of default. If our model’s sensitivity is not high, the model will fail to discern the true positives, so more actual defaulters will be classified as low-risk and given a lower rate of interest. Financially speaking, this can be dangerous for the business, as more of the people the model predicted to be lower-risk borrowers will default. Thus, in this case, our model should have a high sensitivity.
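
Concretely, recall is the share of actual positives (defaulters) that the model flags as high-risk. A minimal sketch with made-up counts for the loan example:

```python
# Hypothetical counts for the loan-default example
tp = 400   # defaulters correctly flagged as high-risk (true positives)
fn = 100   # defaulters the model labelled low-risk (false negatives, i.e. misses)

recall = tp / (tp + fn)   # true positive rate / sensitivity
print(recall)             # 0.8
```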

Specificity or True Negative Rate

Specificity is a measure of how well a model can identify true negatives, that is, low-risk borrowers. In our example, a true negative is the model identifying a borrower who will not default and giving them a low-risk label, so that their rate of interest is lower than that of borrowers with a higher risk of default. If our model’s specificity is not high, the model will fail to discern the true negatives, so more non-defaulters will be classified as high-risk and charged a higher rate of interest. Financially speaking, this can be good, as there will be more interest income for the bank, but it may also mean that some borrowers will not borrow at all because the cost of borrowing is too high, which translates to the business losing customers, or to customers getting frustrated and moving to competitors. With data, one can work out which is costlier: losing customers (losing potential income) or making more money from existing customers.
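
Specificity is the mirror image on the negative class: the share of actual non-defaulters the model correctly clears. Again, a minimal sketch with made-up counts:

```python
# Hypothetical counts for the loan-default example
tn = 800   # non-defaulters correctly labelled low-risk (true negatives)
fp = 200   # non-defaulters wrongly flagged as high-risk (false positives / false alarms)

specificity = tn / (tn + fp)   # true negative rate
print(specificity)             # 0.8
```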

Precision or Positive Predictive Value

Precision is a measure of how trustworthy the model’s positive predictions are. In other words, if your model identifies 500 people as having a high risk of default, but in actuality only 400 of them defaulted (based on past data), then the precision of the model is 400/500 = 0.8.
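
The same numbers as a sketch, with precision computed from true and false positives:

```python
# The counts from the example above
tp = 400   # flagged high-risk and actually defaulted (true positives)
fp = 100   # flagged high-risk but did not default (false positives)

precision = tp / (tp + fp)   # positive predictive value
print(precision)             # 0.8
```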

At this point, it is worth mentioning that discussing precision, recall, or specificity in isolation is not helpful. You should know all three values before you can estimate how costly the model’s errors will be.

Economic Costs of Statistical Errors

There are two steps involved in assessing the cost of making statistical errors. The first is defining what you will call positive; in our previous example, a person defaulting is considered positive. The second is calculating the false positive rate (false alarm rate) and the false negative rate (miss rate). If the false positive rate is high, a lot of non-defaulters are being rated as high-risk, which is not good for the business. If the false negative rate is high, a lot of defaulters are being rated as low-risk, which makes the business lose money.

Suppose the false positive rate of our model is 0.2: out of every 100 applicants who would not default, 20 are wrongly labelled high-risk, so 20 customers will be frustrated by being treated as possible high-risk borrowers. And if the false negative rate of our model is 0.15, then 15 out of every 100 applicants who should be rated high-risk are classified as low-risk, potentially causing the business to lose money. If an average defaulter costs the business INR 500K, then each miss costs INR 500K, and the expected cost of misses works out to 0.15 × 500,000 = INR 75,000 per actual defaulter who applies. To calculate the false positive rate and false negative rate, it is recommended to build the confusion matrix and calculate the relevant values from it, as in the sketch below.
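
As a minimal sketch (the confusion-matrix counts and the per-error costs, including an assumed INR 20K loss per falsely flagged customer, are illustrative), here is how the rates and the expected cost fall out of a confusion matrix:

```python
# Hypothetical confusion matrix for the loan-default model
tp, fn = 85, 15       # actual defaulters: caught vs. missed
tn, fp = 800, 200     # actual non-defaulters: correctly cleared vs. falsely flagged

fpr = fp / (fp + tn)  # false alarm rate among non-defaulters -> 0.20
fnr = fn / (fn + tp)  # miss rate among actual defaulters     -> 0.15

cost_per_miss = 500_000        # average INR lost per missed defaulter (from the example)
cost_per_false_alarm = 20_000  # assumed INR lost per frustrated or lost customer

expected_cost = fn * cost_per_miss + fp * cost_per_false_alarm
print(f"FPR={fpr:.2f}, FNR={fnr:.2f}, expected cost: INR {expected_cost:,}")
```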

Source: Confusion Matrix https://en.wikipedia.org/wiki/Confusion_matrix

How To Reduce Errors

You can never eliminate all statistical errors, but there are ways to reduce them. Here are two you can use.

  1. The Type II Error has an inverse relationship with the power of a statistical test or model. The power of a statistical test is its ability to discern reality and thus correctly reject the null hypothesis. This means that the higher the power of a statistical test, the lower the probability of committing a Type II Error. Thus, you can increase the power of the test. This has cost implications, as greater power typically requires a larger sample size and consequently higher costs. For a fixed sample size, reducing the Type II Error rate also increases the chance of committing a Type I Error. Thus, you should assess the cost impact of making these errors; a power-calculation sketch follows this list.
  2. One of the ways of reducing Type I Error is to lower the significance level (the probability of rejecting the null hypothesis when it’s true). Since the significance level is chosen by the researcher, it can be changed; for example, it can be lowered to 1% (0.01). This indicates that there is only a 1% probability of incorrectly rejecting the null hypothesis (the hypothesis that there is no difference between the groups being compared). Note that, for a fixed sample size, a lower significance level in turn raises the probability of a Type II Error.
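
As a rough illustration of the power/cost trade-off mentioned in point 1 (the effect size, significance levels, and power targets below are arbitrary assumptions), a standard power calculation shows how demanding a stricter significance level together with higher power drives up the required sample size:

```python
# Rough power-analysis sketch; effect size, alpha, and power targets are assumptions
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required sample size per group for a two-sample t-test detecting a moderate effect (d = 0.5)
n_lenient = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
n_strict = analysis.solve_power(effect_size=0.5, alpha=0.01, power=0.90)

print(f"alpha=0.05, power=0.80: about {n_lenient:.0f} subjects per group")
print(f"alpha=0.01, power=0.90: about {n_strict:.0f} subjects per group")
```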

Summary

In this article, we discussed various types of errors, their causes, their cost implications, and ways of reducing them. I believe it will help you make an informed decision when estimating the cost of the various types of errors. Moreover, it is imperative to understand that statistical errors can only be minimized, not eliminated entirely. The best way to deal with errors is to keep working on reducing them and to have additional checks and balances (for example, insurance) that limit the impact of a false judgment.

Originally published at https://aayushmalik.substack.com on February 7, 2022.
