An Introduction to Building a World-Class Fraud Prediction Model

Types of models, key features to use, and how to evaluate one

Photo by Bermix Studio on Unsplash

Introduction

As the world becomes more digitized and people gain access to new technologies and tools, fraudulent activity continues to reach record highs. According to a report from PwC, fraud losses totaled US$42 billion in 2020, affecting 47% of all companies in the preceding 24 months.

Paradoxically, the same technological advancements, such as big data, the cloud, and modern prediction algorithms, allow companies to tackle fraud better than ever before. In this article, we're going to focus on that last point, fraud prediction algorithms. Specifically, we'll look at the types of fraud models, the features to use in a fraud model, and how to evaluate one.


Types of Fraud Prediction Models

Because "fraud" is such a comprehensive term, there are several types of fraud models that you can build, each serving their own purposes:

Profile-specific Models vs Transaction-specific Models

Profile-specific models identify fraudulent activity at the user level, meaning that these models determine whether a user is fraudulent or not.

Transaction-specific models take a more granular approach and identify fraudulent transactions, rather than fraudulent users.

At a glance, these two models may sound like they serve the same purpose, but a fraudulent transaction doesn't always come from a fraudulent user. Credit card theft is a good example: a user shouldn't be deemed fraudulent if their card was stolen and a fraudulent transaction was made on it. Similarly, a fraudulent user doesn't always make fraudulent transactions (whether such a user should be allowed to make any transactions at all is a topic for another time).

Therefore, it’s important to consider both profile-specific and transaction-specific models.

Rules-Based Models vs Machine Learning Models

Rules-based models are models with hard-coded rules: think "if-else" statements (or case-when statements if you're a SQL prodigy). With rules-based models, you're responsible for coming up with the rules yourself. They're useful when you know the exact signals that indicate fraudulent activity.

For example, credit card companies often use a rules-based approach that checks the location where your credit card is used. If the distance between the transaction location and your home address exceeds a certain threshold (that is, if you're too far from home), the transaction may be automatically denied.
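A rule like this takes only a few lines to sketch. The snippet below implements the distance check in plain Python; the 500-mile threshold and the coordinates are illustrative assumptions, not values any real issuer uses:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # Earth's radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_transaction(txn_lat, txn_lon, home_lat, home_lon, threshold_miles=500):
    """Rule: flag the transaction if it's too far from the home address."""
    return haversine_miles(txn_lat, txn_lon, home_lat, home_lon) > threshold_miles

# Home address in New York, transaction in Los Angeles (~2,400 miles away)
print(flag_transaction(34.05, -118.24, 40.71, -74.01))  # True -> flagged
```

A production system would layer many such rules together, but each one is just an explicit condition you wrote down yourself.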

Machine learning fraud-detection models have become increasingly popular with the emergence of Data Science over the past decade. Machine learning models are useful when you don’t know the exact signals that indicate fraudulent activity. Instead, you provide a machine learning model with a handful of features (variables) and let the model identify the signals itself.

For example, banks feed dozens of engineered features into machine learning models to identify which transactions are likely to be fraudulent; those transactions are then moved to a second stage for further investigation. This segues into the next section: the features you should use in your fraud model.
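As a rough sketch of that workflow, the snippet below trains a scikit-learn random forest on a few synthetic engineered features. The feature definitions, the ~2% fraud rate, and the 0.5 review threshold are all illustrative assumptions, not real bank data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000

# Hypothetical engineered features (one row per transaction)
X = np.column_stack([
    rng.exponential(1.0, n),    # cost-to-average-spend ratio
    rng.exponential(50.0, n),   # miles from home address
    rng.integers(0, 3650, n),   # account age in days
])
y = (rng.random(n) < 0.02).astype(int)  # ~2% fraud labels (synthetic)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Transactions above a probability threshold go to the "second stage"
# (manual review) rather than being auto-denied
probs = model.predict_proba(X_test)[:, 1]
review_queue = np.where(probs > 0.5)[0]
```

The point is that nowhere did we write an explicit rule; the model infers the relationship between the features and the fraud label on its own.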


Key Features to Use in Your Model

The main objective behind choosing features for your model is to include as many "signals" indicating fraudulent activity as possible.

To help spark some ideas, below is a non-exhaustive list of key features commonly used in machine learning models:

  • Time of registration or transaction: The time when a user registers or when a transaction is made is a good signal because fraudulent users don’t always behave like regular users. For example, they may make a number of transactions at hours when people don’t normally transact.
  • Location of transaction: As I alluded to before, the location where a transaction was made can be a good indicator of whether it is fraudulent. If a transaction is made 2,000 miles away from the home address, that is abnormal behavior and possibly fraudulent.
  • Cost to average spend ratio: This looks at the amount of a given transaction compared to the average spend of the given user. The larger the ratio, the more irregular the transaction is.
  • Email information: You can actually check when an email was created. If an email was created and an account was created on the same day, that may suggest fraudulent behavior.

The more signals you can provide, and the stronger the signals are, the better your model will be at predicting and identifying fraudulent activity.
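To make a couple of these features concrete, here is a minimal pandas sketch. The transaction data, the column names, and the 1 a.m. to 5 a.m. "night" window are all made up for illustration:

```python
import pandas as pd

txns = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount": [20.0, 25.0, 500.0, 10.0, 12.0],
    "timestamp": pd.to_datetime([
        "2021-03-01 14:00", "2021-03-05 09:30", "2021-03-06 03:15",
        "2021-03-02 12:00", "2021-03-03 18:45",
    ]),
})

# Cost-to-average-spend ratio: transaction amount vs. the user's mean spend
txns["avg_spend"] = txns.groupby("user_id")["amount"].transform("mean")
txns["spend_ratio"] = txns["amount"] / txns["avg_spend"]

# Time-of-transaction signal: purchases in the small hours are unusual
txns["hour"] = txns["timestamp"].dt.hour
txns["is_night"] = txns["hour"].between(1, 5)
```

After this, user 1's $500 purchase stands out on both signals: its spend ratio is roughly 2.75x the user's average, and it happened at 3 a.m.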


How to Evaluate a Fraud Model

A fraud prediction model has to be evaluated differently from a typical machine learning model because of the nature of the problem, as explained below.

Why You Shouldn’t Use Accuracy

Fraud detection is classified as an imbalanced classification problem. Specifically, there is a significant imbalance between the number of fraudulent profiles/transactions and the number of non-fraudulent profiles/transactions. Because of this, using accuracy as an evaluation metric is not helpful.

To see why, consider a dataset with 1 fraudulent transaction and 99 non-fraudulent transactions (in the real world, the proportion of fraud is even smaller). If a machine learning model were to classify every single transaction as non-fraudulent, it would be 99% accurate! However, this fails to tackle the problem at hand, which is to catch the fraudulent transactions.
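This trap is easy to demonstrate in a few lines of plain Python:

```python
y_true = [1] + [0] * 99   # 1 fraudulent transaction, 99 legitimate ones
y_pred = [0] * 100        # naive model: predict "not fraud" for everything

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.99 -- yet the one fraudulent transaction was missed
```

The model scores 99% accuracy while catching exactly zero fraud, which is why accuracy alone tells you almost nothing here.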

Metrics to use Instead

Instead, there are two metrics that you should consider when evaluating a fraud-prediction model:

Precision is also known as positive predictive value and is the proportion of relevant instances among the retrieved instances. In other words, it answers the question "What proportion of positive identifications was actually correct?"

Precision is preferred when the cost of classifying a non-fraudulent transaction as fraudulent is too high and you’re okay with only catching a portion of fraudulent transactions.

Recall, also known as the sensitivity, hit rate, or the true positive rate (TPR), is the proportion of the total number of relevant instances that were actually retrieved. It answers the question "What proportion of actual positives was identified correctly?"

Recall is preferred when it’s absolutely critical that you identify every single fraudulent transaction and are okay with incorrectly classifying some non-fraudulent transactions as fraudulent.
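Both metrics are one line each with scikit-learn. In this small illustrative example, the model flags 3 transactions, of which 2 are truly fraudulent, out of 4 actual frauds:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 4 actual frauds
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # 3 flagged, 2 correctly

# Precision: of the 3 flagged transactions, 2 were truly fraudulent -> 2/3
print(precision_score(y_true, y_pred))  # 0.666...

# Recall: of the 4 truly fraudulent transactions, 2 were caught -> 2/4
print(recall_score(y_true, y_pred))     # 0.5
```

In practice you'll often tune the model's decision threshold to trade one metric off against the other, guided by the business costs described above.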

Each has its own pros and cons and should be chosen based on the business problem at hand. Below is an image to better understand the difference between the two.

Image created by Author

Thanks for Reading!

If you want to learn more about fraud, ecommerce, and data science, check out this podcast with Elad Cohen, the VP of Data Science at Riskified. And if you want to see how to build an actual fraud prediction model in Python, you can check out this Kaggle repository of a credit card fraud prediction model.

As always, I wish you the best in your learning endeavors 🙂

Not sure what to read next? I’ve picked another article for you:

A Complete Guide to Revenue Cohort Analysis in SQL and Python

and another one!

A Complete Guide to Building a Marketing Mix Model in 2021

Terence Shin
