Outlier Detection Using Random Forest Regressors: Leveraging Algorithm Strengths to Your Advantage

Using a model’s robustness to outliers to detect them

Michael Zakhary

Published in Towards Data Science

8 min read · Sep 28, 2023

Problem Statement

Outlier detection can be tricky, especially when the ground truth, or the definition of what counts as an outlier, is ambiguous or depends on multiple factors. A common statistical rule of thumb defines an outlier as a data point more than three standard deviations away from the mean. However, in most real-life problems, not all data points far from the mean are equally significant; sometimes we need a bit more nuance when flagging outliers.

Let's take a quick example:

We have a dataset of water consumption per household. By analyzing the water consumption as a whole and isolating points more than three standard deviations from the mean, we can quickly find the outlying households that use the most water.
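The three-standard-deviation rule described here can be sketched in a few lines. This is only an illustration: the consumption figures are made up, and the function name `three_sigma_outliers` and its parameter `k` are hypothetical, not taken from the article.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) > k * std]


# Made-up household consumption figures (arbitrary units), for illustration only:
# 50 typical households spread evenly between 100 and 200, plus two heavy users.
typical = [100 + i * (100 / 49) for i in range(50)]
consumption = typical + [600.0, 650.0]

flagged = three_sigma_outliers(consumption)
print(flagged)  # [600.0, 650.0]
```

Note that the rule looks only at the magnitude of each value. It flags the two heavy users but carries no information about why their consumption is high.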

This, however, fails to take into account the reason behind the increase in consumption. There could be multiple reasons why the water consumption is high, and some reasons are of more interest…
