5 Outlier Detection Techniques that every “Data Enthusiast” Must Know

Outlier Detection Methods (Visuals and Code)

Published in

Towards Data Science

8 min read · Jun 12, 2021

Outliers are observations that differ strongly (that is, have different properties) from the other data points in a sample of a population. In this blog, we will go through 5 outlier detection techniques that every “Data Enthusiast” must know. But before that, let’s take a look at where outliers come from.

What are the possible sources of outliers in a dataset?

There are multiple reasons why a dataset can contain outliers, such as human errors (wrong data entry), measurement errors (system or tool errors), data manipulation errors (faulty data preprocessing), and sampling errors (creating samples from heterogeneous sources). Detecting and treating these outliers is important for learning a robust and generalizable machine learning system.

The Z-score (also called the standard score) is an important concept in statistics that indicates how many standard deviations a given point lies from the mean. Applying the Z-transformation, z = (x − μ) / σ, where μ is the mean and σ is the standard deviation, shifts and rescales the distribution so that it has zero mean and unit standard deviation. For example, a Z-score of 2 means the data point is 2 standard deviations away from the mean.
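To make the Z-transformation concrete, here is a minimal sketch in Python using NumPy. The helper names `z_scores` and `zscore_outliers`, the toy data, and the cutoff of 3 are assumptions chosen for illustration (a cutoff of 3 is a common convention, not something fixed by the definition); treat it as one possible way to flag outliers with Z-scores rather than a definitive implementation.

```python
import numpy as np


def z_scores(values):
    """Return the Z-score of every value: (x - mean) / standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so no point deviates from the mean.
        return np.zeros_like(values)
    return (values - values.mean()) / std


def zscore_outliers(values, threshold=3.0):
    """Boolean mask of points whose |Z-score| exceeds the threshold.

    The default threshold of 3.0 is an assumed, commonly used cutoff.
    """
    return np.abs(z_scores(values)) > threshold


# Hypothetical toy data: eleven similar readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

z = z_scores(data)
mask = zscore_outliers(data, threshold=3.0)
flagged = np.asarray(data)[mask]

print("Z-scores:", np.round(z, 2))
print("Flagged as outliers:", flagged)
```

After the transformation the scores have zero mean and unit standard deviation, so the threshold reads directly as "number of standard deviations from the mean". Keep in mind that the mean and standard deviation are themselves pulled by extreme values, so on very small samples a single outlier can inflate the standard deviation enough to hide itself.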


Written by Prakhar Mishra
