5 Outlier Detection Techniques that every “Data Enthusiast” Must Know

Outlier Detection Methods (Visuals and Code)

Prakhar Mishra
Towards Data Science
8 min read · Jun 12, 2021

Modified Image from Source

Outliers are observations that differ strongly from the other data points in a sample, that is, they have markedly different properties. In this blog, we will go through 5 outlier detection techniques that every “Data Enthusiast” must know. But before that, let’s look at where outliers come from.

What are the possible sources of outliers in a dataset?

Outliers can enter a dataset for several reasons: human errors (wrong data entry), measurement errors (system or tool faults), data manipulation errors (faulty preprocessing), sampling errors (building samples from heterogeneous sources), and so on. Detecting and treating these outliers is important for learning a robust and generalizable machine learning system.

The Z-score (also called the standard score) is an important concept in statistics that indicates how many standard deviations a point lies from the mean. Applying the Z-transformation shifts and rescales the data so that it has zero mean and unit standard deviation. For example, a Z-score of 2 means the data point is 2 standard deviations above the mean.
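To make the idea concrete, here is a minimal sketch of Z-score based flagging with NumPy. The function name `zscore_outliers`, the toy data, and the cutoff of 3 are assumptions chosen for illustration (|z| > 3 is a common rule of thumb, not something fixed by the method), and the sketch uses the population standard deviation.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return (z_scores, mask) where mask marks points with |z| > threshold.

    `threshold` is an illustrative default, not a universal rule.
    """
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std = x.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so no point deviates from the mean.
        return np.zeros_like(x), np.zeros(x.shape, dtype=bool)
    z = (x - mean) / std  # shift to zero mean, rescale to unit std
    return z, np.abs(z) > threshold

# Toy data (made up for this example): one value is far from the rest.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]
z, mask = zscore_outliers(data, threshold=3.0)
outliers = [v for v, m in zip(data, mask) if m]
print(outliers)
```

Note that the mean and standard deviation are themselves pulled by extreme values, so on very small samples a single outlier can inflate the standard deviation enough to hide itself; the cutoff may need to be lowered in that case.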
