5 Outlier Detection Techniques that every “Data Enthusiast” Must Know
Outlier Detection Methods (Visuals and Code)
Outliers are observations that differ strongly (i.e., have different properties) from the other data points in a sample drawn from a population. In this blog post, we will go through 5 outlier detection techniques that every “Data Enthusiast” must know. But before that, let’s take a look at where outliers come from.
What are the possible sources of outliers in a dataset?
There are multiple reasons why a dataset can contain outliers, such as human errors (wrong data entry), measurement errors (system or tool errors), data manipulation errors (faulty data preprocessing), and sampling errors (creating samples from heterogeneous sources). Detecting and treating these outliers is important for building a robust and generalizable machine learning system.
The Z-score (also called the standard score) is an important concept in statistics that indicates how many standard deviations a data point lies from the mean. Applying the Z-transformation shifts and rescales the distribution so that it has zero mean and unit standard deviation. For example, a Z-score of 2 means the data point lies 2 standard deviations above the mean, and a Z-score of -2 means it lies 2 standard deviations below it.