
How Scikit-Learn’s StandardScaler works

In this post, I explain why and how to apply standardization using scikit-learn.

How and why to standardize your data: A Python tutorial

Figure taken from: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html. Left subplot: the unscaled data. Right subplot: the transformed data.

Hi there.

This is my first Medium post. I am an electrical & computer engineer currently finishing my PhD in biomedical engineering and computational neuroscience. I have been working on machine learning problems for the past 4 years. A very common question that I see all around the web is how, and why, to standardize the data before fitting a machine learning model.

How does scikit-learn’s [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) work?

The first question that comes to one’s mind is:

Why standardize in the first place?

Why standardize before fitting an ML model?

Well, the idea is simple. Variables measured at different scales do not contribute equally to the model fitting and the learned function, and may end up creating a bias. To deal with this potential problem, feature-wise standardization (μ=0, σ=1) is usually applied prior to model fitting.
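To see the problem concretely, here is a small sketch (the feature names and values are made-up toy numbers) of how a large-scale feature dominates a Euclidean distance:

```python
import numpy as np

# Two samples with two features on very different scales
# (hypothetical toy values): income in dollars, age in years.
X = np.array([[50_000.0, 25.0],
              [51_000.0, 60.0]])

diff = np.abs(X[0] - X[1])
print(diff)  # income differs by 1000, age by 35
print(np.linalg.norm(X[0] - X[1]))  # distance ~1000.6, driven almost
                                    # entirely by the income feature
```

Any model that relies on distances or weighted sums of raw features will effectively ignore the age column here, which is exactly the bias standardization removes.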

To do that using scikit-learn, we first need to construct an input array X containing the features and samples, with X.shape being [number_of_samples, number_of_features].

Keep in mind that all scikit-learn machine learning (ML) functions expect as input a NumPy array X with that shape, i.e. the rows are the samples and the columns are the features/variables. Having said that, let’s assume that we have a matrix X where each row is a sample/observation and each column is a variable/feature.

Note: Tree-based models are usually not dependent on scaling, but non-tree models such as SVM, LDA etc. are often hugely dependent on it.
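As a quick sanity check of that note (a sketch using scikit-learn’s built-in wine dataset; exact scores depend on the cross-validation splits), a pipeline that standardizes before an RBF-kernel SVM typically beats the unscaled version by a wide margin:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Mean 5-fold CV accuracy, without and with feature standardization
raw = cross_val_score(SVC(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=5).mean()
print(f"SVC unscaled: {raw:.3f}  |  SVC with StandardScaler: {scaled:.3f}")
```

Using a Pipeline here also means the scaler is fit only on each training fold, which avoids leaking test-fold statistics into the scaling step.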



Core of the method

The main idea is to standardize your features/variables/columns of X individually, i.e. to μ = 0 and σ = 1, before applying any machine learning model. Thus, StandardScaler() will normalize each column of X INDIVIDUALLY, so that each column/feature/variable ends up with μ = 0 and σ = 1.

The mathematical formulation of the standardization procedure. Image generated by the author.

Working Python code example:

from sklearn.preprocessing import StandardScaler
import numpy as np

# 4 samples/observations and 2 variables/features
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
# the scaler object (model)
scaler = StandardScaler()
# fit and transform the data
scaled_data = scaler.fit_transform(X) 

print(X)
[[0 0]
 [1 0]
 [0 1]
 [1 1]]

print(scaled_data)
[[-1. -1.]
 [ 1. -1.]
 [-1.  1.]
 [ 1.  1.]]

Verify that the mean of each feature (column) is 0:

scaled_data.mean(axis = 0)
array([0., 0.])

Verify that the std of each feature (column) is 1:

scaled_data.std(axis = 0)
array([1., 1.])
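The two checks above amount to applying z = (x − μ)/σ by hand, column by column. Note that StandardScaler uses the population standard deviation (ddof=0), which matches NumPy’s default:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

# Column-wise z-scores computed manually
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(manual, StandardScaler().fit_transform(X)))  # True
```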

The effect of the transform in a visual example

Figure taken from: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html. Left subplot: the unscaled data. Right subplot: the transformed data.

Summary

  • StandardScaler removes the mean and scales each feature/variable to unit variance. This operation is performed feature-wise in an independent way.
  • StandardScaler can be influenced by outliers (if they exist in the dataset) since it involves the estimation of the empirical mean and standard deviation of each feature.
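A practical consequence of the first bullet: μ and σ are learned when fit is called, so the same fitted scaler should be reused to transform any new/test data (a minimal sketch using the toy X from above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)

# fit() learns mu = [0.5, 0.5] and sigma = [0.5, 0.5] from the training data
scaler = StandardScaler().fit(X_train)

X_new = np.array([[2.0, 0.5]])
print(scaler.transform(X_new))  # [[3. 0.]] -- uses the TRAINING statistics
```

Fitting a second scaler on the test data instead would apply different statistics to train and test, making the two sets incomparable.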

How to deal with outliers

  • Manual way (not recommended): Visually inspect the data and remove outliers using outlier removal statistical methods such as the Interquartile Range (IQR) threshold method.
  • Recommended way: Use the [RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html), which scales the features using statistics that are robust to outliers. This scaler removes the median and scales the data according to a quantile range (by default the IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
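A small sketch (toy values) of the difference: with one extreme outlier in a feature, StandardScaler squashes the inliers together, while RobustScaler keeps them well spread out:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# One feature with a single extreme outlier
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

print(StandardScaler().fit_transform(X).ravel())  # inliers end up ~0.03 apart
print(RobustScaler().fit_transform(X).ravel())    # [-1.  -0.5  0.   0.5 48.5]
```

The outlier inflates the mean (to 22) and the standard deviation (to about 39), so the four inliers are compressed into a tiny range; the median (3) and IQR (2) are unaffected by it.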

That’s all for today! Hope you liked this first post! Next story coming next week. Stay tuned & safe.

Stay tuned & support me

If you liked and found this article useful, follow me and applaud my story to support me!

– My mailing list in just 5 seconds: https://seralouk.medium.com/subscribe

– Become a member and support me: https://seralouk.medium.com/membership

References

[1] https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

[2] https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html
