Why using a mean for missing data is a bad idea. Alternative imputation algorithms.

Kacper Kubara
Towards Data Science
5 min read · Jun 24, 2019


Photo by Franki Chamaki on Unsplash

We all know the pain when the dataset we want to use for Machine Learning contains missing data. The quick and easy workaround is to substitute the mean for numerical features and the mode for categorical ones. Even better, someone might just insert 0s or discard the data and proceed to train the model. In the following article, I will…
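To make the quick workaround concrete, here is a minimal sketch of mean and mode substitution using only the Python standard library. The toy columns `ages` and `colors` and the helper `impute` are hypothetical names invented for this illustration; they are not taken from the article.

```python
from statistics import mean, mode, pvariance

# Hypothetical toy columns; None marks a missing entry.
ages = [22, 35, None, 41, None, 58]
colors = ["red", "blue", None, "red", "green", None]


def impute(values, fill_value):
    """Return a copy of values with every None replaced by fill_value."""
    return [fill_value if v is None else v for v in values]


# Statistics are computed on the observed (non-missing) entries only.
observed_ages = [v for v in ages if v is not None]
observed_colors = [v for v in colors if v is not None]

# Mean for the numerical feature, mode for the categorical one.
ages_filled = impute(ages, mean(observed_ages))
colors_filled = impute(colors, mode(observed_colors))

print(ages_filled)
print(colors_filled)

# Compare the spread of the numerical column before and after filling.
print(pvariance(observed_ages), pvariance(ages_filled))
```

In this toy example the filled column has a smaller variance than the observed values, because every imputed entry sits exactly at the mean. That is one way to see how this shortcut can distort a feature's distribution.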
