The Data You Don’t Need: Removing Redundant Samples
6 min read · Mar 19, 2020
In machine learning there is a saying: garbage in, garbage out. But what does it really mean to have good or bad data? In this post, we explore data redundancies in the Fashion-MNIST training set and how they affect test set accuracy.
What is Data Redundancy?
We leave the more detailed explanation for a later post, but here is an example of redundant data. Imagine you’re building a classifier to distinguish between images of cats and dogs. You already have a dataset of 100 cats and are looking…