The Data You Don’t Need: Removing Redundant Samples

Igor Susmelj
Towards Data Science
6 min read · Mar 19, 2020

In machine learning, there is a saying: garbage in, garbage out. But what does it really mean to have good or bad data? In this post, we explore redundancies in the Fashion-MNIST training set and how they affect test set accuracy.

What is Data Redundancy?

We leave the more detailed explanation for a later post, but here is an example of redundant data. Imagine you’re building a classifier that distinguishes between images of cats and dogs. You already have a dataset of 100 cats and are looking…

Co-founder at Lightly | Writer at Medium about Computer Vision, Startups and Machine Learning