Notes from Industry

Data Cleaning IS Analysis, Not Grunt Work

Randy Au
Towards Data Science


There’s a very oddly cut stump along the Hudson River in Manhattan (as of 2018 anyways), almost like a chair. But if you look closer, the tree had grown around some old iron fencing, leaving a stump invincible to unmotivated chainsaw operators. (Photo: Randy Au)

TL;DR: Some people [citation needed] consider cleaning data to be menial work that’s somehow “beneath” the sexy “real” data science work. I call BS. The act of cleaning data imposes values/judgments/interpretations upon the data, with the intent of letting downstream analysis algorithms function and give results. That’s exactly the same as doing data analysis. In


I stress about data quality a lot. Data nerd/scientist, camera junkie. Quant UXR @Google Cloud. Formerly @bitly, @Meetup, @primarydotcom. Opinions are my own.