10 Examples of Awful Data that I had to work with as a Data Scientist
A glimpse into the frustrating dark side of working with data
As you may or may not know, a large portion of data science is working with bad data.
I had a lot of fun writing this, so hopefully you get a kick out of it too. Here are 10 instances where I had to work with extremely messy data. I’m sure many of you will be able to relate to a lot of these points!
1) USA, US, or United States?
Problem: I made this the first point because I think many of us can relate to it. I never understood why an application would let users spell their country however they want instead of giving them a searchable list, because free-text entry leads directly to this problem.
I once worked with geographical data and had to deal with countries spelled in different ways, e.g. United States, USA, US, and United States of America.
Solution: We created a mapping table to solve the problem, but that meant that we had to constantly update it to address any new variations that came into the system.
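To make the idea concrete, here is a minimal sketch of what such a mapping table could look like in Python. It is not the code we actually used: the dictionary contents, the function names (normalize_key, canonical_country, unmapped_variants), and the normalization rules (lowercasing, dropping periods, collapsing whitespace) are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Optional

# Hypothetical mapping table: normalized variant -> canonical country name.
# In practice this would probably live in a database table or a CSV file.
COUNTRY_MAP = {
    "united states": "United States",
    "usa": "United States",
    "us": "United States",
    "united states of america": "United States",
}


def normalize_key(raw: str) -> str:
    """Lowercase, drop periods, and collapse runs of whitespace."""
    return " ".join(raw.replace(".", "").lower().split())


def canonical_country(raw: str) -> Optional[str]:
    """Return the canonical name, or None if the variant is not mapped yet."""
    return COUNTRY_MAP.get(normalize_key(raw))


def unmapped_variants(values: Iterable[str]) -> List[str]:
    """List the normalized variants that still need a mapping entry."""
    keys = {normalize_key(v) for v in values}
    return sorted(k for k in keys if k not in COUNTRY_MAP)
```

Normalizing before the lookup means that "U.S.A." and "  united   STATES " resolve without needing their own rows, which keeps the table smaller. Something like unmapped_variants is one way to handle the maintenance burden: run it on each new batch and review whatever it returns, rather than discovering new spellings downstream.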