The way I’m learning Data Science (and why I think you should use it)

Aíquis Rodrigues
Towards Data Science
4 min read · Aug 29, 2017


The idea for this post came after I gave a lightning talk at PyData BH #8 about my approach to learning data science. People there seemed to like it, so why not share my humble thoughts with the world?

First, let me introduce myself and explain why I’m writing this post. I’m Aíquis, and part of my day-to-day job is analyzing data. For now, I’m not a full-time data analyst or data scientist, but I have a huge interest in the data field, so I’m always reading, watching videos and studying subjects in this area.

I’m not here to answer questions like “How can I become a Data Scientist?”, “How can I land a job in the DS area?” or “What are the best DS courses available online?”. For those, you can go to Quora and find some useful answers.

Data Science search trend over the last 5 years (via Google Trends)

Nowadays, there are a bunch of MOOCs offering the golden path to becoming a successful data scientist. You can spend a lifetime and a lot of money buying courses on Coursera, Udemy or Datacamp, but chances are they’re not giving you the most important skill for a data professional: solving real-world problems with the tools they teach.

So many courses that I can’t decide where to spend my money

For me, the best way to really learn DS is to learn by practicing. It might look like something obvious, and it really is, but how can you practice something if you have no idea where to start? My approach here is to use problems YOU have to practice your skills.

Can you think of any problem, personal or professional, that you have and that could be solved using things you learned while studying Python/R, Data Viz, Statistics or Machine Learning? Why not use those problems (even if they’re already solved) to test your knowledge?

I’ll give my personal example here. For the last few months, I’ve been taking the Datacamp Data Scientist Path (Python). One of the things I soon noticed is that Datacamp is weak on practice. Most of the exercises are copy & paste shit that doesn’t really make you think about solutions to the problems. What did I do to practice, then? I took a problem I was having and tried to solve it using what I was studying.

In this case, I used my timesheet of working hours, a spreadsheet where I keep the times I arrived at and left the office each day, and tried to answer these questions:

  • How many extra hours I have
  • What’s the mean of extra hours by day/week/month
  • The days when I did most of the extra hours (excluding weekends)

The project is here:

As you can see, it’s a simple task that could easily be done in a spreadsheet. Instead, I chose to do it in Python using pandas, reading the data from the Google Spreadsheet where I actually record this information. (I could have copied it to a local CSV file to make things easier, but then I wouldn’t practice using gspread, a Python API for Google Sheets, to collect my data.) I haven’t finished this project yet, but I’ve already learned some really cool things and managed to solve problems that only appear when you’re working with real-world data.
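To give an idea of what the three questions above look like in pandas, here is a minimal sketch. It is not the actual project code: the column names (`date`, `time_in`, `time_out`), the sample rows, the 8-hour workday, ignoring the lunch break and counting every weekend hour as extra are all assumptions made for illustration. The hard-coded list stands in for the rows read from the spreadsheet (gspread's `get_all_records()` returns a similar list of dicts).

```python
import pandas as pd

# Hypothetical rows standing in for the spreadsheet contents.
records = [
    {"date": "2017-08-21", "time_in": "09:00", "time_out": "18:00"},
    {"date": "2017-08-22", "time_in": "09:00", "time_out": "17:00"},
    {"date": "2017-08-23", "time_in": "08:30", "time_out": "19:00"},
    {"date": "2017-08-26", "time_in": "10:00", "time_out": "12:00"},  # Saturday
    {"date": "2017-08-28", "time_in": "09:00", "time_out": "18:30"},
]


def build_timesheet(records, standard_hours=8.0):
    """Return a date-indexed DataFrame with worked and extra hours per day.

    Assumptions: a fixed `standard_hours` workday on weekdays, no lunch
    break subtracted, and every hour worked on a weekend counts as extra.
    """
    df = pd.DataFrame(records)
    fmt = "%Y-%m-%d %H:%M"
    start = pd.to_datetime(df["date"] + " " + df["time_in"], format=fmt)
    end = pd.to_datetime(df["date"] + " " + df["time_out"], format=fmt)

    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
    df["worked"] = (end - start).dt.total_seconds() / 3600
    df["is_weekend"] = df["date"].dt.dayofweek >= 5  # 5 = Saturday, 6 = Sunday

    # Expected hours are `standard_hours` on weekdays and 0 on weekends.
    expected = (~df["is_weekend"]).astype(float) * standard_hours
    df["extra"] = df["worked"] - expected
    return df.set_index("date")[["worked", "is_weekend", "extra"]]


timesheet = build_timesheet(records)

# 1. How many extra hours I have
total_extra = timesheet["extra"].sum()

# 2. Mean of extra hours by day / week / month
#    (read here as the mean of the per-period totals)
mean_per_day = timesheet["extra"].mean()
mean_per_week = timesheet["extra"].resample("W").sum().mean()
mean_per_month = timesheet["extra"].resample("MS").sum().mean()

# 3. Days with the most extra hours, excluding weekends
top_weekdays = timesheet[~timesheet["is_weekend"]].nlargest(3, "extra")

print(f"Total extra hours: {total_extra:.1f}")
print(f"Mean per day/week/month: {mean_per_day:.2f} / "
      f"{mean_per_week:.2f} / {mean_per_month:.2f}")
print(top_weekdays["extra"])
```

Real timesheets are messier than this (missing days, holidays, forgotten clock-outs), and that is exactly where the practice comes from.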

Of course, you can practice by solving generic problems like predicting Titanic survivors or building a spam classifier. But when you work on a project that represents something real, you’ll probably be more motivated to finish it, and you’ll feel that the things you’re learning are really useful and can be applied to your daily problems. Furthermore, you’ll be building a personal portfolio.

If you can’t think of a personal project, this Quora answer has some good projects to get your hands on and build a portfolio.

When you solve a problem, you put yourself in the place of an employer who has a problem and needs a data professional to solve it. In this case, though, you both create the problem and solve it, taking part in the whole chain: have a problem, specify it, develop a solution, review, evolve and gain insights from the results.

If you really want to work with data, as a data scientist or data analyst, you really should get your hands dirty and try to solve problems. Just watching lectures won’t make you a problem solver, which is really important for a data professional. Look around, try to frame something from your routine as a data science problem, apply what you’re studying to the solution (even if it’s not the simplest way), get stuck on problems you didn’t predict and solve them (Google and Stack Overflow are your friends). In the end, build something that makes you proud when you see the final result. That’s the best part.

Special thanks to my friends Caio Mattos and Raphaella, who reviewed this post and gave me some feedback. If you have any feedback, feel free to comment here or reach me on Twitter!


Product Manager at Pier.digital. Former Esporte Interativo (WarnerMedia) and Ingresso.com (Fandango). Innovation, Tech, Data and Sports lover