5 stages of learning Data Science

A blueprint of what the journey looks like

SudoPurge
Towards Data Science

With recruiters listing a myriad of “preferred skills” in their job postings, learning Data Science can get quite overwhelming at times. Dividing the journey up into five chapters can provide a clearer picture of what lies ahead.

Chapter 1

Great! You have decided to learn the art of Data Science, and now you want to take on the challenge. But which language should you pick, R or Python? I give my take on that in my previous article, but for now, all you need to do is pick one and get started with it. Go to YouTube and search for tutorials in the language of your choice. The first thing you need to learn is how to install the language and an IDE (Integrated Development Environment) to write it in.

Now don’t just watch the tutorials. You must also DO what you learn, as you learn it. More often than not, people get stuck at this stage. I spent about a year and a half trying to convince myself to consistently follow along with a tutorial. You don’t need to understand every piece of code you write at first. What matters is that you write it.

Never assume one video or resource will be enough, though. Try out different ones and see which suits your learning style. It’s important to understand the same concept from different perspectives. One video could be perfect for a particular concept, like basic data types or loops and iteration, but fall flat on other topics.

Don’t let this discourage you from continuing. All you need is the keyword for your next topic, like “variables” or “Object-Oriented Programming”, to run another search on YouTube or Google. There are loads of free and open-source resources online. If you can’t connect with one instructor, you can move on to the next one. That’s the beauty of it.

I am currently putting together a list of topics and keywords you can use at each stage for Python (sorry R folks, I don’t have enough expertise with R to provide a similar list). Most universities publish their curriculum on their course websites. Look them up to see the order in which concepts are usually taught. As an independent learner, you will need to know how to find resources, and the only way to get better at it is to do it every time you stumble across something you don’t understand (which will be very, very often). Eventually, you will find some websites or YouTube channels that really work for you.

Chapter 2

After a few months, you will have learned a lot of Python or R, along with some of the essential Data Science libraries or packages. If you haven’t discovered Kaggle yet, this is where you will live for a while. In this chapter you start looking at other people’s code and at how they analyze their data on Kaggle. And then it’ll hit you. You will realize how much you still don’t know. You’ll feel like you are far behind and everyone else is far ahead of you. Why couldn’t you have started a year earlier? You may feel too overwhelmed to continue.

That’s when you’ll need to remember why you wanted to learn this in the first place, why you are doing what you are doing, and why it could be worth all the hassle. Take a break if you need to. Use this time to plan ahead and to prepare yourself mentally for this new, unknown territory. Break the next part of your journey up into smaller chapters. Now that you know what you don’t know, this will be easier.

Every two weeks, look back at how much you have learned, until the overwhelming feeling vanishes, or at least until it gets better. Even if you spent those two weeks on a single project, if you really put in the effort, that’s two weeks closer to where you want to be.

You should also look into the basics of SQL, and of whichever of Python and R you didn’t choose initially. This shouldn’t be too tough now, since you already know one language and have a good feel for how programming languages work.

Chapter 3

Looking at other people’s code may feel like cheating, but it’s okay to learn from others’ code on Kaggle. You won’t understand all of it at first, and that’s completely normal. If you are comfortable with all the code in a notebook, you are not really learning anything new from it. Push your comfort zone. The only way to learn is to keep exploring this uncharted territory. Just as a supervised model learns from labelled examples, you will learn from others’ code examples.

You will come across new packages or concepts. Try to understand them using the documentation, Stack Overflow, or YouTube. If you need to freshen up your maths and statistics, there are fantastic videos on YouTube for those too, and a simple Google search with the right keywords will often lead you somewhere useful. One channel that deserves a mention here is 3Blue1Brown. You should really pin down the fundamentals of linear algebra and differential calculus, along with some basic statistics. Understanding the fundamentals of SQL and relational databases (and getting comfortable with one database system such as PostgreSQL) would also really help and broaden your horizons.
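To give a feel for what “SQL and relational databases” means in practice, here is a small sketch using Python’s built-in sqlite3 module with an in-memory database, so nothing needs to be installed. The customers/orders schema and the values are made up for illustration; the same JOIN and GROUP BY ideas carry over to PostgreSQL and other database systems.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two related tables (hypothetical schema): each order belongs to one customer.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 30.0), (2, 15.0)],
)

# JOIN the tables on the foreign key, then aggregate per customer.
rows = cur.execute(
    """
    SELECT c.name, COUNT(o.id) AS n_orders, AVG(o.amount) AS avg_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Each printed row is a (name, number of orders, average order amount) tuple, one per customer.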

Create your own reference codebase of the functions and methods you end up using a lot. When choosing personal projects, one important thing to consider is how they relate to your domain, because this is when you start building your portfolio for potential employers.

You may come across a method or technique for solving a problem that you know you could never have thought of on your own. But remember, now that you have come across it, you are a slightly better Data Scientist than you were before. It adds up.

Don’t just look at data analysis techniques. Take notes on how data exploration, preprocessing, and feature engineering are done, and on why data visualization is so darn important. Think about ways to collect data, such as web scraping. Try to understand the data life cycle. Publish your work on Kaggle, LinkedIn, or Medium.
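As an example of the kind of preprocessing step worth taking notes on (and the kind of small helper that could live in a personal reference codebase), here is a minimal sketch in plain Python of two common operations: filling missing values with the column mean, and min-max scaling values to the range 0 to 1. The function names and the sample column are hypothetical; in real projects you would more likely use pandas or scikit-learn for this.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with one missing measurement.
heights = [150.0, None, 170.0, 190.0]
clean = impute_mean(heights)   # the None is replaced by the mean, 170.0
scaled = min_max_scale(clean)
print(clean)
print(scaled)
```

Mean imputation is only one option; whether it is appropriate depends on why the values are missing, which is exactly the kind of question worth noting down when you read others’ notebooks.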

Chapter 4

Start learning the basic machine learning algorithms. Soon you will discover their beautiful, elegant, and exciting applications. Often, you won’t need to be able to write down the maths or the formulae behind a specific algorithm or model. Knowing how the model works, and the reasoning behind it, will be enough for now, unless of course you want to go into ML research specifically.

You will need to learn about each of the components of an ML model. Why do we need to worry about overfitting? What are hyperparameters, and how do they affect the model? What are optimizers, and how do you use them effectively? Why is regularization important? Why is one model better at solving a particular problem than another? Is a complex neural network really more effective, or even necessary, compared with simpler models for regression, classification, or clustering?
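To make the regularization question a little more concrete, here is a minimal, dependency-free sketch of ridge (L2-regularized) linear regression with a single feature, using its closed-form solution. The function name and the toy data are made up for illustration. The penalty strength `lam` is a hyperparameter: increasing it pulls the slope toward zero, trading a little fit on the training data for a simpler model that is less prone to overfitting. In practice you would use a library implementation such as scikit-learn’s `Ridge` rather than writing this yourself.

```python
def ridge_slope(xs, ys, lam):
    """Fit y ≈ slope * x + intercept with an L2 penalty lam on the slope only.

    After centering x and y (which keeps the intercept out of the penalty),
    minimizing sum((y - w*x)**2) + lam * w**2 gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    xc = [x - x_bar for x in xs]
    yc = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    slope = sxy / (sxx + lam)
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# lam = 0 is ordinary least squares; larger lam shrinks the slope toward 0.
for lam in (0, 1, 10, 100):
    slope, intercept = ridge_slope(xs, ys, lam)
    print(f"lam={lam:>3}: slope={slope:.3f}, intercept={intercept:.3f}")
```

With one feature and a tiny dataset there is nothing to overfit, so here the penalty simply makes the fit worse; the benefit of regularization shows up with many features and limited data, where it keeps coefficients from growing to fit noise.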

There are loads of ML algorithms, but you don’t “gotta learn them all” (*insert Pokemon theme song in the background*). Figure out which ML techniques are most useful in your domain. Again, YouTube can be a good starting point, as it has been so far. But this time you will definitely need to rely more on other resources, like library documentation or niche blogs such as those on Medium. By now you should probably be a God of looking things up online.

Chapter 5

You are proficient in Python, R, and SQL. You have developed the intuition to analyze almost any data, and you know how to apply ML models. Now is the time to take your skills to the next level. The last crucial thing you will need to learn is the art of putting together a data pipeline, integrating it with cloud services such as AWS, Azure, or IBM Cloud and with big-data frameworks such as Hadoop and Spark, to name a few, and pushing it into production. Again, there are loads of resources online. You just have to look them up.

However, this is where your domain expertise will dictate most of what you do. You have mastered Data Science as a tool, but what you want to do with it, and how you apply it in your own domain, is why you will be hired. You may want to apply Natural Language Processing techniques to large genomic datasets, or carry out sentiment analysis for a chatbot that automates customer service for a company. You may want to learn the ins and outs of how Convolutional Neural Networks detect objects in computer vision. Or you may just want to analyze marketing and customer behavior data to help create better-informed strategies for brand growth or profit.

You are finally able to tackle most data science problems independently or offer meaningful contributions to collaborative efforts. But remember, this is not the end of learning. In fact, you will need to keep up with the latest advancements constantly, and that’s probably pretty obvious to you by now.

This is based purely on my own experience and is a gross generalization. In most cases, the path is not this linear, and you will bounce between the different chapters, especially chapters 3 and 4. Your experience will almost certainly differ in some way or another. There are other ways to get through each of these stages that are as effective as these, if not more so. Some of you may focus on data visualization, and some may specialize in Machine Learning. But there is a common thread: they all fall under the umbrella of Data Science.

It's all about how you tackle the unknown. If you let every new concept you come across discourage you, you will give up in no time. Instead, think of it as an opportunity to be a better Data Scientist. Remember the three principles:

  1. divide larger concepts into digestible chunks,
  2. learn how to look them up online and find resources that resonate with you, and finally,
  3. apply them in your own projects.

The most crucial part is maintaining momentum and learning or practicing regularly. But the most difficult part is sitting down with your computer to start working. Once you successfully convince yourself to start for the day though, it does get easier every time. At the end of the day, if you don’t get those bursts of your “feel-good hormones” when you convert the little bits of data into tangible knowledge, is it really something you wanna do for the long term?

The important thing to remember is that you need to love the journey, the process of getting to your destination, not just the destination itself. Play around with every new concept you learn. Tweak things here and there. Let your curiosity flow.

P.S. If you want more short, to-the-point articles on Data Science and how a biologist navigates his way through the data revolution, consider following my blog.

Thank you!