Before you start learning Data Science online

Notes on the mistakes young aspirants make while learning data science online, and how to overcome them.

Ali Aryan
Towards Data Science


Background Image by Steinar Engeland

We generate over 2.5 quintillion bytes of data every day, and 90 percent of the world’s data was created in the last two years alone [1]. As a result, many data-related fields have emerged, and young aspirants start learning data science every day. I have watched friends and other beginners make mistakes while learning it, and I made mistakes too when I started.

Many great professors from well-known institutions offer excellent data science courses. Even so, most learners make the same mistakes at least once. You may have made them yourself, or wished someone had told you these things before you started. When I started, I wasn’t aware of many of them, so I have compiled the points one should know before learning data science online.

Online courses won’t teach you domain knowledge

Before solving any problem, you need the right domain knowledge. It helps you understand the features of a dataset and devise an approach for analyzing the data and telling stories from it. Domain knowledge is the skill beginners overlook most, yet it is one of the most important. Take the time to learn about the domain and the problem you are trying to solve.

Domain knowledge is the first step in approaching the problem. It’s the base on which your whole solution will depend.

You need to understand the dataset

Before you rush to find missing values or start cleaning the data, look at the dataset properly and try to understand it well. Use the describe() method from the pandas library to get summary statistics such as the mean, standard deviation, and quartiles. Then take a single example from the dataset and read through its features to understand what one record actually represents.
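As a minimal sketch of that first look, the snippet below calls describe() on a small made-up DataFrame and then inspects one record. The column names and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy dataset; the columns and values are invented for illustration.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29],
    "income": [28000, 52000, 91000, 61000, 40000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
})

# By default, describe() summarizes only the numeric columns:
# count, mean, std, min, quartiles (25%, 50%, 75%), and max.
summary = df.describe()
print(summary)

# Look at one concrete record to see what a single row means.
first_row = df.iloc[0]
print(first_row)
```

Passing include="all" to describe() also summarizes non-numeric columns such as city, which can be useful for spotting unexpected categories.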

Image by Scott Graham

Do not rush to create machine learning models

Many beginners apply ML algorithms directly, before preprocessing the data. Anyone can write two or three lines to train an algorithm and predict results. It is often said that data scientists spend around 80% of their time preparing and managing data. Look for outliers and correlations, and handle missing values. Understand which features affect the target most. For instance, if you want to solve a classification problem, check for class imbalance. In layman’s terms, a dataset is imbalanced if its classes do not contain roughly the same number of examples. For example, in a binary classification task, class A might make up 99% of the data and class B only 1%. On imbalanced data, a model can become biased toward the majority class: it can report high accuracy while rarely predicting the minority class correctly.
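A quick way to check for class imbalance is to look at the share of each class. The sketch below rebuilds the 99% / 1% example with made-up labels and shows why accuracy alone can mislead on such data: a baseline that always predicts the majority class already scores 99%. The variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical labels matching the 99% / 1% example: 99 of class A, 1 of class B.
labels = pd.Series(["A"] * 99 + ["B"] * 1, name="label")

# Share of each class in the dataset.
class_share = labels.value_counts(normalize=True)
print(class_share)

# Accuracy of a trivial baseline that always predicts the majority class.
majority_class = class_share.idxmax()
baseline_accuracy = (labels == majority_class).mean()
print(majority_class, baseline_accuracy)
```

Any model worth keeping should beat this baseline, and metrics such as precision and recall on the minority class give a clearer picture than accuracy here.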

Take the time to prepare and manage the data. It’s okay to spend a long time on preprocessing.

Understand the code that has been written before solving the assignment

Many beginners take only what they learned in that week of the course and apply it to the data. For instance, suppose a student is learning about Support Vector Machines (a classification algorithm) and the assignment asks them to train and test a model. Most students would do just that task and never look at the code already provided for preprocessing the data.

Don’t be shy about asking questions

Forums are a great place to ask questions. Do not be shy about asking, and always clear your doubts. Nobody should be embarrassed about asking questions. Keep feeding your curiosity.

The important thing is not to stop questioning. Curiosity has its own reason for existing. — Albert Einstein

If you don’t understand the algorithm, then implement it from scratch

If you are having trouble understanding a particular algorithm, the best way is to code it yourself. In the process, you will learn how it works, which will also help you understand how libraries like scikit-learn work.
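As one illustration of what "from scratch" can look like, here is a minimal sketch of k-nearest-neighbors classification written with NumPy only. The function name knn_predict and the toy data are assumptions made for this example, and the code is not how scikit-learn implements the algorithm internally.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict a label for x_query by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_query = np.asarray(x_query, dtype=float)

    # Euclidean distance from the query point to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)

    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]

    # Majority vote; on a tie, the smallest label wins because np.unique sorts labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical toy data: two well-separated clusters.
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_train = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_train, y_train, [1.1, 1.0], k=3))
print(knn_predict(X_train, y_train, [5.0, 5.2], k=3))
```

Writing even a small version like this makes the roles of the distance measure and of k concrete, which helps when reading a library’s documentation later.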

Image by Hitesh Choudhary

Don’t cheat yourself

While you are doing assignments or quizzes, do not copy from internet resources. If you are stuck on an assignment, take your time. It’s more about how cleverly you think through the problem than about getting it done for a certificate. Most of what I learned came from getting stuck on a problem.

Image by JESHOOTS.COM on Unsplash

Previous knowledge of coding and mathematics is highly beneficial

I disagree with people who say that you don’t need coding experience to take data science courses. Basic coding knowledge makes everything easier to follow. A course may teach you Python or R along the way, but even then I would suggest solving problems on platforms like HackerRank or HackerEarth to gain experience in the language, so that later you are comfortable writing code with libraries like matplotlib and NumPy. If you aren’t familiar with the basic mathematics, don’t worry: there are excellent resources on YouTube. I would suggest 3Blue1Brown’s channel for calculus and linear algebra, and StatQuest for statistics. Both are excellent for beginners who want to start on mathematics for data science.

Image by Antoine Dautry

Do not settle for the initial results of an ML algorithm

You can often improve a model by tuning the settings of its algorithm, so do not settle for the initial results. Try hyperparameter tuning. Many algorithms expose hyperparameters: the k-nearest neighbors algorithm, for example, lets you change the number of neighbors, which may improve the result. Remember that you can always do better.
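One common way to tune the number of neighbors is a cross-validated grid search. The sketch below uses scikit-learn’s GridSearchCV with KNeighborsClassifier on the built-in iris dataset. The dataset and the candidate values are choices made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset, used here only as a stand-in for your own data.
X, y = load_iris(return_X_y=True)

# Candidate values for the number of neighbors (an arbitrary illustrative grid).
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}

# 5-fold cross-validation for every candidate value.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)           # the best value found within the grid
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
```

The best cross-validated score is an optimistic estimate, so it helps to keep a separate test set that the search never sees and evaluate the final model on it.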

Never Stop Learning

After you have finished an online course, do not stop learning. An online course will help you build strong fundamentals, but you have to keep going. Keep looking for new datasets and practice on them. Practice is the key to becoming good at any job. Keep reading blog posts, notebooks, and research papers, and watching videos, to learn more. Never restrict yourself to one thing. As a data scientist, you need to keep learning new technologies all the time. Do not give up.

Image by Elements5Digital

References:

[1] blazon, How much data do we create every day (2019)


Senior Data Analyst at Merkle Inc | Analytics | Machine Learning | Deep Learning. Always learning and exploring something new.