For two years now, I’ve been studying data science on my own, and along the way I’ve gained many insights that I want to share with aspiring data scientists who are just starting out.
Take what you want from this article; I’m simply sharing my opinion for those who feel a little lost and would like some more guidance. With that said, here are 5 things I wish I had known when I started learning data science:
1) Try to be a good programmer and a good statistician before being a good data scientist.
If you’ve read my older articles, you’ve probably already heard me say this – a data scientist is really a modern statistician who leverages programming to implement statistical methods.
Understanding the fundamentals will make your life a lot easier and actually save you time in the long run. Almost all machine learning concepts and algorithms are based on statistics and probability, and many other data science concepts, like A/B testing, are purely statistical as well.
Ultimately, how good you are as a data scientist is limited by how knowledgeable you are in programming and statistics.
Check out my previous article, "How I’d Learn Data Science if I Could Start Over," for more guidance on this point.
TLDR: Build a solid foundation in programming and statistics before learning anything else. It will save you a lot of time in the long run.
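To show how purely statistical an A/B test is, here is a minimal sketch of a two-sided two-proportion z-test written with only Python’s standard library. The function names and the conversion counts are hypothetical, chosen for illustration only, and the test assumes samples large enough for the normal approximation to hold.

```python
from __future__ import annotations

import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z statistic, p-value). The standard error uses the pooled
    proportion, i.e. it assumes the null hypothesis that A and B convert
    at the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a result at least this extreme in either direction.
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 5,000 users per variant.
    z, p = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
    print(f"z = {z:.3f}, p-value = {p:.4f}")
```

There is no machine learning here at all: a p-value below your chosen significance level (0.05 is a common convention) would lead you to reject the hypothesis that both variants convert at the same rate.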
2) Spend less time on online bootcamps, and more time on personal data science projects.
I know this may be a controversial statement to some of you, so let me preface it by saying a few things:
- This is based entirely on my own experience and my observations of my peers.
- There are, of course, some amazing online courses and bootcamps that my generalization doesn’t cover, like deeplearning.ai’s courses.
- Doing a bootcamp is still better than doing nothing at all.
That being said, here are several problems with online bootcamps.
- They tend to cover material only at a surface level, and on top of that, they often give you a false sense of understanding.
- They also aren’t very good for retaining information. You’d probably agree that the more time you spend studying a subject, the more likely you are to retain it. The problem with these bootcamps, especially the ones advertised as "becoming an expert in 5 weeks," is that they don’t give what you’re learning enough time to sink in.
- Lastly, they generally aren’t challenging enough. A lot of bootcamps and courses simply ask you to follow along and copy their code, which doesn’t require you to think critically or deeply.
Why you should be working on personal data science projects.
Personal data science projects are a great way to learn because you’ll be forced to think critically about the problem and solution all on your own.
Through this, you’ll learn far more than any bootcamp can teach you: how to ask the right questions, how to Google effectively, how to approach a data science project in a way that works for you, how to be methodical, and so on.
By being more invested in your own project, you’ll also feel more motivated to learn more and invest more time, creating a positive feedback loop.
TLDR: Spend less time doing data science bootcamps and more time working on personal data science projects.
Need some ideas to get started? Check out my article, "14 Data Science Projects to do During Your 14 Day Quarantine".
3) Focus on a select few tools and be really good at them.
There are so many data science packages and tools out there, and that’s cool because you get to personalize your data science toolkit in a way that works for you.
However, it’s easy to get carried away in wanting to learn as many packages and tools as possible. Don’t make this mistake.
You’ll be much better off being extremely fluent in a few tools than scratching the surface with several tools that you’ve barely spent any time using. (Having a laundry list of skills and tools on your resume should not be the end goal!)
To give an example, there are several great data visualization packages out there: Matplotlib, Seaborn, Plotly, Bokeh, etc… There is no need to spend your time trying to master every single one of these – it’s a waste of your precious and limited time.
As another example, if you want to manipulate data with Pandas, get really good at Pandas. If you’re more of a NumPy person, go for it. Yes, ideally you’d be good at both Pandas and NumPy, but my point is that it’s probably a good idea to stick with one and master it rather than constantly hopping between them.
The same thing goes with…
- Python vs R
- TensorFlow vs PyTorch
- PostgreSQL vs MySQL
- the list goes on…
TLDR: Establish your data science toolkit and stick to it! Mastering 5 tools is better than barely knowing how to use 20.
4) Understanding the various machine learning algorithms out there only makes up a small percentage of data science.
Personally, what got me into data science was all of the different machine learning models: how they worked and which applications they were useful for. I probably spent at least six months learning and dabbling with various machine learning models, only to realize that they made up only a fraction of what a data scientist needs to know.
Data modeling is only one part of the entire machine learning life cycle. There’s data collection, data preparation, model evaluation, model deployment, and model tuning that you need to have an understanding of as well. In fact, I would say that the majority of time is spent on data preparation, NOT data modeling (machine learning modeling).
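As a rough illustration of how small the modeling step can be relative to everything around it, here is a toy pipeline using only Python’s standard library. Everything in it is hypothetical: the "collected" records are made up, the model is a simple least-squares line, and evaluation is mean absolute error on a naive held-out split. Deployment and tuning are not shown.

```python
from __future__ import annotations

from statistics import mean

# Data collection: hypothetical raw records, as they might arrive from a CSV or an API.
RAW_RECORDS = [
    {"sqft": "850", "price": "170000"},
    {"sqft": "1200", "price": "240000"},
    {"sqft": "", "price": "199000"},       # missing feature
    {"sqft": "1500", "price": "305000"},
    {"sqft": "2000", "price": "n/a"},      # unparseable target
    {"sqft": "1100", "price": "221000"},
    {"sqft": "1700", "price": "338000"},
]


def to_float(value: str) -> float | None:
    """Parse a string to float, returning None for blanks or junk."""
    try:
        return float(value)
    except ValueError:
        return None


def prepare(records: list[dict[str, str]]) -> list[tuple[float, float]]:
    """Data preparation: parse fields and drop rows with missing or bad values."""
    rows = []
    for rec in records:
        x, y = to_float(rec["sqft"]), to_float(rec["price"])
        if x is not None and y is not None:
            rows.append((x, y))
    return rows


def fit_line(rows: list[tuple[float, float]]) -> tuple[float, float]:
    """Modeling: ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    return slope, y_bar - slope * x_bar


def mean_absolute_error(model: tuple[float, float], rows: list[tuple[float, float]]) -> float:
    """Evaluation: average absolute prediction error on held-out rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


if __name__ == "__main__":
    clean = prepare(RAW_RECORDS)
    train, test = clean[:-2], clean[-2:]  # naive split, for illustration only
    model = fit_line(train)
    print("MAE on held-out rows:", mean_absolute_error(model, test))
```

Even in a toy like this, the model is one short function, while collecting, cleaning, and evaluating the data take up the rest.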
On top of that, there are several other things you’ll have to learn, like version control (Git), pulling data from APIs, understanding the cloud, and the list goes on.
TLDR: Do not spend all of your time trying to master every machine learning algorithm. It only makes up a small percentage of what a data scientist needs to know.
5) As a Data Scientist, it’s common to feel Imposter Syndrome.
From my very first day learning data science to this day, I’ve experienced imposter syndrome on a regular basis. But I’ve learned that this is completely normal.
Why is it common and okay for data scientists to feel imposter syndrome?
- "Data science" is a vague term because it is an interdisciplinary field that includes statistics, programming, mathematics, business understanding, data engineering, and more. On top of that, many closely related roles overlap with it (data analyst, data engineer, research scientist, applied scientist). My point is that you’ll never be an expert at EVERYTHING that data science encompasses, and you shouldn’t feel like you have to be.
- Like everything else in programming and tech, data science is constantly evolving. Twenty years ago, Pandas didn’t even exist, and TensorFlow was only released in 2015. There will always be new technologies coming out, and therefore new things for you to learn.
- This kind of relates to my first point, but because you most likely won’t be an expert at EVERYTHING, that means there’s always going to be someone who’s better at the things that you spend less time on. And that’s okay too.
TLDR: As a data scientist, you will always feel imposter syndrome, and that’s okay.
Thanks for Reading!
Through reading this, I hope that I was able to give you some insights and useful advice that will help clear some of the misconceptions you have and also make your data science journey a lot smoother than mine!
I’ve received really good feedback for my more opinionated data science articles, which is why I wrote this. Like always, take this with a grain of salt if you disagree with anything that I said. But if you enjoyed it, please let me know what else you’d like me to write about.
I wish you guys the best in your data science journey as always!
Terence Shin
- If you enjoyed this, follow me on Medium for more
- Follow me on Kaggle for more content!
- Let’s connect on LinkedIn
- Interested in collaborating? Check out my website.
- Check out my free data science resource with new material every week!