21 Tips for Every Data Scientist for 2021

#20. Learning how to set expectations will make a big difference in how "successful" you are in your career.

Photo by Clark Tibbs on Unsplash

In this article, I’m going to share with you 21 pieces of advice that I’ve learned from other data scientists and through my own experiences over the past few years.

Depending on how far you are into your career, some of these tips will definitely speak to you more than others. For example, "Take some time to discover and explore new libraries and packages" might not be as relevant for someone who is just starting off.

With that said, let’s dive right into it!


1. The simplest solution is often the best solution.

Being a data scientist doesn’t mean that you have to solve every problem with a machine learning model. If a CASE WHEN query is enough to get the job done, stick with that. If linear regression is enough to get the job done, don’t build a 10-layer neural network.

There are many benefits to a simpler solution, including a faster time-to-implementation, less [technical debt](https://en.wikipedia.org/wiki/Technical_debt#:~:text=Technical%20debt%20(also%20known%20as,approach%20that%20would%20take%20longer.), and overall easier maintainability.
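As a rough illustration of the CASE WHEN point, here is a minimal sketch that uses Python's built-in sqlite3 module. The orders table, its columns, and the thresholds are all made up for the example; the only point is that a plain rule can do the job without any model.

```python
import sqlite3

# Hypothetical data: table name, columns, and thresholds are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 250.0), (3, 1200.0)],
)

# A plain CASE WHEN assigns each order to a segment, no model required.
query = """
    SELECT order_id,
           CASE
               WHEN amount >= 1000 THEN 'high'
               WHEN amount >= 100 THEN 'medium'
               ELSE 'low'
           END AS segment
    FROM orders
    ORDER BY order_id
"""
segments = conn.execute(query).fetchall()
conn.close()

print(segments)  # [(1, 'low'), (2, 'medium'), (3, 'high')]
```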


2. Intentionally set aside time to periodically discover and explore new libraries and packages.

It’s easy to stick with what you’re comfortable with, but new tools are created for a reason: to fill a gap in what’s already out there. By taking the time to explore new libraries and packages, I’ve found some incredible tools that have saved me lots of time. Here are a few of them:

Image taken by Gradio (with permission)
  • Gradio is a Python package that allows you to build and deploy a web app for your machine learning model in as little as three lines of code. It serves the same purpose as Streamlit or Flask, but I found it much faster and easier to get a model deployed.
  • Pandas Profiling is another package that automatically conducts exploratory data analysis and consolidates it into a report. I find this extremely useful to use when I’m working with smaller datasets. The best part is that it requires only one line of code!
  • Kedro is a development workflow tool that allows you to create portable ML pipelines. It applies software engineering best practices to your code, making it reproducible, modular, and well-documented.

3. Being efficient does not mean rushing important steps.

Some steps simply can’t be rushed. In particular, you should take time to develop a deep understanding of the business problem that you’re trying to solve and the data that you’re working with.

There are a number of questions that you should be able to answer before you actually dive into the model. You can check them out here.


4. Metrics are arguably more important than the model itself.

This point ties into the previous one: you need a really good understanding of the problem that you’re trying to solve. Part of understanding the problem is figuring out which metric you’re trying to optimize because, at the end of the day, machine learning is a fancy word for statistics and optimization.

To give an example, a model can be 99% accurate and still be useless for anomaly detection: if only 1% of the records are anomalies, a model that never flags anything reaches that accuracy while catching none of them!
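The gap between accuracy and usefulness on imbalanced data can be sketched in a few lines of plain Python. The labels below are invented (10 anomalies in 1,000 records), and the "model" is simply a stand-in that never flags anything.

```python
# Hypothetical labels: 1 = anomaly, 0 = normal; 10 anomalies in 1,000 records.
y_true = [1] * 10 + [0] * 990

# A stand-in "model" that predicts "normal" for every record.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Share of records where the prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Share of actual anomalies that the model caught."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

Accuracy looks excellent, yet recall shows that not a single anomaly was found, which is why the choice of metric has to follow from the problem.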


5. Your work will only be as good as your ability to communicate it.

People get intimidated by things that they don’t understand and tend to avoid them.

You have to be able to communicate technical jargon and modeling techniques in a manner that non-technical people can understand. If you took the time to build a great model, you should take a bit more time to communicate it effectively so that people can recognize your hard work!


6. Learn the fundamentals, especially statistics.

Data science and machine learning are essentially a modern version of statistics. By learning statistics first, you’ll have a much easier time when it comes to learning machine learning concepts and algorithms.

I created a complete 52-week curriculum, with the first six weeks dedicated to statistics, which you can check out here.


7. Know the parameters of the problem you’re solving.

This can be best explained with an example.

For one of my projects, I had to develop a model to predict whether a product had to be RMA’ed (returned under a return merchandise authorization) or not. Initially, I thought that my input was ALL products, which made it almost like an anomaly detection problem.

Only after understanding the business needs and how the model would be used did I realize that the input of my model was all products that were issued an RMA (customer sent an email about a problem with the product). This made the data much more balanced and saved me a lot of time.
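To see why the choice of input population matters so much, here is a small sketch with invented numbers. The record layout, the field names (rma_requested, rma_needed), and the counts are all assumptions made for illustration, not figures from the project described above.

```python
# Hypothetical records: (product_id, rma_requested, rma_needed).
products = (
    [(i, False, False) for i in range(950)]        # no complaint, no RMA
    + [(i, True, False) for i in range(950, 980)]  # complaint, no RMA needed
    + [(i, True, True) for i in range(980, 1000)]  # complaint, RMA needed
)


def positive_rate(rows):
    """Share of rows where an RMA was actually needed."""
    return sum(needed for _, _, needed in rows) / len(rows)


# Population 1: every product. Population 2: only products with a complaint.
all_products = products
rma_requests = [row for row in products if row[1]]

print(positive_rate(all_products))  # 0.02
print(positive_rate(rma_requests))  # 0.4
```

With these made-up counts, the positive class goes from 2% of the data to 40% just by defining the input correctly.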


8. Don’t underestimate the power of SQL.

SQL is the universal data language. It is arguably the most important skill to learn in any data-related profession, whether you’re a data scientist, data engineer, data analyst, or business analyst, and the list goes on.

Not only is SQL important for building pipelines, pulling data, and wrangling data, but you can now actually create machine learning models using SQL queries. BigQuery ML allows you to do exactly that.
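As a small taste of pulling and wrangling data with SQL, the sketch below joins and aggregates two tables. It runs against an in-memory SQLite database through Python's built-in sqlite3 module rather than BigQuery, and the tables, columns, and values are invented for the example.

```python
import sqlite3

# Hypothetical tables and values, created in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES
        (10, 1, 50.0), (11, 1, 70.0), (12, 2, 30.0), (13, 3, 100.0);
""")

# Join orders to customers, then summarize order count and revenue by region.
query = """
    SELECT c.region,
           COUNT(*) AS n_orders,
           SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
"""
summary = conn.execute(query).fetchall()
conn.close()

print(summary)  # [('east', 3, 220.0), ('west', 1, 30.0)]
```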


9. Treat data science like a team sport.

One of the biggest perks of being a data scientist is the amount of autonomy you’re given. But this can easily be a downfall if you’re not willing to seek advice, help, and feedback from others.

Despite the level of autonomy, data science is a team sport. You have to embrace advice and feedback from several stakeholders, including end-users, domain experts, data engineers, etc.


10. Don’t waste your time trying to memorize everything.

There’s simply too much out there to try to memorize everything. Plus it’s a big waste of time. You’re better off practicing how to Google your questions so that you can get the answers that you need.

Also, start a Google Sheet to keep the really useful links that you find yourself going back to. For me, I like to include links to cheat sheets, crash courses, and questions that I tend to Google a lot (e.g., regex code for emails).
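For instance, the email regex is exactly the kind of snippet worth saving. The pattern below is a deliberately simple approximation, not a full validator for every legal address, and the sample text is made up.

```python
import re

# Simplified pattern: local part, "@", domain, a dot, and a 2+ letter suffix.
# This is a pragmatic approximation, not a complete email address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

text = "Contact ana@example.com or support@mail.example.org for help."
emails = EMAIL_RE.findall(text)

print(emails)  # ['ana@example.com', 'support@mail.example.org']
```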


11. Deploy fast, iterate fast, and continually get feedback.

It’s important to communicate constantly with stakeholders: keep them in the loop on your thought process and on any assumptions that you make for the model, and ask for their feedback. Otherwise, you may end up with a model that doesn’t solve the problem at hand.

Personally, I use Gradio to create web UIs for each iteration of my model when sharing it with stakeholders, especially non-coders.

I find Gradio incredibly useful for the following reasons:

  • It allows me to interactively test different inputs into the model.
  • It allows me to get feedback from domain users and domain experts (who may be non-coders).
  • It takes 3 lines of code to implement and it can be easily distributed via a public link.
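A minimal sketch of what this can look like is shown below. The predict function is a hypothetical stand-in for a real model, and the Gradio import is kept inside launch_demo so that the rest of the file still runs when Gradio is not installed.

```python
def predict(text: str) -> str:
    """Hypothetical stand-in for a real model: a simple keyword rule."""
    return "positive" if "good" in text.lower() else "negative"


def launch_demo() -> None:
    """Wrap predict() in a web UI. Requires `pip install gradio`."""
    import gradio as gr  # imported here so predict() works without Gradio

    demo = gr.Interface(fn=predict, inputs="text", outputs="text")
    demo.launch(share=True)  # share=True requests a temporary public link
```

Calling launch_demo() starts the app; the import, the Interface, and the launch call are the three lines that matter, and everything else here is scaffolding for the example.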

12. See a project the full way through. You are as responsible for implementing a model as you are for creating it.

Long gone are the days when you as a data scientist could hand over your scrappy Jupyter Notebook to the engineering team for implementation. These days, data scientists are more like data scientists slash engineers slash product managers.


13. Everything is a sales pitch.

As a data scientist, you’re always selling yourself, whether it’s selling a new idea or selling a model that you’ve built. Similar to point #5, you have to be able to communicate the business value that comes from every idea, every model, and every project that you undertake.


14. Build a sustainable schedule to learn consistently.

If you’re going to learn, do it the right way. You might have heard of the curve of forgetting. Simply put, you need to be consistent in learning data science and practicing what you learn if you want to be able to retain new information.

Be honest with yourself and make a schedule that you can actually adhere to, because consistency is key.


15. Learn how to use Git and GitHub.

Learning software engineering best practices will go a long way. Version control is one of the most important of these practices because nearly every company uses it!

I’d check out these two resources:


16. Learn by doing.

You’ll learn and retain more knowledge and skills by doing rather than just studying. Similar to how you do homework after you learn a new concept in school, you need to constantly apply what you learn to projects.

Here are some project ideas to get you started.


17. Stay in touch with what’s going on.

Related to the point of exploring new tools and libraries, it’s important to keep up with what’s new in data science so that you can keep your skills and tools as up to date as possible.

I like to do this by reading publications, watching YouTube videos, and reading company blogs, like Airbnb, Uber, Google, and Facebook.


18. Learn to apply divergent and convergent thinking.

This is an incredibly useful technique in data science because it helps you make sure that you’ve exhausted all options. Divergent thinking simply means exploring multiple solutions to a given problem, and convergent thinking means narrowing your options down to one solution. This is particularly useful when performing EDA and choosing a model or algorithm to use.

You can learn more about it here.
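One way to picture the two modes when choosing a model is the toy sketch below: first list several candidate approaches (divergent), then score them on held-out data and keep one (convergent). The data, the three baseline "models", and the use of mean absolute error are all assumptions made for illustration.

```python
from statistics import mean, median

# Hypothetical data: a training series with one outlier, plus held-out values.
train = [10, 12, 11, 13, 50, 12, 11]
valid = [12, 11, 13, 12]

# Divergent step: lay out several candidate approaches side by side.
candidates = {
    "mean": lambda data: mean(data),
    "median": lambda data: median(data),
    "last_value": lambda data: data[-1],
}


def mae(prediction, actual):
    """Mean absolute error of one constant prediction against actual values."""
    return mean(abs(prediction - a) for a in actual)


# Convergent step: score every candidate on held-out data and keep the best.
scores = {name: mae(fit(train), valid) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

print(best)  # median
```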


19. Start a career doc.

This is something that I actually didn’t hear about until my friend, Udara, wrote about it. It’s essentially a journal or a diary for your career. Unlike a resume, which is for employers, a career doc is for you to look back and reflect on.

If you want to learn more about it, you can check it out here!


20. Learning how to set expectations will make a big difference in how "successful" you are in your career.

Promise less. Deliver more.

This is particularly relevant for data scientists because a data scientist can spend as little or as much time as they want creating a model. A data scientist can build a mediocre model quickly using AutoML libraries or build a near-perfect model but take months to complete it.

Regardless of what you choose, it’s important that you manage expectations so that stakeholders are not disappointed. In particular, this means managing expectations about timelines and the performance of the models.


21. Find a mentor you look up to who’s willing to help you.

One of the greatest things that has happened to me in my career was finding a mentor who was extremely knowledgeable and who also cared deeply about my success.

I would argue that because of him, I’ve learned twice as much as I otherwise would have.


Thanks for Reading!

I hope that you were able to take a thing or two away from this! I truly believe that these pieces of advice have significantly helped me in my career, and I’m sure that they will do the same for you.

As always, I wish you the best in your learning endeavors 🙂

Not sure what to read next? I’ve picked another article for you:

All Machine Learning Algorithms You Should Know in 2021

and another one!

A Complete 52 Week Curriculum to Become a Data Scientist in 2021

Terence Shin

