
7 Traits of Incredibly Efficient Data Scientists

To become an effective data scientist, you must first become efficient.

Office Hours

Photo by Boitumelo Phetla on Unsplash

A regular complaint about data science is that many of the daily tasks that come with the job are repetitive, redundant, and time-consuming.

This busywork is often cited as one of the reasons data scientists are unhappy with their jobs: many entered the field expecting to spend their time on exciting projects and company-vital analyses.

When reality sets in, however, much of the work turns out to be monotonous rather than the thrill of solving big real-world problems.

Many people talk about the importance of being an effective data scientist, though few discuss how people can become efficient data scientists. Efficiency is one of the pillars of effectiveness, and with efficiency will come improved job satisfaction as those boring tasks become a thing of the past (or at least a thing of the automated future).


1. They automate repetitive tasks.

Data science is built on repetitive tasks, including the fundamentals of obtaining, preparing, and cleaning data. It’s a common rule of thumb that data scientists spend 80% of their time on these tasks.

It’s unfortunate that such repetitive and often mind-numbing tasks take up so much time, especially when it’s the fun things like data analysis, visualization, and modeling that got people into data science in the first place.

While obtaining, preparing, and cleaning data are vital to the success of a project, these activities don’t yield a specific ROI on their own – they’re just steps on the way to the ROI at the end. Doesn’t it make sense to spend the majority of your time on tasks that return a definite ROI and use automation to handle the rest?

Automating the low ROI tasks that take up the majority of your time opens you up to be more efficient and spend less time on routine tasks. While there are going to be particulars concerning data collection and cleaning for each project, it’s still possible to automate the process once your requirements are established.
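One lightweight way to automate the cleaning step is to express it as a reusable pipeline of small functions. The sketch below uses plain Python and hypothetical field names (`id`, `name`) purely for illustration; in practice you would wrap your pandas or SQL steps the same way.

```python
def strip_whitespace(row):
    """Trim stray whitespace from every string field in a record."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def drop_missing_id(rows):
    """Discard records that have no 'id' value (hypothetical rule)."""
    return [r for r in rows if r.get("id")]


def clean(rows, row_steps, table_steps):
    """Apply per-record steps, then whole-table steps, in order.

    Once a project's cleaning requirements are established, new rules
    become one more function in a list instead of more manual work.
    """
    for step in row_steps:
        rows = [step(r) for r in rows]
    for step in table_steps:
        rows = step(rows)
    return rows


raw = [{"id": " a1 ", "name": " Ada "}, {"id": "", "name": "Bob"}]
cleaned = clean(raw, row_steps=[strip_whitespace], table_steps=[drop_missing_id])
```

The same `clean` function can then run unattended on every new data drop, turning a recurring chore into a one-time setup cost.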

How to automate data collection:

Four Basic Ways to Automate Data Extraction

How to automate data cleaning:

Data cleaning is finally being automated

Automate Data Cleaning with Unsupervised Learning

Take the Pain Out of Data Cleaning for Machine Learning

How to automate code checks:

The Lazy Mindset of Effective Data Scientists: How Automation Can Help

2. They use the simplest tool for the job.

Believe it or not, not every data analysis requires machine learning and artificial intelligence.

The most efficient way to solve a problem is to use the simplest tool possible.

Sometimes, a simple Excel spreadsheet can yield the same result as a big fancy algorithm using deep learning.

Choosing the right algorithms and tools from the start makes a data science project much more efficient. While it’s cool to impress everyone with a super complex tool, it doesn’t make sense in the long run when a simpler, more efficient solution would take less time.
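A practical habit along these lines is to measure a trivial baseline before reaching for anything complex. The sketch below uses made-up sales figures and only the standard library:

```python
from statistics import mean


def mae(predictions, actuals):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


# Hypothetical illustration data.
sales = [10.0, 12.0, 11.0, 13.0, 12.0]

# "Predict the average" is about the simplest tool there is.
baseline = mean(sales)
baseline_error = mae([baseline] * len(sales), sales)
```

If a deep learning model can’t beat `baseline_error` by a meaningful margin, the spreadsheet-level answer was already good enough, and you just saved yourself days of work.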

Here are some resources to help you choose the best tools and algorithms for your next data science project:

10 Tools I Use to Streamline My Data Science Learning Experience

How to Choose the Right Machine Learning Algorithm for Your Application

3. They follow a strict coding structure.

Doing the job right the first time is the most efficient way to complete any project.

When it comes to data science, that means writing code using a strict structure that makes it easy to go back and review, debug, change, and even make your code production-ready.

Clear syntax guidelines make it possible for everyone to understand everyone else’s code. However, syntax guidelines aren’t just there so you can understand someone else’s chicken scratch – they’re also there so you can focus on writing the cleanest, most efficient code possible.

Few things are worse than having coded an entire project and then having to go back and refactor all of your code so that it follows company guidelines.

Save yourself the time by becoming familiar with best coding practices, best software engineering practices, and the specific syntax guidelines and requirements from your company.
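As a minimal sketch of what "strict structure" can mean in Python (PEP 8 naming, type hints, a docstring that states the failure mode, one job per function); your company’s guidelines may differ in the details:

```python
from typing import Iterable


def normalize(values: Iterable[float]) -> list[float]:
    """Scale values linearly to the [0, 1] range.

    Raises ValueError if all values are identical, since the
    range would be zero and the result undefined.
    """
    values = list(values)
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant series")
    return [(v - low) / (high - low) for v in values]
```

Code written this way is cheap to review, debug, and promote to production, which is exactly where refactoring time is usually lost.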

Here are some resources to help you follow best practices:

Software Engineering Best Practices for Data Scientists

Object-Oriented Programming for Data Scientists

4. They have a team around them that can help them solve problems.

No one is an island, and data scientists are no exception.

While it can be exhilarating and incredibly satisfying to solve a difficult problem, it may not be in your best interest if it took you more hours than necessary.

I can remember a few specific times when I was stuck and refused to ask for help. After hours (and admittedly sometimes days) of banging my head against a wall, I would give in and ask someone, only to find that the solution they offered in less than five minutes took only seconds to implement.

In other words, the most efficient data scientists aren’t afraid to ask for help, and they surround themselves with a team that can give them the answers they need. A poor team environment is one of the reasons so many data scientists are leaving their jobs, which further highlights the importance of having a good team around you.

Here are some resources to help you get started with building better data science teams:

Why Data Scientists Need To Work In Groups – KDnuggets

What is the most effective way to structure a data science team?

5. They set aside time to learn new things and to better themselves.

Self-improvement is one of the cornerstones of efficiency. Without improvement, there can be no increase in efficiency.

The most efficient data scientists make time to learn new things and better themselves.

Whether it’s completing a weekly literature review, or setting aside a few hours per week to work through a MOOC, data scientists become more efficient as they grow their knowledge base.

Data science is a quickly changing landscape with new languages popping up every year, a constant slew of academic papers being released on new techniques, and an ever-growing community coming together to share new insights into how to do things differently.

The only way to stay on top of everything and remain relevant is to set aside dedicated learning time.

You can begin by scheduling a few hours of dedicated learning time every week and setting a task for each session. For example, for one hour on Wednesday, you read and review a new machine learning paper. For one hour on Friday, you practice implementing generative adversarial networks (GANs). Finally, on Sunday, you work on a personal project or complete coding challenges on HackerRank or Kaggle.

By setting aside small chunks of time with dedicated tasks, it’ll become easier to develop a weekly self-improvement habit.

Here are some resources to get you started with literature reviews, personal projects, and MOOCs.

How Reading Papers Helps You Be a More Effective Data Scientist

The 7 Data Science Projects I Plan on Completing in 2021

Top 20 free Data Science, ML and AI MOOCs on the Internet

6. They use the best visualization for the job to avoid skewing data.

As I mentioned earlier, the quickest path to efficiency is to do the job right the first time.

Not only does that mean coding using a proper syntax structure, but it also means using the right visualizations from the get-go to ensure that you’re not skewing the data accidentally.

Choosing the right visualization isn’t just important for data integrity, it’s also important for understanding what the data is telling you. Since data visualization is often one of the first tasks done before any modeling or analysis, it’s important to get it right the first time so you know what you’re dealing with.
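The chart-choosing guides linked below mostly boil down to matching the question you’re asking to a default chart type. A hedged sketch of that rule of thumb as a tiny lookup (the mapping is a convention, not a library API):

```python
# Rule-of-thumb defaults: question type -> chart type.
CHART_DEFAULTS = {
    "comparison": "bar chart",
    "trend_over_time": "line chart",
    "distribution": "histogram",
    "relationship": "scatter plot",
    "part_of_whole": "stacked bar chart",
}


def suggest_chart(question_type: str) -> str:
    """Return a sensible default chart for a given analytical question."""
    try:
        return CHART_DEFAULTS[question_type]
    except KeyError:
        raise ValueError(f"no default for question type: {question_type!r}")
```

Starting from the question ("am I comparing categories or tracking a trend?") rather than from a favorite chart is what keeps the visualization from accidentally skewing the story.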

Here are some resources to help you understand different types of data visualizations and when to use them:

Data Visualization 101: How to Choose a Chart Type

Data Visualization: How to choose the right chart (Part 1)

7. They make a plan of attack before they write any code or clean any data.

One of the best tips I ever learned when I was studying software development in university is to always make a plan before writing any code.

Whether it’s a flow chart, a step-by-step thought process, pseudocode, or a checklist, having a plan of attack before doing anything data science-related is crucial to the success and efficiency of the project.

I can’t remember how many times I was coding away (without a plan) and then someone would come and interrupt me while I was in the middle of constructing some complex logic. Once I got back to work, I realized that I had lost my train of thought and had no idea where I needed to go from there. Had I had a plan, I would have been able to pick back up where I had left off.

Plans aren’t just useful for maintaining your thought process – they’re also vital for figuring out issues when things go wrong. Having a plan helps you refer back to the steps you took to get somewhere which is useful when it comes to identifying bugs or areas where your logic didn’t quite pan out.

Having plans will allow you to code more quickly and efficiently, and will keep you from wasting precious time trying to jump back on your train of thought when it inevitably leaves the station without you.
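One lightweight way to plan first is to write the plan as numbered comments (pseudocode), then fill in real code under each step. The task below, deduplicating survey responses to keep each user’s latest answer, is a hypothetical example:

```python
def dedupe_latest(responses):
    """Keep only each user's most recent response.

    `responses` is a list of (user_id, timestamp, answer) tuples.
    """
    # Plan, written before any code:
    # 1. Sort responses by timestamp so later answers come last.
    ordered = sorted(responses, key=lambda r: r[1])
    # 2. Walk the list; a dict keyed by user keeps only the last one seen.
    latest = {}
    for user_id, timestamp, answer in ordered:
        latest[user_id] = (user_id, timestamp, answer)
    # 3. Return results in a stable order (by user id) for easy testing.
    return [latest[u] for u in sorted(latest)]
```

If you get interrupted halfway through, the remaining comments tell you exactly where to pick back up, and when a bug appears, the numbered steps give you a checklist of places to look.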

Here are some resources to help you write better flowcharts and pseudocode to improve your coding efficiency:

How Do You Flowchart Code?

Pseudocode 101: An Introduction to Writing Good Pseudocode

Bonus: 8. They optimize their daily workflow.

Data science is one of those professions where you’re expected to do a lot of different tasks perfectly on any given day. However, most of these tasks require different skillsets, which can make it difficult to jump from task to task. Furthermore, this often involves a complete mental shift when you’re working on different aspects of different projects you have on the go.

The solution to this is to optimize your daily workflow for increased efficiency and productivity. No, this doesn’t mean multitasking. This means using systems such as task-batching, time blocking, or day theming, where you work for a given period of time on the same type of tasks and only those types of tasks.

Simply put, batching your daily tasks keeps you from having to do the necessary mental gymnastics to switch your thought process to fit completely different tasks.

There are a couple of different ways that you can batch your tasks:

  1. Batch tasks by type (example: from 9 am to 12 pm, you complete data cleaning for all of your projects)
  2. Batch tasks by focus (example: from 9 am to 12 pm, you complete tasks related to project X)

The first approach lets you stay in a data-cleaning mindset by doing all of your data cleaning across every project at once. The second lets you focus entirely on project X, completing all of the day’s tasks related to it. Depending on how you work best, you may find that one or the other is more efficient for you.

Here is a great resource that gives further insight into time blocking, task batching, and day theming:

The Complete Guide to Time Blocking


Final thoughts.

In the uber-competitive data science field, data scientists are judged by their ability to positively impact a company. That positive impact comes with increased efficiency, which in turn leads to effectiveness.

By implementing the above traits of efficient data scientists, you’ll become a data scientist who can bring that vital level of impact that companies are looking for.


Related Articles