
What Makes a Good Data Scientist

…or what good data scientists have in common.

Photo by Leon on Unsplash
Image credits: You X Ventures – Unsplash

This article is built around my experience of working with very good data scientists. I do not claim to be one of them yet. However, I keep working and studying to become one.

I’m not in a position to judge whether a data scientist is good or not. What follows are my observations of the common practices and skills of well-performing data scientists.

In this sense, the title of the article might be "What Good Data Scientists Have in Common".

Learning from others is a highly valuable skill. The first step for improving this skill is to become a good observer. You should focus on what others do and how they do it.


They go for the simple

My first and foremost observation is that experienced data scientists go for the simple. They even sacrifice a small amount of accuracy for simplicity.

In most cases, simpler is better. If you have a simple solution that accomplishes a particular task, go for it. There is no need for complex models or solutions unless they are absolutely necessary.

Complex solutions are more error-prone, and they are harder to debug than simpler ones.
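One way to make this habit concrete is to pick the simplest model whose accuracy is close enough to the best one. The following is only a minimal sketch: the `Candidate` class, the `complexity` score, the 0.01 tolerance, and the accuracy numbers are all made up for illustration and are not results from a real project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float   # validation accuracy between 0 and 1 (illustrative values)
    complexity: int   # hypothetical score: lower means simpler


def pick_simplest(candidates, tolerance=0.01):
    """Return the simplest candidate whose accuracy is within `tolerance` of the best."""
    best = max(c.accuracy for c in candidates)
    good_enough = [c for c in candidates if best - c.accuracy <= tolerance]
    return min(good_enough, key=lambda c: c.complexity)


# Made-up numbers, only to show how the rule behaves.
candidates = [
    Candidate("logistic regression", accuracy=0.912, complexity=1),
    Candidate("random forest", accuracy=0.918, complexity=3),
    Candidate("deep neural network", accuracy=0.921, complexity=5),
]

chosen = pick_simplest(candidates, tolerance=0.01)
print(chosen.name)
```

Here the simplest model is chosen even though it is not the most accurate one, because the gap is smaller than the tolerance. How much accuracy you are willing to give up depends on the task.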


Think first

When I have a problem with a model or script, my first instinct is to play around with the code, hoping that changing it will solve the issue.

This approach usually leads me to the solution. However, it is definitely not the optimal way of solving problems. In some cases, I waste a substantial amount of time this way.

What I see from experienced data scientists is that they think about the problem and possible solutions before taking any action. At first, this seemed like a waste of time to me, because you cannot solve a problem just by thinking about it.

However, it did not take me long to realize that thinking first proves to be much more efficient. Understanding the problem thoroughly and laying out the possible solutions is a better way to start.


Features vs algorithms

When I first learned about data science, I spent a great deal of time mastering machine learning algorithms. Both the theory behind an algorithm and its parameters seemed to be of crucial importance to me.

There is nothing wrong with learning the algorithms. In fact, data scientists need to know when and how to apply a particular one. However, unless you are a researcher, mastering each and every algorithm is overkill.

What I have learned from experienced data scientists is that a decent algorithm with proper settings will do the job. It is not necessary to spend an extraordinary amount of time tuning an algorithm or trying out different ones.

Features usually have more impact than the choice of algorithm. Thus, it is much more efficient to spend your time on feature engineering. A single informative feature can make a real difference in model accuracy.
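To illustrate how much one informative feature can matter, here is a small sketch on synthetic data. Everything in it is an assumption made for the example: the `width` and `height` columns, the rule that the label depends on their ratio, and the very simple one-threshold classifier (`best_stump_accuracy`) that stands in for "a decent algorithm".

```python
import random


def best_stump_accuracy(values, labels):
    """Accuracy of the best single-threshold rule on one feature (either direction)."""
    n = len(values)
    best = 0.0
    for t in values:
        # hits scores the rule "predict True if value > t";
        # n - hits scores the opposite rule.
        hits = sum((v > t) == y for v, y in zip(values, labels))
        best = max(best, hits / n, (n - hits) / n)
    return best


# Synthetic data: the label depends on the ratio of two raw columns.
rng = random.Random(0)
width = [rng.uniform(1.0, 10.0) for _ in range(500)]
height = [rng.uniform(1.0, 10.0) for _ in range(500)]
ratio = [w / h for w, h in zip(width, height)]   # the engineered feature
labels = [r > 1.0 for r in ratio]

raw_acc = max(best_stump_accuracy(width, labels), best_stump_accuracy(height, labels))
ratio_acc = best_stump_accuracy(ratio, labels)
print(f"best raw feature: {raw_acc:.2f}, engineered ratio: {ratio_acc:.2f}")
```

The same simple model separates the classes perfectly on the ratio, while neither raw column can do so on its own. Real data is rarely this clean, but the pattern is the same: a better feature can help more than a fancier algorithm.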


A clear explanation is worth a fortune

Like many other professions, data science is a team effort. On a data science project, you are likely to work not only with other data scientists but also with people from other professions.

The most significant requirement to become successful as a team is to establish clear communication among team members.

To achieve this, it is crucial to be able to explain things clearly and concisely. It is one thing to perform a task, but it is another to explain it. Unless you work on your own, what you do is of little use if you fail to explain it to the other team members.

The experienced data scientists I work with are great at explaining things simply. They are always to the point. They avoid making things more complex than necessary, which makes the teamwork even more robust and efficient.


Excel is still a key player

Excel is a ubiquitous tool. Although it might seem old school, Excel is still highly useful and practical in the data science ecosystem.

Before I started to work as a data scientist, I did not think I would be using Excel at all. I thought it was not capable enough to perform data science tasks.

My biased view of Excel changed when I saw how senior data scientists make use of it. The biggest advantage of Excel is its practicality. It is a great tool for quick analysis.

Its scalability is always a concern. However, when it comes to quick analysis of small datasets, Excel outperforms many popular tools.


R is charming

This might well be a coincidence, but the senior data scientists I work with have a passion for R. They prefer R, especially for exploratory data analysis tasks.

It does not mean that R is superior to Python. What you can do with R can also be done with Python. However, when it comes to the performance of data analysis and manipulation packages, I feel like R beats Python.

For instance, in my experience the R data.table package outperforms Python's pandas on operations with a 2 GB CSV file.

If you haven’t already done so, I suggest at least trying the data.table package for data analysis and manipulation. I have always been a big fan of pandas, but I now prefer data.table.
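Performance comparisons like this depend heavily on the data and the operation, so it is worth timing things on your own files. Below is a minimal sketch of a timing helper in Python. The `median_runtime` helper, the tiny in-memory CSV, and the group-by sum are all assumptions for illustration. To keep the example free of third-party packages, the timed operation uses the standard `csv` module; in practice you would pass in your own function, for example one that reads a file with pandas and aggregates it.

```python
import csv
import io
import statistics
import time
from collections import defaultdict


def median_runtime(func, repeats=5):
    """Run func() `repeats` times and return (last result, median seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        timings.append(time.perf_counter() - start)
    return result, statistics.median(timings)


# Tiny in-memory stand-in for a real CSV file (made-up data).
rows = ["group,value"] + [f"g{i % 3},{i}" for i in range(30_000)]
csv_text = "\n".join(rows)


def sum_by_group():
    """Parse the CSV text and sum the `value` column per `group`."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["group"]] += int(row["value"])
    return dict(totals)


totals, seconds = median_runtime(sum_by_group)
print(f"group sums computed in {seconds:.4f} s (median of 5 runs)")
```

Using the median of several runs makes the number less sensitive to one slow run. Timing the same operation in R on the same file gives you a comparison that reflects your own workload rather than someone else's.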


Conclusion

I would like to emphasize again that I’m not in a position to judge data scientists as good or not good. What I shared in this article is based on my experience of working with experienced data scientists.

I take them as examples and set my goals around what makes them good at their jobs.

Thank you for reading. Please let me know if you have any feedback.

