Machine learning sounds appealing, even glamorous. In fact, it plays a key role in what makes data scientist the sexiest job of the 21st century. However, unless you are aiming to create your own algorithms, you should not focus too much on machine learning, especially at the beginning of your career.
I’m definitely not arguing that machine learning is trivial or unnecessary. It is of crucial importance in many domains and tasks. However, you can often achieve satisfactory results by applying ready-to-use models in a few lines of code.
If you follow Kaggle competitions, you will see how a tiny improvement in log loss, or whatever metric is used, can result in a big jump on the leaderboard. Well, that’s not the case in real life.
Once you have the data in a clean and appropriate format, the variation in the cost function across different models or parameter settings is usually within an acceptable range.
In other words, the impact of the model and its parameters on the cost function is small compared to that of the data and features. Thus, data cleaning (or wrangling) and feature engineering are much more important than model selection and parameter tuning.
What I would suggest to aspiring data scientists is not to focus too much on machine learning algorithms and hyperparameter tuning unless they want to become machine learning researchers.
In the rest of this article, I will suggest what aspiring data scientists should focus on and spend their time on instead.

The first and foremost skill to acquire is SQL. Although NoSQL databases are getting more popular, a substantial number of companies still use SQL, and I think that will remain true for a long time.
As a data scientist, you should be able to retrieve the data you need from a relational database yourself. You would not want to depend on data engineers or SQL professionals to acquire the data. Furthermore, the company you work for may well not have dedicated people to provide you with the data.
SQL is not only used for retrieving data but also serves as an efficient data analysis tool. Its versatile and flexible functions allow for writing advanced queries that retrieve only the desired data from a database. Furthermore, we can perform data transformation and analysis while retrieving the data. Thus, advanced SQL skills will be very helpful.
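As a rough illustration of doing transformation and analysis at retrieval time, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, the data, and the `summarize_orders` helper are all hypothetical. A single query filters rows with `WHERE`, aggregates per customer with `GROUP BY`, derives a label with `CASE`, and filters on the aggregate with `HAVING`. Most relational databases accept very similar SQL, although dialects differ in the details.

```python
import sqlite3


def summarize_orders(conn: sqlite3.Connection, min_total: float) -> list:
    """Filter, aggregate, and label rows inside the query itself (hypothetical helper)."""
    query = """
        SELECT
            customer,
            COUNT(*)              AS n_orders,
            ROUND(SUM(amount), 2) AS total_spent,
            -- derive a new column while retrieving the data
            CASE WHEN SUM(amount) >= 100 THEN 'high' ELSE 'low' END AS segment
        FROM orders
        WHERE status = 'completed'   -- retrieve only the rows we need
        GROUP BY customer
        HAVING SUM(amount) >= ?      -- filter on the aggregated value
        ORDER BY total_spent DESC
    """
    return conn.execute(query, (min_total,)).fetchall()


# Build a small, made-up table in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 60.0, "completed"),
        ("alice", 55.5, "completed"),
        ("bob", 20.0, "completed"),
        ("bob", 500.0, "cancelled"),
        ("carol", 35.0, "completed"),
    ],
)

rows = summarize_orders(conn, min_total=30)
for row in rows:
    print(row)
```

Running it prints `('alice', 2, 115.5, 'high')` and `('carol', 1, 35.0, 'low')`. Bob's cancelled order is excluded by the `WHERE` clause, and his remaining total falls below the threshold, so only the already summarized rows ever leave the database.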
Another highly important skill is data cleaning, manipulation, and transformation. A more general term for such operations is data wrangling. You need to be able to work with raw data easily and smoothly.
Real-life data is usually messy and not in the most appropriate format for analysis and modeling. There are many software libraries for data wrangling, such as Pandas for Python and Tidyverse for R.
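To make "messy" concrete, here is a minimal sketch in plain Python, with no third-party packages, of a few typical cleaning steps: trimming stray whitespace, normalizing capitalization, turning unparseable numbers into missing values, and dropping duplicates. The records and the `parse_age` and `clean` helpers are hypothetical. In Pandas, these steps roughly correspond to `Series.str.strip()`, `Series.str.title()`, `pd.to_numeric(..., errors="coerce")`, and `DataFrame.drop_duplicates()`.

```python
# Made-up raw records with typical problems: padding, inconsistent case,
# a non-numeric age, and a duplicate row.
raw_records = [
    {"name": "  Alice ", "city": "new york", "age": "34"},
    {"name": "BOB", "city": "New York ", "age": "n/a"},
    {"name": "carol", "city": " boston", "age": "29"},
    {"name": "carol", "city": " boston", "age": "29"},  # exact duplicate
]


def parse_age(value: str):
    """Return the age as an int, or None when the value is not a valid integer."""
    try:
        return int(value)
    except ValueError:
        return None


def clean(records):
    """Normalize text fields, parse ages, and drop duplicates (hypothetical helper)."""
    cleaned, seen = [], set()
    for rec in records:
        row = {
            "name": rec["name"].strip().title(),
            "city": rec["city"].strip().title(),
            "age": parse_age(rec["age"].strip()),
        }
        key = (row["name"], row["city"], row["age"])
        if key not in seen:  # keep only the first occurrence after normalization
            seen.add(key)
            cleaned.append(row)
    return cleaned


cleaned_records = clean(raw_records)
for record in cleaned_records:
    print(record)
```

The result has three records: Alice and Bob in "New York" (Bob's age becomes `None`) and a single Carol in "Boston".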
You should master at least one tool for data wrangling. Ideally, you should have both Pandas and Tidyverse in your skill set, since most companies use Python or R.
These libraries provide numerous functions for data manipulation and transformation, which makes them very helpful for deriving new features from existing ones.
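Deriving new features often comes down to combining or decomposing existing columns. The sketch below, again in plain Python with made-up transaction data and a hypothetical `add_features` helper, computes an order total from quantity and unit price and extracts day-of-week information from a date string. With Pandas, one would typically multiply the two columns directly and use `pd.to_datetime` together with the `.dt.dayofweek` accessor.

```python
from datetime import date

# Made-up transactions; field names are illustrative only.
transactions = [
    {"order_date": "2023-03-04", "quantity": 3, "unit_price": 2.50},
    {"order_date": "2023-03-06", "quantity": 1, "unit_price": 10.00},
]


def add_features(rows):
    """Return copies of the rows with derived features added (hypothetical helper)."""
    result = []
    for row in rows:
        d = date.fromisoformat(row["order_date"])
        result.append({
            **row,
            # combine two existing columns into a new one
            "total": row["quantity"] * row["unit_price"],
            # decompose the date: Monday is 0, Sunday is 6
            "day_of_week": d.weekday(),
            "is_weekend": d.weekday() >= 5,
        })
    return result


enriched = add_features(transactions)
for row in enriched:
    print(row)
```

March 4, 2023 was a Saturday, so the first transaction gets `day_of_week` 5 and `is_weekend` True, while the second (a Monday) gets 0 and False.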
As a data scientist, you are likely to write production-level code or collaborate with software engineers. Thus, Git is a must-have skill for data scientists. You should at least be comfortable working with the basic Git commands.
More and more companies are adopting cloud-based strategies for data storage and processing. Thus, you should be familiar with cloud computing as well. You do not need the skills of a cloud architect, but you should at least be able to access and retrieve data from the cloud.
Conclusion
I would like to emphasize once more that I do not mean machine learning is trivial. In fact, it is of great importance in creating value from data.
What I want to point out is that ready-to-use machine learning algorithms and solutions do a fine job in most cases. A basic understanding of machine learning algorithms, along with their pros and cons, is usually enough unless you want to be a machine learning researcher.
Most of your time will be spent cleaning, transforming, manipulating, and understanding the data. Thus, the tools that speed up and ease these processes are more important than mastering machine learning algorithms.
Thank you for reading. Please let me know if you have any feedback.