
5 Questions to Ask Yourself Before Deciding to Become a Data Scientist

Get to know yourself before going down the data science career path

The Data Science Experience

Photo by National Cancer Institute on Unsplash

Bzzz! My always-on-vibrate phone buzzed in my pocket as I walked along Orchard Road in Singapore on a beautiful Saturday morning.

It buzzed a few more times in quick succession, but I thought nothing of it since I try to ignore messages on weekends.

I have learned that disengaging from chats and social media helps you better appreciate your time with the people around you.

Once I arrived back home, I was surprised to see the messages were from someone who had not messaged me for a very long time. I don’t really know this person; I have talked with him before, but not that much. An acquaintance, if you will.

We talked back and forth a bit, and he finally revealed his intentions after a couple of messages.

"I want to become a data scientist too."

This caught me by surprise, since I knew he was into competitive programming, not data science.

He followed the statement up with a lot of questions about which programming language to learn, which website is best for learning machine learning, and a lot of other things.

These were exactly the same questions I had in mind back when I was still in university.

However, I figured he needed to know that being a data scientist is not limited to knowing certain programming languages or machine learning models.

I ended up asking him these 5 questions.

Are you willing to learn every day?

Continuous learning. It is the one ability that every machine learning practitioner, whether data scientist or machine learning engineer, must possess.

Every area of knowledge has its own half-life.

The half-life of knowledge is the time it takes for half of the previous knowledge to be considered obsolete.
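One way to make this concrete is to treat obsolescence like exponential decay. The sketch below assumes that model, and the half-life values in it are made up purely for illustration; neither the model nor the numbers come from a measured source.

```python
def fraction_still_relevant(years_elapsed: float, half_life_years: float) -> float:
    """Fraction of knowledge still considered current, assuming exponential decay."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years_elapsed / half_life_years)


# Hypothetical half-lives, chosen only to illustrate the idea.
for field, half_life in [("long half-life field", 50.0), ("short half-life field", 5.0)]:
    remaining = fraction_still_relevant(10, half_life)
    print(f"{field}: {remaining:.0%} still relevant after 10 years")
```

Under this assumption, after one half-life has passed, half of what you learned would be considered out of date, and after two half-lives only a quarter would remain.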

For areas such as mathematics and physics that have existed for thousands of years, the half-life is considerably longer than that of machine learning, which is not even 100 years old.

Even compared to computer science knowledge, which is older than machine learning but also quite recent, the field of machine learning has been progressing very rapidly.

With around 100 ML research papers posted on arXiv every day, it’s no wonder that machine learning is evolving rapidly and has a short half-life of knowledge.

Photo by Suzanne D. Williams on Unsplash

I still remember back in 2016 when I started learning about machine learning: using Word2Vec for NLP was relevant, GANs were so bleeding edge that I could not find any working code to generate categorical data, and I had not even heard of the Transformer model.

Now in 2020, we use language modelling or pre-trained BERT models for NLP problems, there are countless GAN variations that can do style transfer and generate realistic fake human faces, and the Transformer has become the base of the current state-of-the-art BERT models.

Data scientists must be able to adapt to these rapid changes throughout their careers if they want to stay relevant. Fresh talent from universities or even online courses, loaded with new knowledge, is ready to take your place once you stop learning new things.

To practice machine learning, one must become a learning machine.

Are you willing to learn about specific domains?

This is still in line with the first question, but it is what separates a data scientist from a machine learning engineer.

Machine learning engineers mainly deal with getting ML models into production environments.

On the other hand, data scientists work with the data and need to understand the domain of the data they are working on.

Since most companies, especially startups, have their own definitions of data scientist and machine learning engineer, your experience will differ based on the company you work for.

I have personally worked with various data and built various ML products throughout my journey as a data scientist and machine learning engineer.

The first one was inventory management where I had to deal with restocking various ingredients of coffee vending machines and decide the optimal route for every restock trip by numerous drivers every day.

I did some research on cargo and container shipping when we had a client that requested ML solutions to help reduce their costs.

In my next job, I learned how to build and improve a recommendation system for an e-commerce platform and worked with big data and Hadoop for the first time.

At the time of writing, I work on building chatbots for financial services in SEA that are able to process local dialects.

Photo by bruce mars on Unsplash

None of the things I worked on were closely related to one another. Every time I got assigned to a new task, I would spend time on my weekends trying to understand the data on that task.

Even if you are a specialist, there will still be some variation. For example, suppose you are a computer vision specialist whose work is always related to image processing. The domain of a face identification system is different from that of a driverless car’s imaging system.

Another straightforward example of domain knowledge that a data scientist can have is language knowledge. If you understand non-English languages, it might prove to be useful when you are assigned projects related to Natural Language Processing in that specific language.

Everyone has a different background; use yours to your advantage.

Are you creative?

This question might sound weird at first, but let me explain.

In software engineering, the measure of correctness is clear. You are given an input and the expected behaviour and/or output. If your code can produce the expected outcome within certain time and memory limits, then you have completed the task successfully.

Data science is a bit more complicated than that.

Correctness in data science cannot be expressed as simply 0 or 1. Most of the time, it will be a score between 0 and 1.

When you create a model, it is almost guaranteed that it will not achieve 100% correctness on any task. Depending on the task, your correctness will be defined by performance metrics such as F1 score, precision, recall, sensitivity, specificity, and accuracy.
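As a rough illustration of how these metrics relate to each other, here is a minimal sketch that computes them from the four confusion-matrix counts of a binary classifier. The function name and the counts are hypothetical, picked to show how accuracy alone can look great on an imbalanced problem while the other scores tell a different story.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts: 10 actual positives and 990 actual negatives.
metrics = classification_metrics(tp=5, fp=5, fn=5, tn=985)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy comes out at 0.99 while precision, recall, and F1 are all 0.5, which is why the right metric depends on the task.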

Data scientists need to work on improving these scores as much as possible.

This is where your creativity comes into play.

For example, when you are faced with an imbalanced dataset, what should you do? Should you oversample or undersample? Is it possible to gather external data to improve your model’s results? Could you use a GAN to create more data with a distribution similar to that of the minority class?
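To give a feel for the simplest of these options, here is a minimal sketch of random oversampling using only the standard library. The function name and the toy data are hypothetical, and it is just one naive approach among many, not a recommendation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        candidates = [row for row, label in zip(rows, labels) if label == cls]
        # Sample with replacement; k is 0 for the largest class.
        extra = rng.choices(candidates, k=target - count)
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data: four rows of class 0 and a single row of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [5.0]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

If you try something like this, apply it to the training split only, so duplicated rows do not leak into your evaluation data.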

Missing data. Should you drop the row? Or even the whole column? Can you try to impute the data? Is it worth the effort to do imputation, or will it skew the model in the end?
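The two most basic options here, dropping rows or imputing values, can be sketched as follows. This is a minimal standard-library example with hypothetical function names and toy data; it assumes missing values are stored as None and that every column has at least one observed value.

```python
from statistics import median


def drop_missing(rows):
    """Keep only the rows that have no missing (None) values."""
    return [row for row in rows if None not in row]


def impute_median(rows):
    """Replace each missing value with the median of its column's observed values."""
    n_cols = len(rows[0])
    medians = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        medians.append(median(observed))
    return [
        [medians[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Toy data with one missing value in each column.
rows = [[1.0, 10.0], [2.0, None], [None, 30.0], [4.0, 40.0]]
print(drop_missing(rows))   # half of the rows are lost
print(impute_median(rows))  # all rows kept, but two values are guesses
```

Dropping throws away half of this toy dataset, while imputing keeps every row at the cost of inserting values that were never observed. Which trade-off is acceptable depends on your data.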

Feature engineering. What features can you extract from the data to enhance your model’s performance? Are there any highly correlated features, and which one should you keep?
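As a starting point for the correlation question, here is a minimal sketch that flags pairs of features whose Pearson correlation exceeds a threshold. The feature names, the data, and the 0.9 threshold are all hypothetical, and the code assumes no column is constant (otherwise the division would fail).

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Toy features: the two height columns carry almost the same information.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [25, 52, 31, 47, 38],
}
print(highly_correlated_pairs(features))
```

The code only tells you which features overlap. Deciding which one to keep is still up to you and your understanding of the domain.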

The list goes on and on.

Before doing any of the things mentioned above, you surely need to understand them. However, once you are working at a company where nobody tells you step-by-step what to do with the data, your creativity comes into play.

Photo by James Pond on Unsplash

At the very top of data science, you will find the research scientists. These are the people with PhDs who have published papers in top-tier conferences such as NeurIPS, ICML, and CVPR. Chances are, you have used their pre-trained models, read one of their papers, or even taken one of their online courses.

This group of data scientists are the ones who created everything from older neural network architectures such as FFNN, RNN, LSTM, and CNN, to techniques such as dropout, bi-directional models, and highway networks, all the way to newer architectures such as Transformers, GANs, and BERT.

These are the people responsible for our suffering of having to keep up with the advances in machine learning.

A research scientist is essentially venturing into the unknown of the machine learning world.

Although only a small percentage of data scientists will operate at this level, it goes to show how important being creative is in the world of data science and machine learning.

Are you a good storyteller?

Creating stories out of data is, in my opinion, the most important skill a data scientist should have.

You don’t need to be a good storyteller when working as a machine learning engineer, but you had better be a damn good one when you are a data scientist.

People often describe a data scientist as someone who deals with data: doing data preparation, feature engineering, model experimentation, and evaluation.

That statement is true, but it’s not the whole story.

Photo by Dylan Gillis on Unsplash

At their core, data scientists are problem solvers.

As a data scientist, you are the bridge between the data and the decision makers of a company.

Your work is not finished when you are done evaluating the results or when you have gained insights from the data. When you find new insights, you have to help other people understand what they are and why they happen.

Once you have successfully explained your new-found insights to other people, they will have a better and clearer understanding of the current issue. Ultimately, it will also help the decision makers take appropriate actions based on the insights.

Are you open to criticism?

"Why can’t your model handle this case?"

"Yes, the new model is better overall, but it is underperforming compared to the previous version in this specific area."

"This data doesn’t really makes sense, are you sure you’re doing it correctly?"

I have had colleagues who would take these criticisms personally and react as if the person had just insulted their intelligence.

To be fair, you have to be ready to face criticism in any work you do, and this is more prominent when you are working with machine learning models.

What you need to understand is that everyone is only doing their job.

When someone criticises your work, most of the time it is because they have high expectations of you, not because they want to attack you personally.

You have to keep in mind that any criticism you get is useful for deciding which areas of your work to focus on.

Photo by Erwann Letue on Unsplash

Being quick-tempered in the face of criticism will not make you a lesser data scientist. However, you will probably have a harder time working in data science.

In my experience, having good storytelling skills will also help you face this criticism.

When you can explain how your model works and why it behaves that way, people will better understand what you have done. This is especially true if the people you are working with are on a business team that has limited knowledge of what you worked on.

Most of the time, once you have successfully explained the problem to them, they will ask for your opinion on how to fix it instead of continuing to criticise your work.

Final Thoughts

After a few more back-and-forth messages, I asked him why he wanted to switch from software engineering to data science.

It turned out some of his co-workers had been learning to become data scientists by taking online courses and even pursuing Master’s degrees.

He didn’t want to be left behind by the trend.

I asked him one last question before ending the conversation.

Do you still want to be a data scientist?

I waited for a moment before I finally got the reply.

A short one-word chat bubble appeared from him.

I barely knew him, but one thing I do know is that where there is a will, there is a way.

Finally, I wished him good luck and bid him farewell.


The Data Science Experience is a series of stories about my personal experience and views from working as a data scientist and machine learning engineer. Follow me to get regular updates on new stories.

Rionaldi Chandraseta – Medium

