![A software engineer putting on data science glasses. - Image generated with FLUX.1 [schnell] by the author.](https://towardsdatascience.com/wp-content/uploads/2025/01/1Bu8Vj5W6OQ-peVWsSOU8kQ.jpeg)
Mentoring teaches me a lot.
I recently had the opportunity to guide a friend who worked as a software engineer (SE) for two years and wants to transition to a Data Science (DS) role. What started as a casual chat eventually became several hours of outlining plans to become a data scientist.
Her first question was, "What new things should I learn?"
Of course, I could list a dozen things in a minute, but the transition requires much more than a list of skills and links to popular courses. FYI, I never answered this question for her, and I don’t answer it in this post either.
Some of her existing skills are invaluable to her new endeavor, and she could fast-track her transition by carefully choosing what to learn on top of them. But what’s more important is thinking like a data scientist.
She doesn’t have to unlearn anything. However, some of her SE skills have little use in data science.
In this post, I summarize some of our discussions. These include which area of data science suits her interests, which new skills she needs to acquire, and how to start small and grow faster.
How do data scientists think differently from software engineers?
This is not to say that one’s work is easier than the other’s. However, a DS has significantly different goals and responsibilities than an SE.
SEs care about designing, developing, and maintaining software. An SE’s work is mostly deterministic. In other words, they know the outcome they are building for, and there’s often a finite set of techniques to achieve it.
SEs will have to make many choices in their work and sometimes will have to do course corrections. But these uncertainties will often have predictable solutions.
On the other hand, a data scientist’s work is filled with more unknowns. There’s no guarantee that the data they have is of good quality, so a DS will have to work on it. No one can tell which ML model would work best for the problem at hand; it’s more or less trial and error. Which evaluation metric is more important than the others? And what’s an acceptable threshold for each? These are all questions a data scientist will have to figure out on the go.
For these reasons, a DS needs a more experimental mindset. A professional SE often works in a framework like Scrum, which expects work to be finished within a strict timeframe called a sprint. Experimental work rarely fits such a fixed timebox.
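To make the trial-and-error point concrete, here is a minimal sketch, assuming scikit-learn is installed. The dataset is synthetic, and the three candidate models and two metrics are arbitrary choices of mine for illustration. The only point is that the "best" model and the metric that matters are found by measuring, not known up front.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary classification data (purely illustrative).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Candidate models; nothing tells us in advance which one will win.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate on more than one metric with 5-fold cross-validation.
results = {}
for name, model in candidates.items():
    scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1"])
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "f1": scores["test_f1"].mean(),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, f1={metrics['f1']:.3f}")
```

On imbalanced data like this, accuracy and F1 can rank the models differently, which is exactly the kind of question you end up settling on the go.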
Data scientists are one kind of data science professional.
Although we use the names interchangeably, not all data science professionals are data scientists. We’ve got several other roles that go along with data scientists.
Most people employed as data scientists are either analysts, machine learning engineers, or data engineers.
The most common role in data science is analyst. Analysts play a key role in business decision-making. They extract key insights from the data they have and educate management. Analysts don’t have to be programmers, either. Most of the analysts I know work with only Excel.
Machine learning engineers’ core responsibility is to train models that solve business problems. They work on data preprocessing, model selection, and training. As Andrew Ng points out, data scientists in industrial setups (whose work closely resembles that of ML engineers) don’t try to invent new models. Instead, they try to find the best data and preprocessing techniques to solve problems.
On the other hand, data engineers create and maintain data infrastructure. This involves creating data pipelines and warehouses and managing databases.
Data engineers and ML engineers often work hand in hand, especially on tasks like model deployment.
Those wanting to be data scientists often need to pick one of these roles. Each has its own challenges and tools to solve them. For instance, while ML engineers concentrate on tools like Scikit-learn, TensorFlow, and Pandas, data engineers focus on tools like Airflow, SQL, and cloud infrastructure management.
It is now clear that the technologies used by these different data science professionals are fairly different, and mastering them takes a real investment of time. For this reason, we rarely see a role called full stack data scientist, though the full stack software developer role is very common.
How to pick the correct data science role for you
I already mentioned this. I don’t recommend that anyone try to master all the different data science roles, although my own career was kind of a kitchen sink.
The best thing is to choose a role that suits your personality and pick it as early as possible.
Here’s my guide (opinionated, of course).
If you prefer less programming and less technical work, would rather focus on aiding business decision-making, and still want a data-related role, you’d be happy as a data analyst.
Knowing some data-wrangling techniques in Python is helpful but not a must. I’ve had colleagues who kept challenging me that they could do in Excel what I was doing in Python. Guess what? Sometimes they won.
However, you should discuss with your HR how your performance is measured, because the insights you draw from data have no tangible value on their own. They need to be acted upon, and even then, there’d be a delay in seeing their success. Thus, traditional evaluation techniques won’t apply to you.
Are you a person who doesn’t care about numbers, insights, or models? Do you consider yourself more like a software engineer than a data scientist? But do you want to be a data scientist anyway? Consider the data engineer role.
You’d still do a lot of coding and work with databases.
Lastly, if you like to train models, evaluate them, perform data preprocessing, build pipelines, etc., an ML engineer role is better for you than the others. In an industrial setup, ML engineers don’t create new models. Instead, they use data and model selection to find the best solution for the requirements.
Skills you’ll rarely use as a data scientist (if you were a software engineer before)
As a former software engineer who is now a data scientist, I know a few skills that are of little use to me today. Here are a few.
Design patterns and principles
Software design patterns distinguish skilled SEs from amateurs. These best practices allow us to develop software that is easy to maintain and scale, with components that are easy to reuse. Since other developers widely recognize these patterns, they help a new person understand your code and collaborate with you quickly.
Design principles guide better coding and help maintain the codebase. I strictly follow SOLID design principles whenever I develop software.
Likewise, design patterns are like templates you can borrow to solve problems without wasting time.
I couldn’t say these are completely useless. Libraries like Pandas and scikit-learn (or literally any other) are built on these patterns and principles.
But as a person who mainly cares about getting insights from a dataset you just received, or developing pipelines that execute a set of instructions one after another, you’d rarely need them.
Web development frameworks (like Django)
I was a Django developer. I loved doing it.
You might ask, "What if a data scientist wants to package their work in a web app? Don’t they need Django?"
They certainly did in the past.
However, today, that need is met by tools like Streamlit or BI tools like Tableau and Power BI. You don’t have to program everything yourself.
I recently used Django to create a workaround for Streamlit’s authentication issues. I also documented the methods in two previous posts (Authentication and authorization). At that time, Streamlit didn’t have these features. But today, this advice is outdated.
Agile project management (such as Scrum)
During the transition, this was my biggest challenge. For software engineers who want to ship features every other week, a system like Scrum is super helpful.
But this wasn’t easy when working with data – especially when the client owns the data.
Most of the time, the work we do is experimental. There’s no guarantee that you’d find the perfect ML model within the timespan of a sprint. Every time you see new data, you see new challenges that make time estimations tricky.
Even an analyst can’t promise in advance that they will extract a particular number or insight within two weeks.
I remember what a manager said in my early days as a data analyst. "Torture the data until it starts screaming insight." And you don’t know when that’ll happen.
I then asked seasoned data scientists and Scrum experts how to implement these principles in data science projects. They didn’t know.
New skills you may have to learn as a data scientist
Finally, we’re now ready to discuss the purpose of this post – what skills a software engineer wanting to be a data scientist must learn.
This is what my unofficial mentee and I agreed on in the end.
I’m a believer in the 80/20 rule. This means there’s a small subset of skills (20%) that accounts for most of the impact (80%) on your career.
Your job is to self-assess your relevant skills and discover the 20% of new skills you need to learn to take you to your future self.
My friend, an SE and Django developer, already knew Python well. She was also good at SQL. Python and SQL alone would make her a fine data analyst.
However, she thinks machine learning is cooler than extracting insights from datasets. This raises more questions: Should we use traditional ML or deep learning models? Should we use model fitting, computer vision, or NLP?
Here’s what I suggested.
Regardless of specialty, almost every data scientist uses Pandas to some extent. You could get started on it in a couple of hours (just like any other library). But to be a data scientist, you should have some proficiency in Pandas. So that’s the first skill.
I didn’t suggest any courses or resources to learn about Pandas; she should be able to find some good ones by googling for a few minutes.
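For a sense of what "some proficiency" might look like, here is a small sketch of a typical clean, filter, and aggregate pass, assuming Pandas is installed. The `orders` table and its column names are made up for illustration and have nothing to do with my friend’s work.

```python
import pandas as pd

# Hypothetical order data; the columns are invented for this example.
orders = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south", "north"],
        "product": ["a", "a", "b", "b", "a", "a"],
        "amount": [120.0, 80.0, None, 210.0, 50.0, 30.0],
    }
)

# Clean: fill the missing amount with the column median.
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Filter: keep only orders of at least 50.
large = orders[orders["amount"] >= 50]

# Aggregate: total and average amount per region, largest total first.
summary = (
    large.groupby("region")["amount"]
    .agg(total="sum", average="mean")
    .reset_index()
    .sort_values("total", ascending=False)
)
print(summary)
```

If you already know SQL, most of this maps onto things you do every day: the filter is a `WHERE`, and the `groupby` with named aggregations is a `GROUP BY` with `SUM` and `AVG`.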
Next, scikit-learn. Again, a lot of data scientists use it. Even if you’re working on deep learning projects, you’d still be using some of the modules in scikit-learn. It’s worth learning.
Scikit-learn isn’t simply a library for training ML models. You can also use it for data preprocessing, model selection, and more.
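As a rough illustration of preprocessing and model selection living in the same library, here is a minimal sketch, assuming scikit-learn is installed. It uses the bundled iris dataset; the choice of logistic regression and the grid of `C` values are arbitrary picks for the example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and the model are chained, so scaling is fit on training folds only.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Model selection: try a few regularization strengths with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the best configuration on data the search never saw.
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one `Pipeline` is a design choice worth copying early: it keeps test data from leaking into the preprocessing step during cross-validation.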
Then, a little bit of NLP. I suggested the library TextBlob. It’s built on top of NLTK, the holy grail of NLP libraries in Python.
Finally, I asked her to master either Plotly or Streamlit. Streamlit is easier to learn, and many other data scientists use it. However, with her experience in SE, something like Plotly would still be within easy reach.
Python, SQL, Pandas, Scikit-learn, Streamlit, and TextBlob are good skills to consider as the top 20 percent that have an 80 percent impact on her career.
Anyone could argue that these aren’t sufficient for data scientists in today’s competitive market. I agree. But with this, she could figure out where to go next.
If it’s computer vision, she’d choose tools like OpenCV and PyTorch. If she goes deeper into NLP, she’d choose libraries like spaCy.
Final thoughts
A lot of us want a career change. Few get their dream job as their first job.
In this post, I’ve summarized what I discussed with a friend who is an SE but wants to be a data scientist.
We figured out she wanted to be an ML engineer more than an analyst or a data engineer. Since she already knew Python and SQL, picking up what she needed to learn was easy.
We discovered that anyone wanting to be an ML engineer can focus on Python, SQL, Pandas, Scikit-learn, Streamlit, and TextBlob, as these are the top skills that make up most of your work as a data scientist.
Thanks for reading, friend! Besides Medium, I’m on LinkedIn and X, too!