
My Unbelievable Move From Data Engineer to Data Scientist Without Any Prior Experience

By mastering one basic skill

Photo by PhotoMIX Company from Pexels

The year was 2013, one year after Harvard Business Review named data scientist "the sexiest job of the 21st century". I had been working as a consultant doing data engineering work for the data science team at a major retail company. A consultant’s life wasn’t what I had envisioned when I accepted the job, and I decided it was time to move on. At the same time, the data science team was hiring to backfill a position for someone who had left a year before. One day I was talking to a data scientist on the team about my job search, and the conversation went along these lines:

Data Scientist: It’s so hard to find a good candidate for this opening. I’ve interviewed so many people and none were a good fit.

Me: Maybe I should apply for the data scientist opening. (jokingly)

Data Scientist: If you’re interested I will convince our VP of analytics to hire you.

Me: Seriously? I have no data science experience. Why would you want to hire me?

Data Scientist: Because you know the data.

"Because I knew the data." I was able to leverage this one basic skill, which I had mastered as a data engineer, to become a data scientist without any prior experience. These are the key steps to follow to master it yourself.


1. Don’t just process the data. QA the data.

When you get a raw data file, is your first instinct to look at the file delimiter, figure out the field types, and load it into the database without actually looking at the data? Do you then email the data scientist and tell them the data was loaded without error and then move on to the next task? This is processing the data.

What does QA (quality assurance) entail? It is the process of checking for data irregularities such as duplicate primary keys, unexpected missing values, missing dates if the file contains historical data, and so forth. Learning how to run basic QA checks will develop your ability to spot data issues upstream, before they ever reach the data scientist. It also demonstrates a skill in the area where data scientists are often said to spend 80% of their time: data preparation.
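
The three checks named above can be sketched in a few lines of plain Python. This is only an illustration, not the tooling I used at the time: the function name `qa_report`, the field names, and the sample rows are all made up for the example, and it assumes the file has already been parsed into a list of dictionaries with real `date` values.

```python
from collections import Counter
from datetime import date, timedelta


def qa_report(rows, key_field, required_fields, date_field):
    """Run three basic QA checks on a list of dict rows (hypothetical helper)."""
    # Check 1: primary keys that appear more than once.
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    # Check 2: number of empty or missing values per required field.
    missing_values = {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in required_fields
    }

    # Check 3: calendar days between the first and last date with no rows.
    seen = {row[date_field] for row in rows if row.get(date_field)}
    missing_dates = []
    if seen:
        day, last = min(seen), max(seen)
        while day <= last:
            if day not in seen:
                missing_dates.append(day)
            day += timedelta(days=1)

    return {
        "duplicate_keys": duplicate_keys,
        "missing_values": missing_values,
        "missing_dates": missing_dates,
    }


# Made-up sample data: one duplicated key, one blank customer, one skipped day.
rows = [
    {"order_id": 1, "customer_id": "C1", "order_date": date(2013, 1, 1)},
    {"order_id": 2, "customer_id": "", "order_date": date(2013, 1, 2)},
    {"order_id": 2, "customer_id": "C3", "order_date": date(2013, 1, 4)},
]
report = qa_report(rows, "order_id", ["customer_id", "order_date"], "order_date")
print(report)
```

A report like this can go in the same email that says the load finished, so the data scientist sees the obvious issues before opening the table.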

If you don’t know what QA checks to run, ask the data science team what they typically check for when they look at new data. Also note the issues the data scientists mention after working with the data, and check for them the next time you load similar data. You may not find every issue, but the data scientists will appreciate the time you spent checking for obvious ones before handing the data over, because it frees up their time to build models.

2. Learn what data is available and how it relates to each other.

As a data engineer in a retail company, I processed a large variety of data such as customer information, website activity, and purchases. One key piece of information missing from the website activity was a link to the customer if they had browsed products prior to purchasing without logging into the website.

By leveraging my knowledge of the data and how the different sources related to each other, I was able to link the website activity to the customer ID using identification fields such as email address and cookie ID. This data proved invaluable to the data science team when they built purchase propensity models, because prior web browsing was a leading indicator of purchase.
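
One way this kind of linking can work is sketched below. It is a simplified illustration under my own assumptions, not the actual pipeline: the function `link_web_activity`, the record layout, and the sample data are hypothetical. The idea is that any event carrying an email reveals which customer owns that cookie, and anonymous events with the same cookie can then be backfilled.

```python
def link_web_activity(events, customers):
    """Attach a customer_id to each web event via email, then cookie (sketch)."""
    email_to_customer = {c["email"].lower(): c["customer_id"] for c in customers}

    # Pass 1: events with a known email tell us which cookie belongs to whom.
    cookie_to_customer = {}
    for event in events:
        email = (event.get("email") or "").lower()
        customer_id = email_to_customer.get(email)
        if customer_id is not None:
            cookie_to_customer.setdefault(event["cookie_id"], customer_id)

    # Pass 2: resolve every event, falling back to the cookie when no email.
    linked = []
    for event in events:
        email = (event.get("email") or "").lower()
        customer_id = email_to_customer.get(email)
        if customer_id is None:
            customer_id = cookie_to_customer.get(event["cookie_id"])
        linked.append({**event, "customer_id": customer_id})
    return linked


# Made-up sample: the first event is anonymous browsing before a purchase.
customers = [{"customer_id": "C1", "email": "ann@example.com"}]
events = [
    {"cookie_id": "ck-1", "email": None, "page": "/product/42"},
    {"cookie_id": "ck-1", "email": "Ann@example.com", "page": "/checkout"},
    {"cookie_id": "ck-2", "email": None, "page": "/product/7"},
]
linked = link_web_activity(events, customers)
for event in linked:
    print(event["page"], event["customer_id"])
```

Events that cannot be tied to anyone keep a `customer_id` of `None` rather than being dropped, so the unmatched share stays visible as a QA number of its own.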

3. Understand the needs of your stakeholders.

When the data science team requested a data mart for model development, I asked them how they needed the data structured. They told me every feature for a customer had to be in one row, which meant building a very wide table with hundreds of columns. This was an unmanageable table structure for a data engineer to support.

Instead of creating a wide table, I designed a table structure that required a limited number of columns and developed functions the data scientists could call to aggregate and transpose the data for modeling. By working with the data and understanding how best to aggregate it for model development, I was able to build a table structure that was scalable for data engineers to support but still allowed new features to be added quickly. The development of the data science data mart would later become the primary argument for hiring me as a data scientist without any prior experience.
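
To make the idea concrete, here is a minimal sketch of a narrow table and a function that aggregates and transposes it. The three-column layout (customer, feature name, value), the function name `to_model_rows`, and the choice to fill absent features with 0 are my assumptions for the example, not a description of the actual data mart.

```python
from collections import defaultdict


def to_model_rows(feature_rows, aggregate=sum):
    """Pivot (customer_id, feature, value) rows into one wide dict per customer."""
    # Group raw values by customer and feature name.
    grouped = defaultdict(lambda: defaultdict(list))
    for customer_id, feature, value in feature_rows:
        grouped[customer_id][feature].append(value)

    # Every customer gets the same columns, in a stable order.
    feature_names = sorted({feature for _, feature, _ in feature_rows})

    wide = {}
    for customer_id, features in grouped.items():
        wide[customer_id] = {
            # Assumption: a feature with no rows for this customer becomes 0.
            name: aggregate(features[name]) if name in features else 0
            for name in feature_names
        }
    return wide


# Made-up narrow table: adding a new feature means adding rows, not columns.
feature_rows = [
    ("C1", "purchases", 2),
    ("C1", "purchases", 1),
    ("C1", "web_visits", 5),
    ("C2", "web_visits", 3),
]
wide = to_model_rows(feature_rows)
print(wide)
```

The stored table keeps three columns no matter how many features exist, while the modeler still gets one row per customer at the moment they need it.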


"Knowing the data" is a fundamental skill that is crucial to success in any data-related job, whether you are a data engineer, data scientist, or data analyst. As a data engineer I had already demonstrated 80% of the data preparation skills needed for a data scientist. While my experience is an outlier, I hope my story shows you that mastering a basic skill can be the key to unlocking a career in data science.


You might also like…

My Experience as a Data Scientist vs. a Data Analyst

6 Best Practices I Learned as a Data Engineer

How Data Scientists Can Troubleshoot ETL Issues Like a Data Engineer

