The different roles in the data ecosystem

I frequently get asked questions, and see confusion online, about the differences between data-related positions. I therefore decided to write a brief guide to the roles and the skills each position requires.

Positions

Data Engineer (analogous to big data software engineer)

Typical Education: B.A./B.S.

Common Tools: Spark, Flink, Hadoop, NoSQL

Languages: Java, Scala, Python

Where they are hired: Very large companies, mid-sized tech companies, and startups.

Required Skills: Distributed systems (important), data structures/algorithms (very important), databases (important), programming (very important)

Data engineers, or big data software engineers, generally set up, develop, and monitor the organization’s data infrastructure. They also integrate or productionize the models designed by data scientists. More specifically, data engineers set up pipelines that let data scientists experiment with data easily, and they build the production pipelines for services. For instance, data engineers might set up a data lake and a Spark cluster that data scientists pull data from and submit jobs to. Then, if the data science team creates a new model, the data engineering team optimizes it and deploys it to production in conjunction with the engineering team.
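To make the pipeline idea concrete, here is a minimal sketch of an extract/transform/load flow in plain Python. It is a toy stand-in and not how a Spark or Flink job is actually written: the event data, the column names, and the functions `extract`, `transform`, and `load` are all hypothetical and exist only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw event log; in practice this would live in a data lake.
RAW_EVENTS = """user_id,event,duration_sec
u1,page_view,12
u2,page_view,
u1,purchase,40
u3,page_view,7
u2,page_view,5
"""

def extract(raw_text):
    """Read raw CSV text into dict rows (stand-in for reading from storage)."""
    return csv.DictReader(io.StringIO(raw_text))

def transform(rows):
    """Drop rows with a missing duration and convert the rest to integers."""
    for row in rows:
        if row["duration_sec"] == "":
            continue
        yield {
            "user_id": row["user_id"],
            "event": row["event"],
            "duration_sec": int(row["duration_sec"]),
        }

def load(rows):
    """Aggregate total duration per user (stand-in for writing a clean table)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["user_id"]] += row["duration_sec"]
    return dict(totals)

user_totals = load(transform(extract(RAW_EVENTS)))
print(user_totals)
```

The point of the split is that each stage can be tested and replaced on its own, which is roughly what a production framework provides at much larger scale.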

Data Scientist

Typical Education: M.S. or PhD

Common Tools: Scikit-learn, Pandas, NumPy, XGBoost

Languages: SQL, R, Python

Where are they hired: large/mid-sized organizations and tech startups

Skills: Statistics (important), databases (somewhat important), programming (important), linear algebra (somewhat important), business knowledge (somewhat important), distributed systems (somewhat important), feature extraction, data visualization

The definition of a data scientist can vary wildly between organizations. At some places a data scientist is closer to a data engineer, and at others closer to a research scientist. In general, data scientists attempt to answer business questions and provide possible solutions. Data scientists often begin with a vague question like "how do we increase user retention," figure out what data they need and how to collect it, analyze it, and then propose a solution. Data scientists frequently use machine learning techniques in their solutions. For instance, to retain users, data scientists might build a model that predicts which users are most likely to leave the site, then use those predictions to target those users with a specific enticement to stay.

Unlike research scientists, they generally don’t specialize in any one area of predictive modeling and instead use whatever tool is best for the job, whether it’s trees, deep learning, or simple regression.
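The retention example above can be sketched with a simple regression. The following is a minimal illustration, assuming two made-up features (scaled inactivity and scaled session count) and synthetic data. The function names `train_logistic` and `churn_probability` are hypothetical, and in practice a library such as Scikit-learn would normally replace the hand-written training loop.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression with plain batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def churn_probability(weights, bias, x):
    """Predicted probability that a user with features x leaves."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Synthetic data: features are [inactivity, sessions], both scaled to 0..1.
# Assumption for this toy data only: inactive, low-usage users churn.
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    inactivity, sessions = rng.random(), rng.random()
    rows.append([inactivity, sessions])
    labels.append(1 if inactivity > sessions else 0)

weights, bias = train_logistic(rows, labels)

# Rank hypothetical users so the most at-risk ones get the enticement first.
users = {"user_a": [0.9, 0.1], "user_b": [0.1, 0.9]}
at_risk = sorted(
    users,
    key=lambda u: churn_probability(weights, bias, users[u]),
    reverse=True,
)
print(at_risk)
```

The ranking step at the end is the part that connects the model to the business question: the predictions only matter because they decide who receives the offer.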

Data Analyst

Common Tools: Excel, Access, Tableau

Languages: SQL, VBA

Skills Required: Basic SQL/database knowledge, basic programming, Microsoft products.

Where are they hired: organizations of all sizes in all industries

Data analysts are similar to data scientists in their job goals, but they often have a more limited scope and toolset. Data analysts generally produce basic reports and visualizations for specific problems and present that data. They generally do not do much predictive modeling or detailed statistics.
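A basic report of the kind described above often comes down to a single SQL aggregate. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are invented purely for illustration.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("north", 120.0),
        ("south", 80.0),
        ("north", 30.0),
        ("south", 20.0),
        ("west", 50.0),
    ],
)

# Order count and revenue per region, highest revenue first.
report = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()

for region, n_orders, revenue in report:
    print(f"{region:<6} {n_orders:>3} {revenue:>8.2f}")
```

The same query could be run from Access or fed into Tableau; the SQL is the portable part.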

Research Scientist

Typical education: PhD

Common Tools: Caffe, Torch, TensorFlow, NumPy

Languages: MATLAB, Python

Skills/Knowledge: linear algebra/calculus (very important), statistics (important), programming (somewhat important).

Where they are hired: large tech companies and data/ml startups

Research scientists usually specialize in a specific area like NLP or CV. As the name suggests, they are most concerned with research and publication. They mainly work on finding novel methods within their field and publishing the results. Although they may sometimes work on business problems, their primary priority is research in their field of expertise.

Research Engineer

Typical education: B.S./M.S.

Languages: C, C++, Python, CUDA

Skills/Knowledge: programming (very important)

Where they are hired: Very large tech companies, specialized data startups

A research engineer is to a research scientist as a data engineer is to a data scientist. Research engineers tend to support research scientists by implementing and testing the algorithms the scientists develop. They write code, usually in C or C++, to create optimized computational platforms and implementations of ML algorithms. They are usually only found at very large companies like Google and Facebook.
