
Matrix of Roles for Data Professionals

Adding some structure to the sea of data roles available

Photo by Markus Spiske on Unsplash

During my recent job hunt I realised that there are lots of blogs out there highlighting the differences between Data skillsets (Analyst, Data Scientist, Machine Learning Researcher etc). However, I didn’t come across many that explore how these skillsets tie in with different business functions of a company (Marketing, Product, R&D etc).

Many of the roles I came across had the same titles but the requirements and responsibilities of the roles were completely different. Making this connection between data skillset and business function not only helped me classify the different roles I discovered but also helped me understand how I wanted my career to progress. I created a framework to help with this and I wanted to share it with other budding Data Professionals that want to understand the differences between the skillsets required and the business problems that each role would focus on.

The other thing that I experienced was that there is sometimes a gap in understanding of the role between recruiters/hiring managers and applicants. This gap often isn’t highlighted until the 2nd or 3rd stage of the interview process. So I guess this post could also help hiring managers and business owners better understand what kind of Data Professional they are seeking, and in turn be able to convey that in their Role Spec, resulting in higher quality applicants and less time wasted for everyone.

To help structure the post, we’ll be using this Role Matrix:

Distribution of Data roles within an organisation (Image by Author)

This isn’t the only way to segment the Data Roles available in an organisation; it’s just the way that made the most sense to me.

We’ll be iterating through these different business functions and discussing the data roles available in each:

  • Growth
  • BI/Operations
  • Product
  • R&D

Growth

The primary focus within this business function is to grow the customer/user/client base of your company. The longer term projects that data professionals might work on within this function might include User Acquisition and Conversion Rate Optimisation among other marketing projects.

Growth Analyst

Tools – Excel, SQL, Python, Looker/Tableau, Salesforce, Analytics Tools, Digital Marketing Tools

Skills – Basic Scripting, Data Analysis, Data Visualisation, Data Story Telling, Presentation to Stakeholders/Non-Technical Audience, Analysis of A/B Tests (Desired)

Domain Knowledge – CRM Pipelines, Best Practices and Marketing Strategy for different marketing channels, Growth Specific KPIs and Metrics, Customer Segmentation, Campaign Optimisation

The day to day responsibilities of a Growth Analyst might include extracting large data sets using SQL and Python scripts and applying analysis on them to understand how effective the company’s marketing is on different channels. You will likely also be responsible for presenting on these data sets and KPIs to stakeholders and recommending changes to the marketing strategy off the back of your analysis. Apart from presentations, the responsibility of reporting automation will also tend to sit with this role.
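To make the "SQL and Python scripts" part a little more concrete, here is a minimal sketch of the kind of channel report a Growth Analyst might pull together. The table name campaign_stats, its columns and every figure are made up for illustration, and an in-memory SQLite database stands in for whatever warehouse the company actually uses.

```python
import sqlite3

# Hypothetical campaign data: (channel, visitors, signups). All figures are made up.
rows = [
    ("email", 1000, 50),
    ("paid_search", 2000, 60),
    ("social", 1500, 30),
    ("email", 500, 40),
]

# An in-memory SQLite database stands in for the real data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_stats (channel TEXT, visitors INTEGER, signups INTEGER)"
)
conn.executemany("INSERT INTO campaign_stats VALUES (?, ?, ?)", rows)

# Aggregate per channel and rank channels by conversion rate.
query = """
    SELECT channel,
           SUM(visitors) AS visitors,
           SUM(signups) AS signups,
           ROUND(1.0 * SUM(signups) / SUM(visitors), 4) AS conversion_rate
    FROM campaign_stats
    GROUP BY channel
    ORDER BY conversion_rate DESC
"""
channel_report = conn.execute(query).fetchall()
conn.close()

for channel, visitors, signups, rate in channel_report:
    print(f"{channel:<12} visitors={visitors:>5} signups={signups:>3} rate={rate:.2%}")
```

In practice the output of a query like this would feed a dashboard or a scheduled report rather than a print statement.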

Growth Data Scientist

Tools – SQL, Python, pandas, sklearn, keras, tensorflow, Analytics Tools

Skills – Statistical Modelling, AB Testing Design and Analysis, Causal Inference, Machine Learning, Research

Domain Knowledge – Marketing Attribution, Conversion Rate Optimisation, Campaign Optimisation, Marketing Automation

A Data Scientist working within the Growth function would typically be applying Statistical and Machine Learning techniques to optimise the growth of the user base. They could be applying statistical techniques to better understand the attribution of conversions to different marketing campaigns. Similarly, they could be using ML techniques to segment their user base into different groups. In some places the responsibility of deploying these models after researching and developing them also sits with this role.
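As a rough illustration of the attribution problem, the sketch below compares two simple rule-based baselines: last-touch (all credit to the final channel) and linear (credit split equally across every channel in the journey). These are heuristics rather than the statistical attribution models mentioned above, and the journeys are invented purely for the example.

```python
from collections import defaultdict

# Hypothetical converting journeys: channels a user touched, in order, before converting.
journeys = [
    ["social", "email"],
    ["paid_search"],
    ["social", "paid_search", "email"],
    ["email"],
]


def last_touch(journeys):
    """Give the full conversion credit to the last channel in each journey."""
    credit = defaultdict(float)
    for path in journeys:
        credit[path[-1]] += 1.0
    return dict(credit)


def linear(journeys):
    """Split each conversion's credit equally across all channels in the journey."""
    credit = defaultdict(float)
    for path in journeys:
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)


last_touch_credit = last_touch(journeys)
linear_credit = linear(journeys)
print("last touch:", last_touch_credit)
print("linear:    ", linear_credit)
```

Note how "social" gets no credit at all under last-touch but a meaningful share under the linear rule. That kind of disagreement between rules is exactly why more careful statistical approaches are worth a Data Scientist's time.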

Growth Data Engineer

Tools – SQL, Python, JavaScript, Cloud Platforms (AWS, Azure, GCP), Marketing Tool SDKs/APIs, Analytics Tools

Skills – Data Modelling & Architecture, ETL Pipelines, Model Deployment, DevOps, API Deployment, Web Scraping

Domain Knowledge – Reporting APIs for Marketing Tools, Growth Specific KPIs and Metrics, User Acquisition, Marketing Automation

A Growth Data Engineer will typically be responsible for acquiring, cleaning and hosting the data that Growth Analysts and Data Scientists use to solve business problems. Aside from using reporting APIs to pull useful marketing data, Growth Data Engineers would also be building and maintaining ETL pipelines that transform raw data into a more useful format. As such they’d need some back end engineering skills such as DevOps to be able to do this.

A Full Stack Growth Data Scientist is, as the name suggests, someone who can perform the responsibilities of all three of the Growth Data Roles mentioned above. It’s easy to get carried away and hire for this role but it’s important to bear in mind that while this person would be able to perform all three roles, they might not function as efficiently by themselves as if you were to hire, for example, a Growth Data Scientist and a Growth Data Engineer.

It’s also worth noting that Full Stack Data Scientists are hard to come by since most Data Professionals tend to specialise as they progress through their career. If you’re a budding Data Professional, you could choose this route rather than specialising. Bear in mind, however, that although the breadth of skills you’d develop would be large, it would be hard to delve deeper into those skills.


BI/Operations

This business function usually focuses on the business outcomes of the company and on the way it is run. Data Professionals working in this field would typically be leveraging data and transforming it into actionable insights. They’d be working on projects like Supply Chain Optimisation and Financial Risk among other things.

BI Analyst

Tools – Excel, SQL, Python, Looker/Tableau, Salesforce

Skills – Basic Scripting, Data Analysis, Data Visualisation, Reporting, Presentation to Stakeholders/Non-Technical Audience, Exploratory Data Analysis, ETL Pipelines (Desirable)

Domain Knowledge – Pricing Models, Supply Chain Logistics, Staffing Logistics, Customer Operations, Financial Operations, Business Specific KPIs

The day to day responsibilities of a BI Analyst might include extracting large data sets and building reports to demonstrate the performance of business KPIs. Some experience with ETL pipelines will also help them ensure the quality of data used in the reports is up to scratch. They’d also be responsible for presenting actionable insights to stakeholders produced by exploratory analysis. For example, when analysing supply chain logistics, BI Analysts should not only be able to highlight the bottlenecks in the supply chain, but also provide actionable recommendations on how to unblock them.

Operations Data Scientist

Tools – SQL, Python, pandas, sklearn, keras, tensorflow

Skills – Statistical Modelling, Machine Learning, Network Analysis, Forecasting, Optimisation (Linear & Integer Programming)

Domain Knowledge – Operational Research, Pricing Models, Supply Chain Logistics, Staffing Logistics, Customer Operations, Financial Operations

A Data Scientist working within the BI or Operations function would be applying Statistical and Machine Learning techniques to optimise the company’s overall function. The best Data Scientists in this function create true business value for their companies in the form of optimising financial and operational costs. They could be applying statistical techniques to forecast the financial performance of the company under different strategies. Similarly, they could be using optimisation methods like linear and integer programming to ensure the optimal number of staff are assigned to each project or business function.
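To give a feel for the staffing example, here is a toy integer programme: choose how many senior and junior staff to assign so that daily demand is covered at the lowest cost. All costs, capacities and limits are hypothetical, and the sketch simply enumerates every combination instead of calling a real solver, which only works because the problem is tiny.

```python
from itertools import product

# Hypothetical inputs: daily cost and daily ticket capacity per person.
SENIOR_COST, JUNIOR_COST = 500, 300
SENIOR_CAPACITY, JUNIOR_CAPACITY = 8, 5
DAILY_DEMAND = 42   # tickets that must be handled each day
MAX_HEADCOUNT = 6   # at most this many people of each type are available

best_plan = None  # (cost, seniors, juniors)
for seniors, juniors in product(range(MAX_HEADCOUNT + 1), repeat=2):
    capacity = seniors * SENIOR_CAPACITY + juniors * JUNIOR_CAPACITY
    if capacity < DAILY_DEMAND:
        continue  # infeasible: demand is not covered
    cost = seniors * SENIOR_COST + juniors * JUNIOR_COST
    if best_plan is None or cost < best_plan[0]:
        best_plan = (cost, seniors, juniors)

cost, seniors, juniors = best_plan
print(f"cheapest feasible plan: {seniors} senior, {juniors} junior, cost {cost}")
```

With realistic numbers of projects and constraints, exhaustive search stops being practical and a dedicated linear/integer programming solver would take over, but the structure (decision variables, constraints, an objective) stays the same.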

BI/Operations Data Engineer

Tools – SQL, Python, Cloud Platforms (AWS, Azure, GCP)

Skills – Data Modelling & Architecture, ETL Pipelines, Model Deployment, DevOps, API Deployment, Data Validation

Domain Knowledge – Operational Research, Pricing Models, Supply Chain Logistics, Staffing Logistics, Customer Operations, Financial Operations, Business Specific KPIs

A BI/Operations Data Engineer will typically be responsible for the upkeep of the infrastructure used to collect, clean and store data that BI Analysts and Data Scientists use. They’d be responsible for the ETL pipelines that allow Analysts to report on key business metrics. They’d also be responsible for ensuring the quality of financial data used by Data Scientists for forecasting purposes.

A Full Stack BI Data Scientist is someone who can perform the responsibilities of all three of the BI/Operations Data Roles mentioned above. Once again it’s worth bearing in mind that it’s difficult to gain depth of experience in all three roles while taking on the wide range of responsibilities that come with being a Full Stack Data Scientist.


Product

This business function usually focuses on the core products that the company sells. Data Professionals working in this field would typically be leveraging user engagement data and transforming it into actionable insights on how the products are used. They’d be working a lot on Experimentation and Churn Prediction among other projects.

Product Analyst

Tools – Excel, SQL, Python, Looker/Tableau, Product Design/Management Tools, Product Analytics Tools

Skills – Basic Scripting, Data Analysis, Data Visualisation, Data Story Telling, Presentation to Stakeholders/Non-Technical Audience, Exploratory Data Analysis, Product AB Testing, ETL Pipelines (Desirable)

Domain Knowledge – User Flow Analysis, Cohort Analysis, Product Engagement KPIs, User Segmentation

The day to day responsibilities of a Product Analyst might include extracting large data sets and building reports to demonstrate how users are engaging with the product. Some experience with ETL pipelines will also help them ensure the quality of data used in the reports is up to scratch. They’d also be responsible for presenting actionable insights to stakeholders produced by exploratory analysis. For example, when analysing the onboarding flow of users, Product Analysts would not only highlight the stages with the highest drop offs, they’d also tie their findings to qualitative data to understand why these drop offs are happening. They’d then be able to propose potential solutions which can be AB tested.
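The onboarding example boils down to a funnel calculation. Here is a minimal sketch, assuming we already have the number of users reaching each step; the step names and counts are made up.

```python
# Hypothetical onboarding funnel: (step name, users who reached that step).
funnel = [
    ("landing_page", 1000),
    ("signup_form", 600),
    ("email_verified", 450),
    ("profile_completed", 180),
]


def drop_off_rates(funnel):
    """Return the share of users lost between each pair of consecutive steps."""
    rates = []
    for (prev_step, prev_users), (step, users) in zip(funnel, funnel[1:]):
        rates.append((f"{prev_step} -> {step}", 1 - users / prev_users))
    return rates


step_drop_offs = drop_off_rates(funnel)
worst_step, worst_rate = max(step_drop_offs, key=lambda item: item[1])

for transition, rate in step_drop_offs:
    print(f"{transition:<40} drop off {rate:.0%}")
print(f"biggest drop off: {worst_step} ({worst_rate:.0%})")
```

The numbers only say where users leave. Working out why still needs the qualitative side of the analysis.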

Product Data Scientist

Tools – SQL, Python, pandas, sklearn, keras, tensorflow, Product Analytics Tools

Skills – Statistical Modelling, Machine Learning, AB Testing, Causal Inference, Model Deployment

Domain Knowledge – AB Testing, Churn Prediction, User Segmentation, Conversion Prediction, Cohort Analysis, Lifetime Value Modelling, Product Engagement KPIs

A Data Scientist working within the Product function would be applying Statistical and Machine Learning techniques to optimise the user’s experience of a product. They could be applying statistical techniques to model the distributions of different KPIs before and after running AB tests, which can be used to shape the product strategy of a company. Similarly they could be building ML models to predict when a user will churn from their service or to predict the lifetime value of a user to the company.
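One common way to analyse a simple AB test on a conversion metric is a two-proportion z-test. The sketch below is just one frequentist option among many (Bayesian approaches are equally popular), and the conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 2,000 users per variant.
z_stat, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
```

A test like this assumes users were randomly assigned, the sample size was fixed in advance and the samples are large enough for the normal approximation to hold. Checking those assumptions is a big part of the experiment design work in this role.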

Product Data Engineer

Tools – SQL, Python, Cloud Platforms (AWS, Azure, GCP), Product Analytics Tools

Skills – Data Modelling & Architecture, ETL Pipelines, Model Deployment, DevOps, API Deployment, Data Validation

Domain Knowledge – Event Tracking Design & Implementation, AB Testing Infrastructure, Product Engagement KPIs

A Product Data Engineer will typically be responsible for the upkeep of the infrastructure used to collect, clean and store event tracking data that Product Analysts and Data Scientists use. They’d be responsible for the ETL pipelines that convert raw event data into a more convenient format for Analysts to report on product KPIs. They’d also be responsible for designing and building AB testing infrastructure that allows the company to test many variants of the product at the same time.
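One small building block of AB testing infrastructure is deciding which variant a user sees. A common pattern, sketched here with made-up experiment and variant names, is to hash the user ID together with the experiment name, so that assignment is stable across sessions and independent between experiments.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Hash the key and use the result to pick a bucket; same input, same bucket.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Hypothetical usage: the same user always lands in the same variant.
first = assign_variant("user-123", "new_onboarding_flow")
second = assign_variant("user-123", "new_onboarding_flow")
print(first, second, first == second)
```

Because the experiment name is part of the hash key, a user's bucket in one experiment tells you nothing about their bucket in another, which is what makes it possible to run several tests at the same time.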

A Full Stack Product Data Scientist is someone who can perform the responsibilities of all three of the Product Data Roles mentioned above. Once again it’s worth bearing in mind that it’s difficult to gain depth of experience in all three roles while taking on the wide range of responsibilities that come with being a Full Stack Data Scientist. My current role is sort of a Full Stack Product Data Scientist for a start up. While it has a lot of different responsibilities, the scale of the user base is still small so it is manageable. However, as the user base and the company scale up, I will be hoping to specialise more into a Product Data Scientist/Product Analyst Role.


R&D

While the main focus of the Research and Development business function will differ from industry to industry and from business to business, it provides value to the company through research, trials and implementation of new techniques to solve their customers’ problems better. Data Professionals working in this function would typically be working on the core problems that the company’s trying to solve.

Data Scientist

Tools – SQL, Python, pandas, sklearn, keras, tensorflow, Auto ML Tools

Skills – Statistical Modelling, Machine Learning, Model Evaluation, Model Deployment, Deep Learning, Transfer Learning

Domain Knowledge – Depends on the business/core products, NLP, Computer Vision, Network Analysis, Recommendation Systems

A Data Scientist focussing primarily on R&D would spend their working hours trialling, implementing and improving the core ML models which are used in the product/throughout the company. For example, they might research how cutting edge NLP models work, adapt them for the specific use case of the business and test whether they work better than existing models. In some companies these Data Scientists would also be responsible for deploying the models into production.

Data Engineer

Tools – SQL, Python, Cloud Platforms (AWS, Azure, GCP), Big Data Tools (Spark, Hadoop etc), NoSQL Databases

Skills – Data Modelling & Architecture, ETL Pipelines, Model Deployment, DevOps, API Deployment, Data Validation

Domain Knowledge – Depends on the business/core products, Recommendation Engines, Data-heavy ETLs

Data Engineers would usually be responsible for acquiring, cleaning and hosting the large data sets that Data Scientists use to build their models. They’d also build durable ETL pipelines that transform data into more useful formats. In some companies, Data Engineers would also provide support to Data Scientists to deploy their ML models into production. DevOps skills would also be handy, since Data Engineers would need to work very closely with other Backend Engineers.

ML Researcher

Tools – SQL, Python, pandas, sklearn, keras, tensorflow, Auto ML Tools

Skills – Statistical Modelling, Machine Learning, Model Evaluation, Model Deployment, Deep Learning, Transfer Learning

Domain Knowledge – Depends on the business/core products, NLP, Computer Vision, Network Analysis, Recommendation Systems

Similar to a Data Scientist, an ML Researcher would be responsible for trialling, implementing and improving the core ML models which are used throughout the company. The main difference between the two roles is that ML Researchers approach the research aspect of the role from a more academic standpoint. Not only do they adapt and implement new ML models, they also perform the research needed to develop those new models in the first place. Typically ML Researchers tend to specialise in a few specific areas within ML such as NLP or Recommendation Systems.

ML Engineer

Tools – SQL, Python, pandas, sklearn, keras, tensorflow, Auto ML Tools, Cloud Platforms (AWS, Azure, GCP), Big Data Tools (Spark, Hadoop etc), NoSQL Databases

Skills – Statistical Modelling, Machine Learning, Model Evaluation, Model Deployment, Deep Learning, Transfer Learning

Domain Knowledge – Depends on the business/core products, NLP, Computer Vision, Network Analysis, Recommendation Systems

The ML Engineer is similar to the above 3 roles. The day to day responsibilities of an ML Engineer would involve not just implementing and improving the ML models used by the company, but also deploying them to production. I guess we could think of it as a mix of the Data Scientist and Data Engineer roles, with a higher emphasis on ML and Deep Learning concepts. Typically ML Engineers tend to specialise in a few specific areas within ML such as Computer Vision.


As I mentioned earlier, this isn’t the only way to segment the different data roles that sit within an organisation. Data skillsets are a spectrum, as are business functions, so these roles would vary from company to company. It’s likely that many roles may be a combination of 2 or even 3 roles that I’ve mentioned. But thinking about them in this framework helped me understand the different problems I’d be facing in each role. I hope it helps you too.

