Opinion

Note: This article is inspired by my experience at Meta and is inherently an opinion-based piece!
Increasingly, I’m seeing four archetypes emerge in the Data Science industry: (1) the AI Researcher, (2) the Product Scientist, (3) the ML Engineer, and (4) the Analytics Engineer. If you don’t target one of these archetypes, you risk slowing down your entry into the career field, or growing slowly while in it.
The Archetypes
1. The AI Researcher
I’m starting here because this, perhaps naively, is what most people think of when they hear ‘data science.’ This field is characterized by PhD-level talent. This is not a mistake; research is in the name. You don’t necessarily need a CS PhD; economics, psychology, etc. are in demand too!
The solution-development timeline is quite slow. Think months to years, not days to weeks. It’s impractical to build a custom, research-driven solution for every problem. So, this archetype focuses on pushing the state of the art on benchmark tasks, which ideally support a wide array of downstream tasks. Think ‘Facebook Prophet.’
If you only have a bachelor’s or master’s degree and/or you’re interested in customer-facing work, this archetype isn’t an ideal fit for you.
Other names: Applied Scientist, AI Scientist
2. The Product Scientist
This flavor of data science is characterized by global strategy. Your goal isn’t to automate some process away but to understand the process in the context of the global user-product ecosystem. Statistics gets much more emphasis here than ML (since ML is harder to interpret). The sort of question you might ask is ‘Does conditioning on gender meaningfully change the relationship age has with product usage?’ Statistical models are frequently used to narrow the space of plausible hypotheses. Once hypotheses are formulated, experimentation is used extensively to validate them and drive product strategy.
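To make the example question concrete, here is a minimal sketch of one way it could be framed: a linear model with an interaction term, fit by ordinary least squares. Everything in it is an assumption for illustration: the data is synthetic, `group` is a hypothetical 0/1 indicator standing in for the conditioning variable, and `fit_interaction_model` is a made-up helper name, not something from a real product pipeline.

```python
import numpy as np


def fit_interaction_model(age, group, usage):
    """Fit usage ~ 1 + age + group + age*group by ordinary least squares.

    `group` is a hypothetical 0/1 indicator. The `age_x_group` coefficient
    estimates how much the age slope differs between the two groups.
    """
    age = np.asarray(age, dtype=float)
    group = np.asarray(group, dtype=float)
    usage = np.asarray(usage, dtype=float)
    # Design matrix: intercept, main effects, and the interaction column.
    X = np.column_stack([np.ones_like(age), age, group, age * group])
    coefs, *_ = np.linalg.lstsq(X, usage, rcond=None)
    names = ["intercept", "age", "group", "age_x_group"]
    return {name: float(value) for name, value in zip(names, coefs)}


# Synthetic data (an assumption, not real usage data): the age slope is
# 0.20 for group 0 and 0.20 - 0.15 = 0.05 for group 1.
rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(18, 65, size=n)
group = rng.integers(0, 2, size=n)
usage = 10.0 + 0.20 * age + 1.0 * group - 0.15 * age * group + rng.normal(0.0, 1.0, size=n)

coefs = fit_interaction_model(age, group, usage)
for name, value in coefs.items():
    print(f"{name}: {value:.3f}")
```

An `age_x_group` estimate that is clearly different from zero suggests the age-usage relationship changes with the conditioning variable. The sketch only produces point estimates; a real analysis would also need standard errors or an experiment before acting on the result.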
The ideal candidates here have bachelor’s or master’s backgrounds in psychology, economics or any statistics-heavy social science discipline. Interpretation is king here, along with heavy exposure to SQL queries, data warehouses, etc. Production databases seldom have all the key variables collected nicely together. The ability to query massive volumes of data using tools like Spark, while optimizing for query importance, is a big nice-to-have.
Other names: Product Analyst, Statistician
3. The ML Engineer
Second to AI Researcher, this flavor of data science is an extremely common schema people tend to share when they hear of ‘data science.’ This archetype is all about deploying ML models to production as product features. For example, building a facial recognition model and integrating it into a social media app.
Interpretation isn’t the end-all goal; here, accuracy and performance are king. The ML Engineer typically isn’t in a position to ask the questions, ‘Is this feature worth building? How will it appeal to the customer base?’ (these are answered by the Product Scientist archetype). Rather, the ML Engineer will ask, ‘What pre-trained models (e.g., BERT, YOLO) can be leveraged with transfer learning and the available data to train and launch an efficient and accurate ML-based product feature?’
As discussed in 1., it’s too expensive to architect a truly novel model from scratch for any one-off task. Rather, this archetype will leverage existing APIs and focus on cohesive integration. If you want to design models from scratch and really focus on the why and how of their inner mechanics, this archetype might not be for you.
Other names: Software Engineer
4. The Analytics Engineer
This archetype is characterized by building data pipelines and compelling visualizations/dashboards. This archetype has a strong bachelor’s-level background in CS topics, using tools like Spark to perform ETL on truly massive volumes of data.
This archetype isn’t quite as customer/product/business-savvy as the Product Scientist; however, knowing which aggregate metrics matter most to the business is key. For example, a Product Scientist needs to focus most on testing hypotheses, estimating (potentially latent) parameters, and eliminating/updating hypotheses and user stories that aren’t supported by the data. However, there is a potentially infinite number of hypotheses out there, and proper statistical analysis can be slow or expensive. Simple visualizations of key metrics can reduce the search space for a Product Scientist; business acumen makes an Analytics Engineer indispensable.
Other names: Data Engineer, Business Intelligence Engineer
Finding the right fit
Finding the right fit isn’t always easy. But ask yourself a few questions.
1. Are you interested in the models or the applications?
If models, narrow your search to AI Researcher or ML Engineer. If you’re more interested in the applications, narrow your search to Product Scientist. This question doesn’t help much in terms of validating whether you’re ideally suited for work as an Analytics Engineer.
2. Do you prefer understanding why models work or solving persistent problems?
If you really want to dig into the nuts and bolts of how models work, then you really ought to pursue the AI Researcher archetype. If, however, you want to solve specific problems through deployed product features, an ML Engineer role is better suited for you.
3. Do you prefer to understand patterns in user behavior in the user-product ecosystem or are you more passionate about developing scalable infrastructure to support this purpose?
If you prefer understanding and inference, Product Scientist is the right place for you. But if you’re more interested in the infrastructure side, opt for Analytics Engineer.
Parting Advice
1. The Funnel
Treat your career over time as a funnel. Start broad and end narrow. If you’re in the middle of your undergrad, there’s no need to pick an archetype right now; just get as much exposure across the board as possible. Throughout your career, try to find answers to the above questions and converge on the archetype that works best for you. Starting too narrow and failing to ever narrow are two mistakes that could stunt your growth or get you pigeonholed in a role you don’t enjoy.
2. FAANG vs non-FAANG
These archetypes are FAANG-centric. Startups and large but older companies might not view data science through the lens of archetypes. You might find that you’re asked to be a generalist in a smaller/older company. This is a great way to find what archetype appeals most to you; however, as discussed above, working as a generalist might stunt your growth, unless you’re more interested in management than direct contribution. Likewise, you might find yourself in an AI Researcher role at a smaller/older company without having a PhD.