5 Things That I Actually Did at Work as a Data Scientist in 2021

A glimpse into what a Data Scientist actually does


One question that I get asked a lot is “what do you actually do at work?” and it’s a really good question. The data science projects and Kaggle notebooks that you see online represent a different reality than what a data scientist does at work (or what I do at work for the most part).

And so, I wanted to share with you 5 things that I ACTUALLY did at work last year in 2021. Let’s dive into it!

Be sure to SUBSCRIBE here to never miss another article on data science guides, tricks and tips, life lessons, and more!

1) Cohort Analysis

Image created by Author

What is a cohort analysis?

A cohort analysis involves building cohort charts, like the image above. A cohort chart is a type of data visualization commonly used in behavioral analytics.

A cohort chart is generally structured as follows:

  • Each row represents a different cohort, or group of users. In the image above, you can think of each row as representing users segmented by their first registration date.
  • Columns represent periods, whether it be in days, months, or years.
  • The values represent how a metric of interest progresses over time.

How is it useful in a business context?

Cohort analyses are useful when you want to see how a metric progresses over time. In other words, if a metric takes time to mature, it should be looked at cohort by cohort rather than as a single number.

Let me give an example of when this would be useful. Consider the metric average number of orders per customer (AOPC). Intuitively, older customers’ AOPC would be higher than newer customers’ because they have had more time to place orders. This makes it hard to compare newer customers’ AOPC against older customers’.

However, by conducting a cohort analysis, you’re able to see how AOPC evolves over periods of time, which allows you to compare newer and older customers’ AOPC in a given time period since they registered.
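To make the AOPC example concrete, here is a minimal sketch of a cohort table in plain Python. The order log, the `cohort_aopc` helper, and the choice of calendar months as periods are all assumptions made for illustration, not what I used at work. It also assumes every customer appears in the order log at least once.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order log: (customer_id, signup_date, order_date).
ORDERS = [
    ("a", date(2021, 1, 5), date(2021, 1, 10)),
    ("a", date(2021, 1, 5), date(2021, 2, 3)),
    ("b", date(2021, 1, 20), date(2021, 1, 25)),
    ("b", date(2021, 1, 20), date(2021, 3, 1)),
    ("c", date(2021, 2, 2), date(2021, 2, 9)),
    ("c", date(2021, 2, 2), date(2021, 2, 20)),
    ("c", date(2021, 2, 2), date(2021, 3, 15)),
]


def months_between(start, end):
    """Number of calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def cohort_aopc(orders):
    """Return {signup_month: [cumulative AOPC at period 0, 1, 2, ...]}."""
    customers = defaultdict(set)                          # cohort -> customer ids
    order_counts = defaultdict(lambda: defaultdict(int))  # cohort -> period -> orders
    for customer_id, signup, ordered in orders:
        cohort = signup.strftime("%Y-%m")
        customers[cohort].add(customer_id)
        order_counts[cohort][months_between(signup, ordered)] += 1

    table = {}
    for cohort in sorted(customers):
        periods = order_counts[cohort]
        running, row = 0, []
        for period in range(max(periods) + 1):
            running += periods.get(period, 0)  # cumulative orders so far
            row.append(round(running / len(customers[cohort]), 2))
        table[cohort] = row
    return table


chart = cohort_aopc(ORDERS)
for cohort, row in chart.items():
    print(cohort, row)
```

Each printed row is one cohort and each position in the list is one period, so you can compare the January and February cohorts at period 1 even though the January customers have been around longer.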

2) Data Pipeline Development

What is data pipeline development?

Data pipeline development involves designing and building systems (pipelines) that clean and transform raw data inputs into a desired output.

There are many ways that this can be done and many technologies to use, but in my case, this involved writing and scheduling efficient queries on Airflow and pushing the outputs to several BigQuery tables.

How is it useful in a business context?

Most of the time, the format that a company's raw data is in is not ideal for analysis or modeling. For example, you may have a table that includes every single transaction that all users make or a table that includes every single touchpoint that every user makes in your app. That is an immense amount of data that needs to be consolidated into comprehensive aggregations.
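As a rough illustration of that consolidation step, the sketch below cleans a handful of raw transaction rows and rolls them up to one summary row per user. The field names, the sample rows, and the `build_user_summary` helper are hypothetical; in practice this logic lived in scheduled SQL queries rather than Python.

```python
from collections import defaultdict

# Hypothetical raw transaction rows, one per purchase.
RAW_TRANSACTIONS = [
    {"user_id": 1, "amount": 20.0, "day": "2021-03-01"},
    {"user_id": 1, "amount": 5.5, "day": "2021-03-04"},
    {"user_id": 2, "amount": 12.0, "day": "2021-03-02"},
    {"user_id": 1, "amount": None, "day": "2021-03-05"},  # malformed row
]


def build_user_summary(rows):
    """Clean raw rows and aggregate them to one summary row per user."""
    summary = defaultdict(lambda: {"orders": 0, "revenue": 0.0, "last_day": ""})
    for row in rows:
        if row["amount"] is None:  # cleaning step: skip rows with no amount
            continue
        user = summary[row["user_id"]]
        user["orders"] += 1
        user["revenue"] += row["amount"]
        # ISO-formatted dates sort correctly as plain strings.
        user["last_day"] = max(user["last_day"], row["day"])
    return dict(summary)


user_summary = build_user_summary(RAW_TRANSACTIONS)
for user_id, stats in user_summary.items():
    print(user_id, stats)
```

The output table is far smaller than the raw log and is the kind of thing an analyst can query directly.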

Overall, strong pipeline development results in:

  1. Quicker turnarounds for ad-hoc analyses
  2. The ability to conduct more complex analyses and build stronger models
  3. Finally (and most importantly), the democratization of data for less technical users.

3) Propensity Modeling

Created by Author

What is propensity modeling?

Propensity modeling answers the question “what is the likelihood that someone will do something based on x, y, and z?” In my case, I built a propensity model that answered the question, “what is the likelihood that each user will become an active user based on their registration information and marketing touchpoints?”

How is it useful in a business context?

Propensity modeling is useful because it allows you to assign a probability that a user will do something in the future, whether it's adopting a new product, signing up for a new subscription, etc.

By accurately assigning scores to users, you can do two things:

  1. You can make predictions about what users are likely to do in the future and develop unique targeting strategies.
  2. You can also get an understanding of the characteristics and variables that drive users to be more (or less) likely to do something.

Overall, propensity modeling is, in my opinion, one of the most applicable business cases for machine learning.
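Both uses can be sketched with a tiny logistic regression written in plain Python. This is only an illustration under assumptions: the two features (whether a welcome email was opened and the number of marketing touchpoints), the toy data, and the gradient-descent settings are all made up, and a real model would use a proper library and far more data.

```python
import math

# Hypothetical training data:
# (opened_welcome_email, marketing_touchpoints) -> became_active
X = [(1, 5), (1, 4), (0, 1), (0, 0), (1, 3), (0, 2), (1, 1), (0, 4)]
y = [1, 1, 0, 0, 1, 0, 0, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            z = bias + sum(w * f for w, f in zip(weights, features))
            error = sigmoid(z) - label
            for i, f in enumerate(features):
                grad_w[i] += error * f
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def propensity(features, weights, bias):
    """Probability that a user with these features becomes active."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


weights, bias = fit_logistic(X, y)
high = propensity((1, 5), weights, bias)  # engaged user
low = propensity((0, 0), weights, bias)   # unengaged user
print("scores:", round(high, 3), round(low, 3))
print("weights:", [round(w, 3) for w in weights])
```

The scores are what you would use for targeting, and the sign and size of each weight give a first hint about which variables push users toward (or away from) becoming active.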

4) Explanatory Modeling

What is explanatory modeling?

Explanatory modeling involves modeling feature variables and a target variable with the intent of better understanding the relationships between the variables. This can be done through linear regression methods with F-tests and t-tests, or by using SHAP values for ML models.
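For the regression route, the sketch below fits a one-variable linear regression and computes the t-statistic for its slope in plain Python. The data (weekly app sessions versus orders) and the `simple_ols` helper are hypothetical, and it covers only the simplest case of a single feature.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x; return slope, intercept, slope t-statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    # Standard error of the slope, using n - 2 degrees of freedom.
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope / std_error


# Hypothetical data: weekly app sessions vs. orders placed.
sessions = [1, 2, 3, 4, 5, 6]
orders = [1, 3, 2, 5, 4, 6]

slope, intercept, t_stat = simple_ols(sessions, orders)
print(round(slope, 3), round(intercept, 3), round(t_stat, 3))
```

With 6 observations there are 4 degrees of freedom, so a t-statistic whose absolute value exceeds roughly 2.776 would be significant at the 5% level (two-sided). That tells you the association is unlikely to be noise, not that sessions cause orders.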

How is it useful in a business context?

While explanatory modeling doesn’t establish causation, it’s a very effective way to better understand associations between different variables. For example, through explanatory modeling, I was able to deduce that app activity was one of the strongest predictors in my propensity model.

Results from explanatory modeling can also spark ideas for experiments in cases where you want something stronger than association, namely the causal relationship between particular variables.

5) Lifetime Value (LTV) Modeling

What is LTV modeling?

LTV, or lifetime value, represents the total net profit that you expect to get from a customer from when they first become a customer to when they churn. It’s a combination of customers’ average revenue per user (ARPU), tenure, and churn.
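One common back-of-the-envelope way to combine these inputs is sketched below. It is a simplification, not the model I built: it assumes a constant monthly churn rate (so expected tenure is 1 / churn), and the `gross_margin` parameter and function names are assumptions introduced to turn revenue into profit.

```python
def expected_tenure_months(monthly_churn_rate):
    """Expected customer lifetime in months under a constant churn rate."""
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    return 1.0 / monthly_churn_rate


def lifetime_value(arpu, gross_margin, monthly_churn_rate):
    """Simplified LTV: monthly profit per user times expected tenure."""
    return arpu * gross_margin * expected_tenure_months(monthly_churn_rate)


# Hypothetical inputs: $20 monthly ARPU, 50% margin, 5% monthly churn.
ltv = lifetime_value(arpu=20.0, gross_margin=0.5, monthly_churn_rate=0.05)
print(round(ltv, 2))
```

Because churn sits in the denominator, small changes in churn move LTV a lot, which is one reason a real LTV model estimates churn per segment rather than using a single average.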

How is it useful in a business context?

LTV is essential as a company focuses more on profitability. With an LTV model, you’re able to conduct analyses from a value (profit) perspective. What this means is that you can look at products or segments that contribute to a higher LTV. You can even conduct explanatory modeling to identify variables/features/attributes that are indicative of high-profit users.

Thanks for reading!

- Terence Shin
