Data Driven Growth with Python

Know Your Metrics

Learn what and how to track with Python

Barış Karaman
Towards Data Science
8 min read · May 4, 2019


Introduction

This series of articles is designed to explain, in a simple way, how to use Python to fuel your company’s growth by applying a predictive approach to all your actions. It combines programming, data analysis, and machine learning.

I will cover all the topics in the following nine articles:

1- Know Your Metrics

2- Customer Segmentation

3- Customer Lifetime Value Prediction

4- Churn Prediction

5- Predicting Next Purchase Day

6- Predicting Sales

7- Market Response Models

8- Uplift Modeling

9- A/B Testing Design and Execution

Each article has its own code snippets so that you can apply them easily. If you are brand new to programming, you can find a good introduction to Python and pandas (a popular library that we will use throughout) here. Even without a coding background, you can still learn the concepts, how to use your data, and how to start generating value from it:

Sometimes you gotta run before you can walk — Tony Stark

As a pre-requisite, be sure Jupyter Notebook and Python are installed on your computer. The code snippets will run on Jupyter Notebook only.

Alright, let’s start.

Part 1: Know Your Metrics

We all remember Captain Sparrow’s famous compass that shows the location of what he wants the most. Without a North Star Metric, this is how we are in terms of growth. We want more customers, more orders, more revenue, more signups, more efficiency…

Before going into coding, we need to understand what exactly a North Star Metric is. If you already know and track yours, this post can help you do a deep-dive analysis with Python. If you don’t, you should first find it (you are probably already tracking it without calling it a North Star). This is how Sean Ellis describes it:

The North Star Metric is the single metric that best captures the core value that your product delivers to customers.

This metric depends on your company’s product, position, targets & more. Airbnb’s North Star Metric is nights booked whereas for Facebook, it is daily active users.

In our example, we will use a sample dataset from an online retailer. For an online retailer, we can choose Monthly Revenue as our North Star Metric. Let’s see what our data looks like in a Jupyter notebook.

Monthly Revenue

Let’s start with importing the libraries we need and reading our data from CSV with the help of pandas:
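A minimal sketch of this step is shown below. The real file name is not given in the text, so the example reads from a small in-memory CSV instead. The column names (InvoiceNo, Quantity, InvoiceDate, UnitPrice, CustomerID, Country) and the sample rows are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; the column names and rows
# are assumptions, not the actual dataset.
sample_csv = io.StringIO(
    "InvoiceNo,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "536365,6,2010-12-01 08:26:00,2.55,17850,United Kingdom\n"
    "536366,2,2010-12-01 08:28:00,1.85,17850,United Kingdom\n"
    "536370,24,2011-01-04 10:00:00,3.75,12583,France\n"
)

# With a real file, pass its path instead of the StringIO object.
# parse_dates converts the invoice date column to datetime on load.
tx_data = pd.read_csv(sample_csv, parse_dates=["InvoiceDate"])

print(tx_data.head())
```

Parsing the date column at load time saves a separate conversion step before extracting the year and month later on.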

This is what our data looks like:

We have all the crucial information we need:

  • Customer ID
  • Unit Price
  • Quantity
  • Invoice Date

With all these features, we can build our North Star Metric equation:

Revenue = Active Customer Count * Order Count per Customer * Average Revenue per Order

It’s time to get our hands dirty. We want to see monthly revenue but unfortunately there is no free lunch. Let’s engineer our data:
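One way to do this engineering step, sketched on a few made-up rows: compute revenue per row, derive a year-month key from the invoice date, and sum revenue per month. The column names UnitPrice and InvoiceDate and the name tx_revenue are assumptions.

```python
import pandas as pd

# Made-up rows standing in for the real transactions.
tx_data = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2011-03-05", "2011-03-20", "2011-04-02", "2011-04-15"]
    ),
    "UnitPrice": [2.0, 5.0, 1.5, 4.0],
    "Quantity": [10, 2, 4, 5],
})

# Revenue of each row = unit price * quantity.
tx_data["Revenue"] = tx_data["UnitPrice"] * tx_data["Quantity"]

# Year-month key such as 201103, used to group rows by month.
tx_data["InvoiceYearMonth"] = (
    tx_data["InvoiceDate"].dt.strftime("%Y%m").astype(int)
)

# Total revenue per month.
tx_revenue = tx_data.groupby("InvoiceYearMonth")["Revenue"].sum().reset_index()
print(tx_revenue)
```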

Good job, now we have a dataframe that shows our monthly revenue:

Next step, visualization. A line graph would be sufficient:
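As a rough sketch, a line graph of monthly revenue could be drawn with matplotlib as below. The choice of plotting library is an assumption (any charting library works), and the numbers are made up.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly revenue figures.
tx_revenue = pd.DataFrame({
    "InvoiceYearMonth": [201101, 201102, 201103],
    "Revenue": [100.0, 120.0, 90.0],
})

fig, ax = plt.subplots()
# Cast the month key to string so the x axis is treated as categories,
# not as a numeric scale with gaps between 201112 and 201201.
ax.plot(
    tx_revenue["InvoiceYearMonth"].astype(str),
    tx_revenue["Revenue"],
    marker="o",
)
ax.set_title("Monthly Revenue")
ax.set_xlabel("Invoice year-month")
ax.set_ylabel("Revenue")
```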

Jupyter notebook output:

This clearly shows that our revenue is growing, especially from Aug ’11 onwards (our data for December is incomplete). Absolute numbers are fine, but let’s also work out our Monthly Revenue Growth Rate:
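A small sketch of the growth-rate calculation using the pandas pct_change() method, which returns the relative change from the previous row. The figures are made up, and the cutoff that drops the incomplete last month is an assumption for this toy data.

```python
import pandas as pd

# Made-up monthly revenue; the last month is assumed to be incomplete.
tx_revenue = pd.DataFrame({
    "InvoiceYearMonth": [201109, 201110, 201111, 201112],
    "Revenue": [100.0, 120.0, 150.0, 40.0],
})

# Month-over-month growth: (this month - previous month) / previous month.
tx_revenue["MonthlyGrowth"] = tx_revenue["Revenue"].pct_change()

# Exclude the incomplete month before reading or plotting the growth rate.
tx_complete = tx_revenue[tx_revenue["InvoiceYearMonth"] < 201112]
print(tx_complete)
```

The first month has no previous value, so its growth rate is NaN.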

Everything looks good: we saw 36.5% growth in the previous month (December is excluded in the code since it is not complete yet). But we need to identify what exactly happened in April. Was it due to fewer active customers, or did our customers place fewer orders? Maybe they just started to buy cheaper products? We can’t say anything without a deep-dive analysis.

Monthly Active Customers

To see the details of Monthly Active Customers, we will follow exactly the steps we used for Monthly Revenue. From this part on, we will focus on UK data only (which has the most records). We can get the monthly active customers by counting unique CustomerIDs. The code snippet and the output are as follows:
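A sketch of this step on made-up rows: filter to UK records, then count distinct customers per month. The Country column, its value 'United Kingdom', and the name tx_monthly_active are assumptions.

```python
import pandas as pd

# Made-up transactions; the month key is assumed to be computed already.
tx_data = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 1],
    "Country": [
        "United Kingdom", "United Kingdom", "United Kingdom",
        "France", "United Kingdom",
    ],
    "InvoiceYearMonth": [201103, 201103, 201103, 201103, 201104],
})

# Keep UK records only.
tx_uk = tx_data.query("Country == 'United Kingdom'").reset_index(drop=True)

# nunique() counts each customer once per month, however many rows they have.
tx_monthly_active = (
    tx_uk.groupby("InvoiceYearMonth")["CustomerID"].nunique().reset_index()
)
print(tx_monthly_active)
```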

No. of active customers per month and its bar plot:

In April, the number of Monthly Active Customers dropped from 923 to 817 (-11.5%).

We will see the same trend for number of orders as well.

Monthly Order Count

We will apply the same code by using Quantity field:
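A sketch of the same grouping applied to the Quantity field, on made-up rows. Note that summing Quantity gives the number of items sold per month. Counting distinct invoices is shown as an alternative reading of "order count"; the InvoiceNo column used for it is an assumption.

```python
import pandas as pd

# Made-up UK transactions; invoice "A1" has two line items.
tx_uk = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceYearMonth": [201103, 201103, 201103, 201104],
    "Quantity": [6, 4, 10, 5],
})

# Total quantity per month, as described in the text.
tx_monthly_sales = (
    tx_uk.groupby("InvoiceYearMonth")["Quantity"].sum().reset_index()
)

# Alternative: number of distinct invoices per month.
tx_monthly_invoices = (
    tx_uk.groupby("InvoiceYearMonth")["InvoiceNo"].nunique().reset_index()
)

print(tx_monthly_sales)
print(tx_monthly_invoices)
```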

Monthly order count and its bar plot:

As we expected, Order Count also declined in April (279k to 257k, -8%).

We know that the drop in Active Customer Count directly contributed to the decrease in Order Count. Finally, we should definitely check our Average Revenue per Order as well.

Average Revenue per Order

To get this data, we need to calculate the average revenue for each month:
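A sketch of the monthly average on made-up rows. It takes the mean of the Revenue column within each month, so each row of the data is treated as one order. That is an assumption about how the data is laid out.

```python
import pandas as pd

# Made-up UK transactions with revenue already computed per row.
tx_uk = pd.DataFrame({
    "InvoiceYearMonth": [201103, 201103, 201104, 201104],
    "Revenue": [20.0, 10.0, 6.0, 20.0],
})

# Mean revenue per row within each month.
tx_monthly_order_avg = (
    tx_uk.groupby("InvoiceYearMonth")["Revenue"].mean().reset_index()
)
print(tx_monthly_order_avg)
```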

Monthly average revenue per order and its bar plot:

Even the monthly average revenue per order dropped in April (16.7 to 15.8). We observed a slowdown in every metric affecting our North Star.

We have looked at our major metrics. Of course, there are many more, and they vary across industries. Let’s continue by investigating some other important metrics:

  • New Customer Ratio: a good indicator of whether we are losing our existing customers or failing to attract new ones
  • Retention Rate: the king of metrics. It indicates how many customers we retain over a specific time window. We will show examples for monthly retention rate and cohort-based retention rate.

New Customer Ratio

First we should define what a new customer is. In our dataset, we can assume a new customer is anyone who made their first purchase in the time window we define. We will do it monthly for this example.

We will use the .min() function to find the first purchase date of each customer and define new customers based on that. The code below applies this function and shows the monthly revenue breakdown for each group.
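A sketch of this logic on made-up rows: find each customer's first purchase date with .min(), merge it back onto the transactions, and label a row as New when it falls in the customer's first purchase month. The helper names (tx_min_purchase, MinPurchaseYearMonth, UserType) are assumptions.

```python
import pandas as pd

# Made-up UK transactions.
tx_uk = pd.DataFrame({
    "CustomerID": [1, 2, 1, 3, 2],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-10", "2011-01-15", "2011-02-03", "2011-02-20", "2011-03-01"]
    ),
    "Revenue": [10.0, 20.0, 5.0, 30.0, 15.0],
})
tx_uk["InvoiceYearMonth"] = tx_uk["InvoiceDate"].dt.strftime("%Y%m").astype(int)

# First purchase date per customer.
tx_min_purchase = tx_uk.groupby("CustomerID")["InvoiceDate"].min().reset_index()
tx_min_purchase.columns = ["CustomerID", "MinPurchaseDate"]
tx_min_purchase["MinPurchaseYearMonth"] = (
    tx_min_purchase["MinPurchaseDate"].dt.strftime("%Y%m").astype(int)
)

# Attach the first purchase month to every transaction.
tx_uk = tx_uk.merge(tx_min_purchase, on="CustomerID")

# A customer is "New" in their first purchase month, "Existing" afterwards.
tx_uk["UserType"] = "New"
later = tx_uk["InvoiceYearMonth"] > tx_uk["MinPurchaseYearMonth"]
tx_uk.loc[later, "UserType"] = "Existing"

# Monthly revenue split by customer type.
tx_user_type_revenue = (
    tx_uk.groupby(["InvoiceYearMonth", "UserType"])["Revenue"].sum().reset_index()
)
print(tx_user_type_revenue)
```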

Dataframe output after merging with First Purchase Date:

Revenue per month for New and Existing Customers:

Line chart of the above:

Existing customers show a positive trend, which tells us that our customer base is growing, but new customers show a slight negative trend.

Let’s have a better view by looking at the New Customer Ratio:
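One possible definition, sketched on made-up rows, is the share of each month's active customers who are new. This definition is an assumption; the ratio could also be computed against existing customers only. The UserType labels are assumed to be in place already.

```python
import pandas as pd

# Made-up UK transactions with customer type already assigned.
tx_uk = pd.DataFrame({
    "InvoiceYearMonth": [201101, 201101, 201102, 201102, 201102, 201103, 201103],
    "CustomerID": [1, 2, 1, 3, 3, 2, 4],
    "UserType": ["New", "New", "Existing", "New", "New", "Existing", "New"],
})

# Distinct new customers per month.
new_counts = (
    tx_uk[tx_uk["UserType"] == "New"]
    .groupby("InvoiceYearMonth")["CustomerID"]
    .nunique()
)

# Distinct active customers per month.
active_counts = tx_uk.groupby("InvoiceYearMonth")["CustomerID"].nunique()

# Division aligns on the month index; months without new customers become 0.
tx_user_ratio = (
    (new_counts / active_counts).fillna(0).reset_index(name="NewCustomerRatio")
)
print(tx_user_ratio)
```

With this definition the first month is always 1.0, because every customer in the data is new at that point.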

New Customer Ratio has declined as expected (we assumed that in Feb all customers were new) and is running at around 20%.

Monthly Retention Rate

Retention rate should be monitored very closely because it indicates how sticky your service is and how well your product fits the market. To visualize Monthly Retention Rate, we need to calculate how many customers were retained from the previous month.

Monthly Retention Rate = Retained Customers From Prev. Month/Active Customers Total

We will use the crosstab() function of pandas, which makes calculating Retention Rate very easy.

First, we create a dataframe that shows total monthly revenue for each customer:

The crosstab() function converts it to a retention table:

The retention table shows us which customers are active in each month (1 stands for active).

With the help of a simple for loop, we calculate, for each month, the Retained Customer Count from the previous month and the Total Customer Count.
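The steps above can be sketched as follows on made-up data with one row per customer and month. The dataframe and column names (tx_user_purchase, TotalUserCount, RetainedUserCount) are assumptions.

```python
import pandas as pd

# Made-up monthly revenue per customer (one row per customer-month).
tx_user_purchase = pd.DataFrame({
    "CustomerID": [1, 2, 3, 1, 2, 1, 4],
    "InvoiceYearMonth": [201101, 201101, 201101, 201102, 201102, 201103, 201103],
    "Revenue": [10.0, 20.0, 5.0, 8.0, 12.0, 9.0, 30.0],
})

# Rows = customers, columns = months, values = 1 if active, else 0.
tx_retention = pd.crosstab(
    tx_user_purchase["CustomerID"], tx_user_purchase["InvoiceYearMonth"]
).clip(upper=1)

# For each month, count active customers and those also active the month before.
months = tx_retention.columns.tolist()
rows = []
for prev_month, month in zip(months[:-1], months[1:]):
    active = tx_retention[month] > 0
    retained = active & (tx_retention[prev_month] > 0)
    rows.append({
        "InvoiceYearMonth": int(month),
        "TotalUserCount": int(active.sum()),
        "RetainedUserCount": int(retained.sum()),
    })

tx_retention_rate = pd.DataFrame(rows)
tx_retention_rate["RetentionRate"] = (
    tx_retention_rate["RetainedUserCount"] / tx_retention_rate["TotalUserCount"]
)
print(tx_retention_rate)
```

The first month has no previous month to compare against, so it is left out of the result.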

In the end, we have our Retention Rate dataframe & line chart like below:

Monthly Retention Rate jumped significantly from June to August and went back to previous levels afterwards.

Cohort Based Retention Rate

There is another way of measuring Retention Rate that lets you see it for each cohort. A cohort is defined by the year-month of a customer’s first purchase. We will measure what percentage of customers are retained in each month after their first purchase. This view helps us see how recent and old cohorts differ in retention rate and whether recent changes in customer experience have affected new customers’ retention.

This will be a bit more complicated than others in terms of coding.
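One possible way to build such a table is sketched below on made-up data; it is not necessarily the approach used in the original notebook. Each customer is assigned to the month of their first purchase, and for each cohort we count how many distinct customers are active N months later, divided by the cohort size. The helper months_between and the column names are assumptions.

```python
import pandas as pd

# Made-up activity data: one row per customer and active month.
tx_user_purchase = pd.DataFrame({
    "CustomerID": [1, 2, 3, 1, 2, 1, 4, 3],
    "InvoiceYearMonth": [
        201101, 201101, 201102, 201102, 201103, 201103, 201102, 201103,
    ],
})

# Cohort = year-month of each customer's first purchase.
tx_user_purchase["Cohort"] = (
    tx_user_purchase.groupby("CustomerID")["InvoiceYearMonth"].transform("min")
)


def months_between(start, end):
    """Number of months between two YYYYMM integers (or Series of them)."""
    return (end // 100 - start // 100) * 12 + (end % 100 - start % 100)


tx_user_purchase["MonthsSinceFirst"] = months_between(
    tx_user_purchase["Cohort"], tx_user_purchase["InvoiceYearMonth"]
)

# Distinct active customers per cohort and months since first purchase.
cohort_counts = tx_user_purchase.pivot_table(
    index="Cohort",
    columns="MonthsSinceFirst",
    values="CustomerID",
    aggfunc="nunique",
)

# Divide each row by its cohort size (column 0) to get retention rates.
tx_retention = cohort_counts.div(cohort_counts[0], axis=0)
print(tx_retention)
```

Cells are NaN where a cohort has not yet reached that many months of history.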

The tx_retention dataframe gives us this view of the cohort-based retention rate:

We can see that the first-month retention rate has improved recently (don’t take Dec ’11 into account), and after almost a year, 15% of our customers are still with us.

Finally, we know our metrics and how to track and analyze them with Python.

You can find the Jupyter notebook for this article here.

Let’s try to segment our base to see who our best customers are in Part 2.

I’ve started to write the more advanced and updated version of my articles here. Feel free to visit, learn more and support.
