Tutorial

Build Customer Journeys Using SQL

Learn to track consumers across multiple channels

Boris J
Towards Data Science
5 min readMar 8, 2023

--

Image source: Krivec Ales, Pixabay

Customer Journey Market

The global customer journey analytics (CJA) market reached a value of $8.3 billion dollars in 2020 and projected to grow to 25.1 billion by 2026, at a CAGR of 20.3% during the forecast period[1]. Engagement in this market comes at a hefty price tag, and not every company is at the stage where they can fork over a sizeable chunk of money. While there is a lot of information on CJA, there is little to no information on how to structure a dataset to derive insights for a customer journey. In this tutorial, I’ll show you how to create a customer journey dataset using Structured Query Language (SQL).

What is a Customer Journey?

We can define a customer journey as a succession of interactions by a customer across multiple channels. These channels may comprise email, website, mobile application, call center, social media or in-store purchases. A journey may show a customer interacting with email, next web, followed by direct mail and ending in a purchase. These time stamped interactions enable us to understand the timing and sequence of customers' engagement with a company’s products. Each step along the journey can provide valuable insights that can improve the customer experience, conversion, and subsequent marketing efforts.

Customer Journey Analytics: Build with SQL

Journey analytics enables CX teams to visualize customer behavior across channels and time, define in-journey signals that show likelihood of conversion, and monitor performance toward end-of-journey success.

Our data comprises interactions and orders for a fictitious company that sells cars. Interactions (i.e. web, mobile, etc.) in the public.interactions table are the channels customers used to engage with our company before making a purchase. We capture purchases in the public.orders table.

Building the customer journey dataset takes three steps. Step 1: Main Code sequences the customer interactions based on the interaction date. Step 2: Main Code CTE creates a temporary of the main table and makes it available for step 3. Step 3: Build Customer Journey as the name suggests creates the customer journey dataset. So, let’s analyze each step in SQL code below to understand how they work.

Author, SQL Code

Main Code (Step1)

We start by joining the interactions table and the orders table on the customer id, public.interactions = public.orders on t1.customer_id = t2.customer_id. We capture completed purchases in the orders table. So, the resulting journeys will only include interactions for customers who made a purchase.

To capture and sequence interactions, we use a Lead() function. The lead function accesses data from the next row based on the offset value. The syntax for the lead function is:

LEAD(return_value [,offset[, default ]]) OVER (
PARTITION BY expr1, expr2,...
ORDER BY expr1 [ASC | DESC], expr2,...
)

In the table below, the first interaction is determined by the function lead(t1.interaction,0) over (partition by t1.customer_id order by t1.interaction_date asc). First partition by groups the customer records based on t1.customer_id. Second, the order by t1.interaction_date asc sorts the interactions by date. Last, offset accesses the next row, or the second row that follows the current row, or the third row that follows the current row, and so on. Here, the offset is zero, lead(t1.interaction,0) so the returned value is Mobile. If the offset is 1, lead(t1.interaction,1) the returned is Web.

I recommend importing the data into a database and applying the SQL code above to fully understand the information. You can find the datasets here.

Encase Main Code in CTE (Step 2)

We process the Main Code by adding it to a Common Table Expression (CTE). The syntax for the CTE is:

with ctedata as
(
--Step 1 code
)

The primary reason we use a CTE is so we can use the where clause and filter the rows using columns derived from the Lead() or Row_Number() function. Columns created in the main code using functions can not be used in the where clause. This prepares the data for step 3

Build Customer Journey Dataset (Step 3)

In the dataset created in step 1, some rows were missing data. We remove the missing rows by filtering, where offset = 0. We also aggregate the data, shown below, by summing the total customers who made a purchase by first_interaction and second_interaction.

Author, Dataset

The table above shows how customers engage prior to a purchase. In this example, customers who purchase Vintage Cars from our fictitious company engage mobile and email channels. With customer journey data, we get insights like which channels perform best for engagement and lead to sales; which customer paths result in churn; what’s the best moment to engage consumers in their journey and what paths do different audiences take[2].

This knowledge may help inform marketing strategy. While a table can be effective in interpreting data, visualizing the data can help paint an intuitive customer journey. So, let’s try our hand at it in the next section.

Visualizing Customer Journey

The Sankey chart, shown below, is useful for illustrating the sequence of a customer journey across multiple channels such as email, website, mobile application, call center, social media.

Author, Sankey Chart Data

We read the Sankey chart from left to right. Each vertical bar represents a node with consumers. For example, the blue node shows 23.24% of consumers first engaged the brand through direct mail. These customers next engaged with a mix of mobile, direct mail, web and email. The thickness of the band visually captures the number of consumers flowing between interactions. The thicker the band means more consumers flowing toward a second interaction node. If you want to create a Sankey chart, I recommend trying either Chart Expo or Visual Paradigm, paid services. I prefer Chart Expo because of its online tutorial videos. If you decide to use it, remember to export the data using the format for the SQL code provided.

Custom Journey: Continue Learning

There’s a lot to uncover when learning how to track customers across multiple channels. To help, I’ve included a video below in this tutorial. It covers the same material, however, discussing examples with illustrations might enhance the learning process. If you find it challenging to understand some concepts, I encourage you to review the video.

Author, Video

Well, that’s it for now. I hope you found this tutorial helpful. Contact me with questions. I’m here to share and grow.

References:

[1]: Markets and Markets, Customer Journey Analytics Market by Component Organization and Size

[2]: Karolina Matuszewska, Marek Juszczyński. (October 3, 2022). What is customer journey analytics and why it’s important for your business

--

--