The world’s leading publication for data science, AI, and ML professionals.

SQL Digital Marketing Analysis

An approach in MySQL Workbench revising some of the major SQL queries applied to marketing operations and common analytics requests.

Image by author - Gonçalo Guimarães Gomes
Image by author – Gonçalo Guimarães Gomes

Introduction

The main goal of this SQL Marketing Analytics setup is to help the marketing team of the mediumcompany (fictitious) to figure out what should be the right amount of bid for various segments of paid traffic, based on how well they perform and how much revenue they make.

In the MySQL Workbench environment, we will use the SQL database mediumcompany, in concrete, two tables ‘website_sessions’ and ‘orders’, to help us understand where is traffic coming from, and how it performs in terms of volume and conversion rates. We will be also adjusting bids to optimize the marketing budgets.

Along with the following selected 5 common assignments, we’ll be coding simple Sql queries but powerful enough to answer several operational marketing questions. We’ll be dealing with examples about joining tables, calculating conversion rates, device performance, and volume trends with breakdowns.


<Assignment #1>

From: CEO Subject: Website traffic breakdown Date: April 12, 2012

We’ve been live for almost a month now and we’re starting to generate sales. Can you help me understand where the bulk of our website sessions are coming from, through yesterday? I’d like to see a breakdown by source, campaign and referring domain.

— ThinkingWe want to count the website_session_id (number of sessions) grouped by utm_source, utm_campaign and utm_referer. This is our expected results’ table:

Image by author.
Image by author.

To solve this question, we’ll only need the website_sessions’ table.

Image by author.
Image by author.
SELECT * FROM mediumcompany.website_sessions;
Image by author.
Image by author.
USE mediumcompany; -- set global use of the db.
SELECT utm_source, utm_campaign, http_referer,
COUNT(DISTINCT website_session_id) AS nb_sessions
FROM website_sessions
WHERE created_at < '2012–04–12'
GROUP BY utm_source, utm_campaign, http_referer
ORDER BY nb_sessions DESC;

— HintGroup and order faster by each position in the SELECT statement:

SELECT 
(...)
GROUP BY 1, 2, 3
ORDER BY 4 DESC;

— Output

Image by author.
Image by author.

— UnderstandingStart by selecting source, campaign and referer from website_sessions table, for the period until the 12th of April 2012.

SELECT utm_source, utm_campaign, http_referer
FROM website_sessions
WHERE created_at < "2012–04–12"
Image by author.
Image by author.

Then count the number of sessions with a GROUP BY clause, aggregating the values for each combination. Rank the session in descending order (highest at top).

SELECT (...)
COUNT(DISTINCT website_session_id) AS nb_sessions
(...)
GROUP BY 1, 2, 3
ORDER BY 4 DESC;
Image by author.
Image by author.

We can conclude that gsearch and nonbrand are driving more traffic (sessions) than any other campaign or sources for the period until the 12th of April 2012.


<Assignment #2>

From: Marketing Director Subject: Gsearch conversions Date: April 14, 2012

Looks like gsearch nonbrand is our major traffic source, but we need to understand if those sessions are driving sales. Could you please calculate the conversion rate (CVR) from session to order? Based on what we’re paying for clicks, we’ll need a CVR of at least 4% to make the numbers work.

— ThinkingWe want to count the total number of sessions and orders for gsearch and nonbrand only, for the period until the 14th of April 2012. Regarding the conversion rate, we simply divide the orders by the sessions. This is our expected results’ table:

Image by author.
Image by author.

To solve this question, we’ll need both the website_sessions (which we already know from the previous assignment) and the orders tables (see below).

Image by author.
Image by author.
SELECT * FROM mediumcompany.orders;
Image by author.
Image by author.

We’ll do a ‘website_sessions’ LEFT JOIN ‘orders’ because we want to see all website_session_id from the website_sessions table and see when there is a matching order from the orders’ table. We’ll be joining both on the website_session_id.

USE mediumcompany;
SELECT
COUNT(DISTINCT ws.website_session_id) AS sessions,
COUNT(DISTINCT o.order_id) AS orders,
(COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) *100 AS session_to_order_conv_rate
FROM website_sessions ws
LEFT JOIN orders o 
ON o.website_session_id = ws.website_session_id
WHERE ws.created_at < '2012–04–14' 
AND ws.utm_source = 'gsearch' 
AND ws.utm_campaign = 'nonbrand';

— Hint Round ‘session_to_order_conv_rate’ by 2 decimals:

SELECT
(...)
ROUND((COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) * 100 ,2) AS session_to_order_conv_rate

— Output

Image by author.
Image by author.

— UnderstandingStart by joining both tables (website_sessions and orders) with a left join on website_session_id, selecting website_session_id and order_id.

SELECT 
ws.website_session_id AS sessions,
o.order_id AS orders

FROM website_sessions ws
LEFT JOIN orders o 
ON o.website_session_id = ws.website_session_id
WHERE ws.created_at < "2012–04–14"
AND ws.utm_source = 'gsearch' 
AND ws.utm_campaign = 'nonbrand';
Image by author.
Image by author.

Count the total number of sessions and orders.

SELECT 
COUNT(DISTINCT ws.website_session_id) AS sessions,
COUNT(DISTINCT o.order_id) AS orders
(...)
Image by author.
Image by author.

Calculate the conversion by dividing orders by sessions (*100 for rate).

SELECT (...)
(COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) *100 AS session_to_order_conv_rate 
(...)
-- Round 2 decimals:
ROUND((COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) *100 ,2) AS session_to_order_conv_rate
Image by author.
Image by author.

The conversion rate is of 2.9% means the gsearch nonbrand bids are not driving sales as expected, the investment is not working the best way.


<Assignment #3>

From: Marketing Director Subject: Gsearch volume trends Date: May 10, 2012

Based on your conversion rate analysis, we bid down gsearch nonbrand on 2012–04–15. Can you pull gsearch nonbrand trended session volume, breakdown by week, to see if the bid changes have caused volume to drop at all?

— Thinking We want to count the gsearch nonbrand sessions in a time series breakdown by week, until the 10th of May 2012. To do so, group them by year and week, finding the first – or minimum – , day of each week. Finally, see if the sessions’ volume went down from the 15th of April. This is our expected results’ table:

Image by author.
Image by author.
USE mediumcompany;
SELECT 
MIN(DATE(created_at)) as week_started_at,
COUNT(DISTINCT website_session_id) AS sessions
FROM website_sessions
WHERE created_at < '2012–05–10' 
AND utm_source = 'gsearch' 
AND utm_campaign = 'nonbrand'
GROUP BY YEAR(created_at), 
WEEK(created_at);

— Hint We can actually group by columns (YEAR, WEEK) not included in the SELECT statement.

— Output

Image by author.
Image by author.

— Understanding Start by selecting and grouping by year, week, date, to count the number of sessions.

SELECT 
YEAR(created_at) AS yr,
WEEK(created_at) AS wk,
DATE(created_at) AS dt,
COUNT(DISTINCT website_session_id) AS sessions
(...)
GROUP BY 1, 2, 3
Image by author.
Image by author.

Find the first, or minimum, day of the week. Since we continue to group them by year and week, but not added in the SELECT clause, the granularity of the date will become thicker.

SELECT 
MIN(DATE(created_at)) as week_started_at,
COUNT(DISTINCT website_session_id) AS sessions
(...)
GROUP BY YEAR(created_at), WEEK(created_at);
Image by author.
Image by author.

After the 15th of April, it is confirmed that the traffic for gsearch nonbrand has considerably dropped.


<Assignment #4>

From: Marketing Director Subject: Gsearch device-level performanceDate: May 11, 2012

I was trying to use our site on my mobile the other day, and the experience was not great. Could you pull conversion rates from session to order, by device type? If desktop performance is better than on mobile we may be able to bid up for desktop specifically to get more volume?

— ThinkingWe want to count the total number of sessions and orders for gsearch and nonbrand, for the period until the 11th of May 2012. Regarding the conversion rate, we simply divide the orders by the sessions. Then, we group by device_type. This is our expected results’ table:

Image by author.
Image by author.
USE mediumcompany;
SELECT 
ws.device_type,
COUNT(DISTINCT ws.website_session_id) AS sessions,
COUNT(DISTINCT o.order_id) AS orders,
ROUND((COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) * 100, 2) AS session_to_order_conv_rt
FROM website_sessions ws
LEFT JOIN orders o 
ON o.website_session_id = ws.website_session_id
WHERE
ws.created_at < '2012–05–11'
AND ws.utm_source = 'gsearch'
AND utm_campaign = 'nonbrand'
GROUP BY 1;

— Output

Image by author.
Image by author.

— Understanding Start by counting the number of sessions and orders, as well as the conversion rate.

SELECT
COUNT(DISTINCT ws.website_session_id) AS sessions,
COUNT(DISTINCT o.order_id) AS orders,
(COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) *100 AS session_to_order_conv_rate
(...)
Image by author.
Image by author.

Call and group by to breakdown by device.

SELECT 
ws.device_type,
COUNT(DISTINCT ws.website_session_id) AS sessions,
COUNT(DISTINCT o.order_id) AS orders,
ROUND((COUNT(DISTINCT o.order_id) / COUNT(DISTINCT ws.website_session_id)) * 100, 2) AS session_to_order_conv_rt
(...)
GROUP BY 1;
Image by author.
Image by author.

Desktop is performing a way lot better, therefore we should rise the bids for this type of device to increase volume.


<Assignment #5>

From: Marketing Director Subject: Gsearch device-level trends Date: June 09, 2012

After your device-level analysis of conversion rates, we realized desktop was doing well, so we bid our gsearch nonbrand desktop campaigns up on 2012–05–19. Could you pull weekly trends for both desktop and mobile so we can the impact on volume? You can use 2012–04–15 until the bid change as a baseline.

— Thinking We want to select and filter the devices to count the number of times when device_type is ‘desktop’ or ‘mobile’, then cast and group a time series by year and week to see the week_start_date. Finally, check if the volume of desktop sessions went down from the 15th of April and 9th of June. This is our expected results’ table:

Image by author.
Image by author.
USE mediumcompany;
SELECT 
MIN(DATE(created_at)) AS week_start_date,
COUNT(DISTINCT CASE WHEN device_type = 'desktop' THEN website_session_id ELSE NULL END) AS desktop_sessions,
COUNT(DISTINCT CASE WHEN device_type = 'mobile' THEN website_session_id ELSE NULL END) AS mobile_sessions
FROM website_sessions
WHERE created_at BETWEEN '2012–04–15' AND '2012–06–09' 
AND utm_source = 'gsearch' AND utm_campaign = 'nonbrand'
GROUP BY YEAR(created_at), WEEK(created_at)

— Output

Image by author.
Image by author.

— UnderstandingWe will start by selecting device_type, and cast the year, the week and the date.

USE mediumcompany;
SELECT
device_type,
YEAR(created_at) AS yr,
WEEK(created_at) AS wk,
DATE(created_at) AS dt

FROM website_sessions
WHERE created_at BETWEEN '2012–04–15' AND '2012–06–09' AND utm_source = 'gsearch' AND utm_campaign = 'nonbrand';
Image by author.
Image by author.

Next, filter the website_session_id regarding the device_type (desktop or mobile).

USE mediumcompany;
SELECT
device_type,
YEAR(created_at) AS yr,
WEEK(created_at) AS wk,
DATE(created_at) AS dt,
CASE WHEN device_type = 'desktop' THEN website_session_id ELSE NULL END AS desktop_session_id,
CASE WHEN device_type = 'mobile' THEN website_session_id ELSE NULL END AS mobile_session_id
FROM website_sessions
WHERE created_at BETWEEN '2012–04–15' AND '2012–06–09' AND utm_source = 'gsearch' AND utm_campaign = 'nonbrand'
Image by author.
Image by author.

Next, we’ll drop the column ‘device_type’ since we want to aggregate and count the number of times when device_type is ‘desktop’ or ‘mobile’.

Don’t forget to GROUP BY YEAR ‘yr’, WEEK ‘wk’ and DATE ‘dt’ for the count to make sense.

USE mediumcompany;
SELECT
YEAR(created_at) AS yr,
WEEK(created_at) AS wk,
DATE(created_at) AS dt,
COUNT(CASE WHEN device_type = 'desktop' THEN website_session_id ELSE NULL END) AS desktop_sessions,
COUNT(CASE WHEN device_type = 'mobile' THEN website_session_id ELSE NULL END) AS mobile_sessions
FROM website_sessions
WHERE created_at BETWEEN '2012–04–15' AND '2012–06–09' AND utm_source = 'gsearch' AND utm_campaign = 'nonbrand'
GROUP BY 1,2,3
Image by author.
Image by author.

Finally, drop ‘yr’ and ‘wk’ from the SELECT statement (but still grouping by them) for a thicker granularity.

For the DATE column ‘dt’, cast and set the date by the first (minimum) day, grouped by YEAR ‘yr’ and WEEK ‘wk’, so that the minimum day is equal to the first day of each grouped week.

Don’t forget to GROUP BY YEAR ‘yr’ and WEEK ‘wk’ for the count to makes sense.

USE mediumcompany;
SELECT
MIN(DATE(created_at)) AS week_start_date,
COUNT(CASE WHEN device_type = 'desktop' THEN website_session_id ELSE NULL END) AS desktop_session_id,
COUNT(CASE WHEN device_type = 'mobile' THEN website_session_id ELSE NULL END) AS mobile_session_id
FROM website_sessions
WHERE created_at BETWEEN '2012–04–15' AND '2012–06–09' AND utm_source = 'gsearch' AND utm_campaign = 'nonbrand'
GROUP BY YEAR(created_at), WEEK(created_at)
Image by author.
Image by author.

Answering the question "Pull weekly trends for both desktop and mobile so we can see the impact on volume? You can use 2012–04–15 until the bid change as a baseline.", the traffic volume for __ gsearch nonbrand desktop has increased from the 15th of April up to date.

Conclusion

Traffic source analysis is about understanding where the customers are coming from and which channels are driving the highest quality traffic.

Analyzing for bid optimization is about understanding the value of various segments of paid traffic, so that we can optimize the marketing budget.

In a marketer perspective, the goal is to generate volume and drive more traffic to the website and make more money to the business.

On the analytics side, the mission is to analyse the traffic sources and bid optimization to understand the value of various segments of paid traffic, improving the marketing budget.

15 Business Questions about Mobile Marketing Campaigns: ROAS (Return On Ad Spend)

Machine Learning: costs prediction of a Marketing Campaign (Exploratory Data Analysis – Part I)

Thanks for reading.


Related Articles