Funnel analytics with SQL: MATCH_RECOGNIZE() on Snowflake

SQL is a great tool, but it can’t do everything. For example, figuring out a way to analyze funnels and sessions with SQL is truly difficult — until now. Let’s discover the power of MATCH_RECOGNIZE(), by analyzing a Google Analytics e-commerce funnel with Snowflake.

Felipe Hoffa
Towards Data Science

--

Image: Pixabay

Check TJ Murphy’s reaction when he discovered that Snowflake now has support for MATCH_RECOGNIZE:

Sample use case: Google Analytics e-commerce funnel with Snowflake

How would you answer this funnel question with SQL:

“We have an e-commerce shop. We need to identify sessions that land into the website from an arbitrary referrer (Instagram, Facebook, etc), and that end up buying something. How many steps did it take for these users to buy something, and how many times they tried to pay before success?”

That’s not an easy question to answer with plain SQL. But MATCH_RECOGNIZE finally gives us an elegant way to find answers to question like this one.

With MATCH_RECOGNIZE we can simply define the steps we want to find within a series of events, and then we can define any pattern for these steps — and then MATCH_RECOGNIZE does the rest.

For this example we’re going to use a Google Analytics 360 dataset from an e-commerce shop. Nothing better than real data to prove the power of our tools.

Step 1: Load Google Analytics data into Snowflake

Taking data out of BigQuery and into Snowflake is easy. You might need tools to build a reliable pipeline, but the basic steps are easy:

  • Export from BigQuery to GCS: You can use your own GA data, or a sample dataset with data from the Google Merchandise store.
  • Load GA data into Snowflake: This is straightforward. Snowflake will happily ingest files from GCS, even if your Snowflake account lives in AWS. Loading JSON works like magic too. You might want to set-up authentication and permissions for maximum security, while for sample data I just used a public bucket.
create or replace table ga_demo2(src variant);copy into ga_demo2
from 'gcs://fhoffa/ga360/ga_sessions000000000000'
file_format=(type='JSON');
  • Query in Snowflake: Notice above that we loaded the JSON rows into a variant type. Snowflake makes working with semi-structured data a joy; you can easily navigate the Google Analytics schema. We can also create a view out of the basic query, to make all the following steps easier to follow:
create or replace view ga360_parsed
as
select src:visitStartTime::timestamp ts
, hit.value:hitNumber::int hit
, src:visitId::string visit
, hit.value:eCommerceAction.action_type::int action_t
, hit.value:eCommerceAction.step::int action_s
, hit.value:eCommerceAction.option::string action_o
, hit.value:page.pagePath::string path
, hit.value:referer::string referer
, src:fullVisitorId::string visitor
, src:totals.transactions::int total_transactions
from ga_demo2, lateral flatten(input => src:hits) hit

Step 2: Querying the funnel is fun

This “straightforward” query answers the question we were looking to answer:

select count(*) visits
, avg(steps) avg_steps
, avg(payment_attempts) avg_pay_attempts
, max(payment_attempts) max_pay_attempts
from ga360_parsed
match_recognize(
partition by visit
order by hit
measures count(*) steps, count(payment.*) payment_attempts
one row per match
pattern(from_plex (payment | anything)* completed)
define
from_plex as referer = 'https://mall.googleplex.com/'
, completed as path like '/ordercompleted.html'
, payment as action_o = 'Payment'
);

Turns out 21 visits landed from that specific referrer into our site, and ended up purchasing something. On average it took them 35.6 steps to do so, and 1.7 attempts to pay — with at least one session having nine pay attempts before success.

Results for our funnel question

Understanding MATCH_RECOGNIZE

You can follow the official docs to discover all the powers of MATCH_RECOGNIZE — here we’ll look at the basics of the query above.

define

Let’s start with the define section, that is used to define steps in the user’s funnel:

define
from_plex as referer = 'https://mall.googleplex.com/'
, completed as path like '/ordercompleted.html'
, payment as action_o = 'Payment

You can see that we gave an arbitrary name to each step, and that the conditions can be whatever you need them to be. One step for us looks at the referer, another one looks at the users landing on a specific path, and the payment steps are identified by the action variable that Google Analytics stores.

pattern

The pattern section is basically regular expressions for funnel steps:

pattern(from_plex (payment | anything)*  completed)

This says “look for sessions that start with the from_plex step, that are then followed by any number of payment or anything steps, and that end up in the completed state.

match_recognize

With define andpattern established, we can do a basic match_recognize query:

select classified_as, hit, visit, action_o, referer ref, path
from ga360_parsed
match_recognize(
partition by visit
order by hit
measures classifier() as classified_as
all rows per match
pattern(from_plex (payment | anything)* completed)
define
from_plex as referer = 'https://mall.googleplex.com/'
, completed as path like '/ordercompleted.html'
, payment as action_o = 'Payment'
);

We added 3 constructs to that query. First, how to split and order our sessions: partition by visit order by hit. Then an added column that will show how each row is classified in our funnel: measures classifier(). And then our desire to see all rows of each funnel, instead of just a one-row summary: all rows per match. That’s what allows us to see the following results, with the steps classified as “payment” highlighted:

Showing all rows per match

Answering funnel questions

With each funnel step identified, we can answer the opening question. Instead of selecting all rows, we can ask for one per funnel with a summary: one row per match. We add the measures we are interested in: measures count(*) steps, count(payment.*) payment_attempts. MATCH_RECOGNIZE does the rest of the job, and we can go back to our usual SQL ways to get counts, averages, and other metrics.

Next steps

Funnel analytics is just one example. Think of all the possibilities with MATCH_RECOGNIZE in your toolbox: Threat detection, finding trends within weather and stock prices trend lines, and much more.

Let’s explore these topics in future posts, and share your best ideas and results.

Read more

  • Official Snowflake MATCH_RECOGNIZE() docs.

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter, and check reddit.com/r/snowflake for the most interesting Snowflake news.

--

--

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.