Find Your Power Users Using BigQuery Firebase Data

Understand user journeys without investing in a third-party tool

Published in Towards Data Science, Jul 6, 2022

If you use Firebase and BigQuery, here’s how you can tap into clickstream data from a native iOS app and turn it into actionable insights about user journeys. You don’t need any third-party tools for it, and you can customize your analysis for deep dives.

Untapped Goldmine of Clickstream Data

If you’re using Firebase for a native iOS app, there is an event that’s triggered every time a user lands on an app screen. This event is screen_view, and if your screens are named properly, you can extract a clean sequence of screen names that shows how a user moves through your app.

As the Firebase documentation puts it, Google Analytics tracks screen transitions and attaches information about the current screen to events, enabling you to track metrics such as user engagement or user behavior per screen. Much of this data collection happens automatically, but you can also log screen views manually.
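To make the screen_view idea concrete, here is a minimal Python sketch that turns exported events into time-ordered screen sequences per session. It assumes rows shaped like the Firebase BigQuery export as we understand it (user_pseudo_id, event_timestamp, event_name, and a repeated event_params record with key and value.string_value / value.int_value). It also assumes the screen name sits in the firebase_screen parameter and the session id in ga_session_id; check both against your own dataset. In BigQuery itself, the same step would use UNNEST(event_params) and ORDER BY event_timestamp.

```python
from collections import defaultdict


def get_param(event, key):
    """Return the string (or, failing that, int) value of an event parameter, or None."""
    for param in event.get("event_params", []):
        if param["key"] == key:
            value = param["value"]
            if value.get("string_value") is not None:
                return value["string_value"]
            return value.get("int_value")
    return None


def screen_sequences(events):
    """Group screen_view events into time-ordered screen-name lists per (user, session).

    Assumption: the screen name is in the 'firebase_screen' param and the session id
    in 'ga_session_id'; verify both against your own export.
    """
    sessions = defaultdict(list)
    for event in events:
        if event["event_name"] != "screen_view":
            continue  # skip non-screen events such as add_to_cart
        key = (event["user_pseudo_id"], get_param(event, "ga_session_id"))
        sessions[key].append((event["event_timestamp"], get_param(event, "firebase_screen")))
    # Sort each session by timestamp, then keep only the screen names.
    return {
        key: [screen for _, screen in sorted(views, key=lambda v: v[0])]
        for key, views in sessions.items()
    }
```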

With some SQL manipulation and a BI tool, you can easily gain visibility into:

most and least visited screens;
most and least frequent transitions from one screen to another: you may find surprising transitions that your team forgot existed;
exit screens: you can understand where most users exit the app and, coupled with the transition analysis, see what happened right before the exit;
cohort behavior: some users will convert and some won’t, and you’ll be able to see whether their journeys differ and build hypotheses about what makes your power users;
user paths and the time it takes a user to reach milestones in their app journey: how long does it take your user to reach an intended conversion event?
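Several of these views reduce to simple counts once you have time-ordered paths per session. The minimal sketch below assumes a hypothetical input format, a dict mapping a session key to a list of (timestamp_seconds, screen_name) tuples sorted by time, and computes the most visited screens, the exit screens, and the time it takes to reach a milestone screen.

```python
from collections import Counter


def screen_visit_counts(sessions):
    """Count how often each screen was viewed across all sessions."""
    return Counter(screen for path in sessions.values() for _, screen in path)


def exit_screen_counts(sessions):
    """Count the last screen of each session, i.e. the screen users exited from."""
    return Counter(path[-1][1] for path in sessions.values() if path)


def time_to_milestone(path, milestone):
    """Seconds from the session's first view to the first view of `milestone`, or None if never reached."""
    if not path:
        return None
    start = path[0][0]
    for ts, screen in path:
        if screen == milestone:
            return ts - start
    return None
```

Calling screen_visit_counts(...).most_common() gives the most visited screens first; reversing that list gives the least visited ones.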

While you can definitely track user journeys with third-party tools, having access to raw data has its advantages, and not all third-party tools can provide it. As someone who loves getting my hands dirty with data, I get excited when I see data that I can manipulate.

Enriching Your App Journey Data

If you have data from multiple sources in the same place, you are at a huge advantage. There are three main ways to enrich your data:

with your customer data: this requires a unique customer identifier that can be lawfully collected when a user browses the app (more on that below);
with transaction data and feedback data: how long did it take to deliver a product, what was the NPS, was the customer retained, etc.;
other events collected by Firebase: was the user part of an A/B test, which app version did they use, which country were they in, etc. Other clickstream events, such as product views, clicks, and banner interactions, can also be leveraged to enrich your screen transition data.
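Enrichment is essentially a left join keyed on the user identifier. Here is a minimal Python sketch, assuming sessions keyed by (user_pseudo_id, session_id) and a hypothetical per-user attribute table; the field names app_version and ab_group are illustrative, not columns of any real export. In BigQuery, this would be a LEFT JOIN against your customer, transaction, or experiment tables.

```python
def enrich_sessions(sessions, user_attributes):
    """Attach per-user attributes to every session, like a LEFT JOIN on user id.

    sessions:        {(user_pseudo_id, session_id): [screen, ...]}
    user_attributes: {user_pseudo_id: {"app_version": ..., "ab_group": ...}}  (hypothetical fields)
    Users with no attributes keep only the base fields.
    """
    return [
        {"user": user, "session": session, "path": path, **user_attributes.get(user, {})}
        for (user, session), path in sessions.items()
    ]


def exclude_ab_test_users(rows):
    """Drop sessions from users who were assigned to any A/B test group."""
    return [row for row in rows if row.get("ab_group") is None]
```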

Finally, analysis and visualization are within your control. You want session-level conversions? There you go. 3-day attribution? Make it happen. Want to exclude users who were part of A/B tests? You can do it.
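As an example of owning the attribution logic, here is a minimal Python sketch of a 3-day window: a purchase counts toward a screen if the same user viewed that screen at most three days before buying. The input shapes, (user_id, datetime, screen_name) for views and (user_id, datetime) for purchases, are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def attributed_purchases(screen_views, purchases, screen, window=timedelta(days=3)):
    """Return purchases preceded by a view of `screen` from the same user within `window`.

    screen_views: iterable of (user_id, datetime, screen_name)
    purchases:    iterable of (user_id, datetime)
    """
    views_by_user = defaultdict(list)
    for user, ts, name in screen_views:
        if name == screen:
            views_by_user[user].append(ts)
    # Keep a purchase only if some qualifying view falls in [purchase - window, purchase].
    return [
        (user, ts)
        for user, ts in purchases
        if any(ts - window <= view_ts <= ts for view_ts in views_by_user.get(user, []))
    ]


if __name__ == "__main__":
    views = [("u1", datetime(2022, 7, 1, 12), "reviews"), ("u2", datetime(2022, 6, 20), "reviews")]
    buys = [("u1", datetime(2022, 7, 3, 9)), ("u2", datetime(2022, 7, 1))]
    print(attributed_purchases(views, buys, "reviews"))  # only u1's purchase is within 3 days
```

Changing the window, for example to a single session or to 7 days, is a one-argument change, which is the kind of flexibility raw data gives you.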

The Nightmares of iOS 15 and User Tracking

It is worth noting that once Apple restricted collection of the advertising identifier (IDFA) without explicit user consent, Firebase data started returning an empty IDFA for users who denied tracking. However, Firebase still gives you user_pseudo_id, which stays consistent for an app install. In some cases it is even more consistent than the IDFA, which is an advantage.

Many third-party tools switched from IDFAs to probabilistic models that approximate user identity, which is ironic, but it works.

Analyzing the Data

I’ll name my top-3 favorite ways to dig into this kind of data.

The easiest way to analyze this data is to build a transition matrix with the current screen in the rows and the next screen in the columns, or a null screen in case of an exit. This is very similar to the transition matrix of a Markov chain, but constructed a little more freely. The values can be either the % of users who moved from the screen in the row to the screen in the column, or the % of transitions out of the row total. The matrix highlights the most common transitions users take, as well as the most common exits, where the screen name in the column is null. This kind of analysis is fairly easy to do with the LEAD and LAG window functions and almost any visualization tool.
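Here is a minimal Python sketch of the row-normalized variant (% of transitions out of the row total). Pairing each screen with the next one in the session mirrors what LEAD(screen) OVER (PARTITION BY session ORDER BY event_timestamp) does in BigQuery, and None stands in for the null "exit" column. The input, a list of per-session screen lists, is an assumed format.

```python
from collections import Counter, defaultdict

EXIT = None  # the "null screen" that follows the last view of a session


def transition_matrix(paths):
    """Row-normalized transition shares: matrix[current][next] = share of exits from `current` going to `next`."""
    counts = defaultdict(Counter)
    for path in paths:
        # Pair each screen with its successor; the last screen is paired with EXIT.
        for current, nxt in zip(path, list(path[1:]) + [EXIT]):
            counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix
```

For the per-user variant, count distinct users per (current, next) pair instead of raw transitions before normalizing.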

For example, in the quick mockup below, blue cells are the top-3 transitions from the home, (product) details, and cart screens. From home, users most often jump straight to product listing, bypassing search. But once users are on a details page, they are most likely to go to search or to another details page. Why does that happen? Could you make search from the PDP more convenient for users? Why do users skip search on the home page? Alarmingly, most users go from the cart to a product details page rather than to checkout, which does not even appear in the picture. What could be the reason, and how could we improve user behavior?

This is a good start. To level it up, you can aggregate all the screens a user viewed in a session (or within a rolling window) and compare how many of those sessions led to an event you care about. In e-commerce, that is often a purchase. Your hypothesis may sound like this: “users who check the review screen during a session show a higher conversion rate.” Combined with the transition analysis, this gives you solid ground for building hypotheses about user behavior.
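A hypothesis like the review-screen one can be checked with a simple split. The minimal Python sketch below assumes each session has already been reduced to (set_of_screens_viewed, converted) and compares the conversion rate of sessions that included a given screen with those that did not. It measures association only, not causation.

```python
def conversion_by_screen(sessions, screen):
    """Conversion rate of sessions that viewed `screen` vs sessions that did not.

    sessions: iterable of (set_of_screen_names, converted: bool)
    Returns (rate_with_screen, rate_without_screen); a rate is None if its group is empty.
    """
    stats = {True: [0, 0], False: [0, 0]}  # viewed_screen -> [conversions, sessions]
    for screens, converted in sessions:
        group = stats[screen in screens]
        group[0] += int(converted)
        group[1] += 1

    def rate(group):
        conversions, total = group
        return conversions / total if total else None

    return rate(stats[True]), rate(stats[False])
```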

One of the most common ways to visualize user journeys beyond a single transition is a flow chart called a path explorer, also known as a Sankey diagram. Google has introduced a path explorer in GA4, and it looks quite promising: you can choose which events and screen names to consider. A DIY version requires a bit of Python (or any other language) to draw such charts, as well as some thought about repeated screen transitions, which happen frequently. Plotly is one of my favorite visualization packages; I think it doesn’t get enough credit, but it’s incredible.
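A Sankey chart needs node labels plus parallel source, target, and value lists. The minimal Python sketch below builds them from per-session screen lists. It handles repeated screens one common way, by keying each node on (step, screen), so "listing" at step 2 and "listing" at step 3 become separate nodes and the diagram stays acyclic. The step-indexing choice and the max_steps cut-off are our assumptions, not a requirement of any library.

```python
from collections import Counter


def sankey_links(paths, max_steps=4):
    """Build Sankey node labels and (source, target, value) lists from screen paths.

    Nodes are keyed by (step, screen), so a screen revisited later in the path
    becomes a new node and the flow never loops back on itself.
    """
    link_counts = Counter()
    for path in paths:
        steps = path[:max_steps]
        for i in range(len(steps) - 1):
            link_counts[((i, steps[i]), (i + 1, steps[i + 1]))] += 1
    nodes = sorted({node for link in link_counts for node in link})
    index = {node: i for i, node in enumerate(nodes)}
    labels = [f"{step + 1}. {screen}" for step, screen in nodes]
    # Counter preserves insertion order, so these three lists stay aligned.
    source = [index[src] for src, _ in link_counts]
    target = [index[dst] for _, dst in link_counts]
    value = list(link_counts.values())
    return labels, source, target, value
```

The four lists map onto Plotly's Sankey trace: plotly.graph_objects.Sankey(node=dict(label=labels), link=dict(source=source, target=target, value=value)).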

The mockup path below shows a fairly typical e-commerce flow from home to listing to details; however, part of the home traffic goes directly to search. Most search traffic leads to product listing pages, but a good chunk of it lands users directly on a product detail page.

Andrew Chen has written a great piece on the power user curve, which I highly recommend if you’re interested. Once you know where to look for conversion signals, look at how many users perform that action over a given time period: a day, two weeks, a month, etc. In the original blog post, Andrew suggests showing “total active days in a month” on the X axis and the % of users in the corresponding buckets as a bar chart. It works much like a histogram and can tell you a lot about your “user quality” by showing how many users engage with your power features. You can riff on it and develop your own representations, such as “number of times a user visited a product listing page”, and so on. Ideally, break it down by traffic source and medium to monitor changes in your user composition.
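The power user curve is a count-and-bucket computation. Here is a minimal Python sketch, assuming a hypothetical input of (user_id, date) pairs for days on which a user performed the action you care about, already filtered to one month. It returns the share of users for each number of distinct active days, ready to plot as a bar chart.

```python
from collections import Counter, defaultdict
from datetime import date


def power_user_curve(activity, days_in_period=30):
    """Share of users by number of distinct active days in the period.

    activity: iterable of (user_id, date) for days the user performed the action,
              assumed to be pre-filtered to a single period.
    """
    active_days = defaultdict(set)
    for user, day in activity:
        active_days[user].add(day)  # a set ignores repeat actions on the same day
    total_users = len(active_days)
    if total_users == 0:
        return {}
    buckets = Counter(len(days) for days in active_days.values())
    return {n: buckets.get(n, 0) / total_users for n in range(1, days_in_period + 1)}


if __name__ == "__main__":
    sample = [("a", date(2022, 6, 1)), ("a", date(2022, 6, 2)), ("b", date(2022, 6, 5))]
    print(power_user_curve(sample)[1], power_user_curve(sample)[2])
```

To break the curve down by traffic source and medium, run the same function on each segment of users separately.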

Third-party tools often use sunburst charts to visualize user paths. They’re fun to play with, but they can quickly get out of hand when some screens are visited considerably more often than others or when users frequently switch between two or more screens. For example, in an e-commerce app, you can expect the product listing page to be the most visited one, with a lot of switching between the product listing page and product detail pages. Tools do not always handle this well, which can make the data hard to analyze.
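One way to tame that back-and-forth before drawing any path chart is to preprocess each path: collapse consecutive repeats of the same screen, then replace long alternations between two screens with a single combined token. This is our own simplification in a minimal Python sketch, not something third-party tools necessarily do; the function names, the min_len threshold, and the "A<->B" token format are hypothetical.

```python
def collapse_repeats(path):
    """Collapse consecutive duplicates, e.g. [plp, plp, pdp] -> [plp, pdp]."""
    out = []
    for screen in path:
        if not out or out[-1] != screen:
            out.append(screen)
    return out


def collapse_ping_pong(path, min_len=4):
    """Replace runs alternating between two screens (A, B, A, B, ...) of at least
    `min_len` views with one 'A<->B' token; shorter runs are left untouched."""
    if min_len < 3:
        raise ValueError("min_len must be at least 3")
    path = collapse_repeats(path)
    out, i, n = [], 0, len(path)
    while i < n:
        # Extend j while the path keeps alternating: path[j] equals the screen two steps back.
        j = min(i + 2, n)
        while j < n and path[j] == path[j - 2]:
            j += 1
        if j - i >= min_len:
            out.append(f"{path[i]}<->{path[i + 1]}")
            i = j
        else:
            out.append(path[i])
            i += 1
    return out
```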

All that said, clickstream data will likely grow large if your app has any significant traffic; aggregations help keep it manageable.

What It Tells You

To summarize, understanding which features are used by converting users will give you insight into your power users’ behavior. You can start building your hypotheses and test them in data deep dives, user interviews or experiments.

Beyond that, this methodology helps you understand the flip side: user journey distractions and exit points.

Building on top of that, you can start developing a user quality framework, which benefits product work as well as your marketing and CRM strategy.

What It Does Not Tell You

Be careful to avoid the correlation-causation trap. Correlation, or association, does not imply a causal relationship. In other words, if you find that your power users (users with the highest conversion) visit a particular screen more often than others, it does not mean that directing more users to this screen will necessarily bring more conversions. While that’s not impossible, your power users may simply be more inclined to use a particular feature because of their intrinsic motivation. Therefore, you’d need experiments, user research, and data deep dives to understand the behavior.

Resources

Track Screenviews | Firebase Documentation (firebase.google.com)

Measure ecommerce | Google Analytics 4 Properties | Google Developers (developers.google.com)

The Power User Curve: The best way to understand your most engaged users at andrewchen (andrewchen.com)

Navigation functions | BigQuery | Google Cloud (cloud.google.com)

Sankey | Plotly (plotly.com)