The world’s leading publication for data science, AI, and ML professionals.

How much data is missing from your Google Analytics dashboard?

Discovering missing data with a self-hosted analytics platform.

Photo by Myriam Jessier on Unsplash

Ad blockers are becoming increasingly common as more websites adopt smarter advertising methods. A study from Q2 of 2020 showed that over 40% of internet users across various age groups use some form of ad blocker.

Ad blockers don’t just block ads. They can also protect the user’s privacy by blocking tracking tags such as those from Facebook and Google Analytics. Extensions such as uBlock Origin have tracking protection enabled by default, while others like Adblock Plus make it easy to turn that setting on. This presents a potential issue for developers looking to analyze and understand user data, as they might not be seeing the complete picture.

Measuring the missing data

I currently maintain two web-based applications, each with 5k+ monthly users, which I will use to measure the missing data. For this experiment, I ran a self-hosted analytics solution called Umami alongside Google Analytics on both sites. It is a private, open-source alternative to Google Analytics that you can learn more about here. Because Umami is self-hosted, its requests go to a domain I control rather than to a well-known tracking domain, so ad blockers will not prevent analytics from being collected. This will give us a complete view of our data. I tested this with a few of the most popular ad blockers and confirmed that none of them prevented Umami from recording data.
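To illustrate why a self-hosted script can slip past an ad blocker, here is a minimal sketch of domain-based blocking. It is a deliberate simplification: the blocklist below is a tiny hypothetical sample, `analytics.example.com` is a made-up host standing in for a self-hosted instance, and real filter lists can also match on paths and other request properties, so self-hosting is not a guarantee.

```python
from urllib.parse import urlparse

# Hypothetical, heavily simplified blocklist. Real filter lists are far larger
# and can also match on URL paths and other request properties.
BLOCKED_DOMAINS = {"google-analytics.com", "googletagmanager.com"}


def is_blocked(url: str, blocked_domains: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


# A third-party tracker host matches the list; a self-hosted one does not.
print(is_blocked("https://www.google-analytics.com/analytics.js"))
print(is_blocked("https://analytics.example.com/umami.js"))
```

Under this simplified model, the request to the third-party host is dropped while the request to your own domain goes through, which matches what I observed when testing with real ad blockers.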

After setting up my Umami instance with Vercel and DigitalOcean and letting both services collect data for a month, I compared the figures they reported.

Comparing the results

Before looking at the numbers reported, it would be helpful to understand the context of the traffic coming to both websites since the popularity of ad blocking software differs between demographics.

Website #1

The first website to be analyzed is a web-based Reddit client that uses the Medium interface. According to Google Analytics, about 95% of traffic to the site comes from organic searches. This is likely from users who want to browse Reddit but prefer a different interface.

Image by author. Weekly page view and user data for the first website: Reddium

Looking at the page views, users, and bounce rate, we see a clear difference between the two services. Umami recorded 19% more weekly users and 23% more weekly page views. A gap this size cannot be attributed solely to differences in how the two services measure traffic, and it means developers may be handling noticeably more traffic than their analytics show.

To confirm this discrepancy, we can compare demographic data for both services. We can expect to see fewer desktop and Firefox users on Google Analytics due to the popularity of desktop ad blockers and the enhanced tracking protection on Firefox.

Image by author. User device data for the first website

From the device data, it is clear that desktop users are the ones missing from our Google Analytics data. The numbers bear this out: the number of mobile users is about the same in both services, while Umami recorded 40% more desktop users.

Image by author. Umami (left) and Google Analytics (right) user browser data for the first website

In the browser data, Umami recorded almost three times as many Firefox users as Google Analytics, while other browsers saw slight increases. This supports the idea that the difference in numbers between the two services can likely be attributed to ad blockers.
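The same per-segment comparison can be done programmatically if you export counts from both services. The sketch below is one way to do it; the function name is my own, and the counts are placeholder values for illustration only, not the figures from this experiment.

```python
def relative_increase(umami: dict[str, int], ga: dict[str, int]) -> dict[str, float]:
    """For each segment, how much more Umami recorded than Google Analytics,
    expressed as a fraction of the Google Analytics count."""
    return {
        segment: (umami[segment] - ga[segment]) / ga[segment]
        for segment in ga
        if segment in umami and ga[segment] > 0
    }


# Placeholder counts for illustration only, NOT the data from this article.
umami_users = {"Chrome": 1100, "Safari": 520, "Firefox": 290}
ga_users = {"Chrome": 1000, "Safari": 500, "Firefox": 100}

for browser, gap in relative_increase(umami_users, ga_users).items():
    print(f"{browser}: {gap:+.0%}")
```

Segments with a much larger gap than the rest, such as Firefox or desktop users here, are the ones most likely to be undercounted by Google Analytics.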

Website #2

The second website is a music quiz that connects with your Spotify account. According to Google Analytics, the traffic acquisition is roughly evenly split between direct, referral, organic search, and social, with each source accounting for 20–25%.

Image by author. Weekly page view and user data for the second website: Whisperify

The user and page view data show the same trend as the first website, with Umami recording about 30% more page views and users. The device and browser data follow a similar pattern and further support this.

Image by author. User device and browser data for Umami (left) and Google Analytics (right)

Observations

Our comparisons show that Google Analytics is missing between 15 and 25 percent of valuable user data. This percentage is likely to increase in the future as more people become concerned about tracking protection, but it is a good number to keep in mind when collecting user data.
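Note that "Umami recorded X% more" and "Google Analytics is missing Y%" are not the same number. A minimal sketch of the conversion, assuming Umami's count is treated as the complete total (the function name is my own):

```python
def missing_share(extra: float) -> float:
    """Share of total traffic missing from the smaller count, given that the
    larger count is `extra` (as a fraction, e.g. 0.19 for 19%) higher."""
    if extra < 0:
        raise ValueError("extra must be non-negative")
    return extra / (1 + extra)


# The increases reported above: 19% and 23% (website #1), about 30% (website #2).
for extra in (0.19, 0.23, 0.30):
    print(f"{extra:.0%} more in Umami -> {missing_share(extra):.1%} missing from Google Analytics")
```

The 19% to 30% increases seen in Umami work out to roughly 16% to 23% of traffic missing from Google Analytics, which is where the 15 to 25 percent range comes from.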

Of course, this is a small sample of websites with very specific target audiences. Both websites are associated with social apps, such as Reddit, where the target demographics are likely to use ad blockers, so the missing data for a website aimed at the general public is possibly less than what we observed.

The platform a website targets can also affect how much data is missing from Google Analytics. If your target demographic is mobile users, you are probably missing only a small amount of data compared to websites aimed at desktop users who use Firefox.

Summary

In this piece, we:

  1. Identified a self-hosted alternative to Google Analytics
  2. Compared analytics data between both analytics services for two websites
  3. Identified user segments more likely to be missing from Google Analytics (Desktop and Firefox users)
  4. Looked at potential biases in the observations and situations where Google Analytics might have more complete data

The results were pretty informative. For most purposes, Google Analytics should be able to report trends for around 80% of users. However, there are certain situations that might lead to more or less data being collected. With the increase in popularity of ad blockers, a self-hosted alternative like Umami might be the way to go as it can collect all available data and also give you better control over your own data.
