Photo by Isaac Smith on Unsplash

Google Data Studio for Beginners

Make incredible visualizations in minutes with one of the simplest free tools available

Andrew Mooney
Towards Data Science
9 min readMar 16, 2021

--

As every writer knows, a picture is worth a thousand words; and in the field of data science, a good visualization is worth its weight in gold. As someone still incredibly new to this industry and skillset, I can’t speak from a place of expertise, but as a neophyte data engineer, I arrived at my new company with some pandas in my pocket and the bright city lights in my eyes. My team had been collecting a wealth of fantastic, mostly clean data…but nobody had the time or expertise to transform it into something actionable. What they had discovered in their quest for understanding was perhaps the greatest tool for fledgling data scientists: Google Data Studio.

When I first began playing with its interface, I scoffed and huffed. It wasn’t Pandas, it wasn’t Seaborn, and I’d just spent hundreds of hours learning how to use my fancy new Python skills in the work place so I was determined to do so. That said, the more I worked with it, the more I came to understand the truly powerful nature of Google Data Studio, or GDS, as the cool kids call it.

Photo by Franki Chamaki on Unsplash

SHOW DON’T TELL

Everyone likes graphs. Seriously, they do. Even people who hate data like a good graph. Does the line go up? How pretty is it? But most importantly, does the line go up?

I’m kidding (mostly), but in the business world, our job isn’t to play with data, it’s to translate it for those who don’t. Time and time again, you’ll find that a pretty graph will beat a number any day. Not only that, but when you work with a team that gobbles up metrics like candy, you’ll need to provide a multitude of graphs as as fast as possible. Because of this, Google Data Studio is your gal Friday.

Before we get into the power of this program, it’s important to know that it isn’t for everyone. It’s a tool that bridges the gap between data novices and data scientists. Can it do machine learning? No. Should you use it for data cleaning? Dear god, no. Even merging has its limitations. But anyone who has coded a stacked bar chart from scratch will almost be offended at how easy it is to dump a beautiful and accurate chart into a report in less time than it takes to fry an egg (if your data’s clean, of course!).

Where Google Data Studio excels is in the utterly malleable nature of its data manipulation. Every numeric column has a customizable default aggregation, so jumping from detailed rows to a grouped sum is easier than saying ‘Pivot table, schmivot table’. Not only that, but it makes data exploration a drag-and-drop dream. Finally, and most importantly, it is one of the few free programs that allows you to privately share data with your team without having to hassle with hosting it on a closed-access website. I am fully aware of the power and customizability of Streamlit, and I still plan to use that program extensively in the future, but for sharing sensitive data with my team, GDS has been a god-send. I can say, in no small terms, GDS has transformed the way my team works. But why tell you, when I can show you?

Photo by airfocus on Unsplash

HOW TO MAKE A GDS REPORT

The first you’ll notice is how shockingly easy it is to create a fully populated and interactive report within minutes. Remember, GDS is an aggregation tool, so make sure your data is clean and consistent. As the adage goes: “trash in equals trash out,” so you should never expect the fine folks at Google to do the real data engineering work for you.

Step 1: Load your Data

There are several options for loading your data. GDS can handle null values pretty well, but make sure you understand your nulls before you do so as well as the semantic configurations of your columns. One of the most common frustrations when you start is realizing your configurations are all over the place and your data breaks when it comes across a value it doesn’t understand.

GDS has several convenient options for loading your data, though all of them have their pros and cons. Be sure to read your documentation before connecting your data because you won’t notice row-loss until it’s way too late. Data connectors include:

  • PostgresSQL/MySQL (limit of 100k rows per query)
  • Google Sheets (data limits without a premium account)
  • CSV File upload (max dataset size is 100mb)
  • Google Big Query (I have never used)
  • Google Analytics/Google Ads (limited to the last three months without the paid version)
  • And several more Google products

My personal favorite is a Google Sheets. You can connect the two programs in seconds and GDS will treat the sheet as an active Database, so you will see changes reflected in your reports whenever you hit the refresh button at the top of the report page. With most of my reporting, I go with file uploads so I can have as many rows and columns as I please without getting close to the 100mb limit.

I created a fake dataset to demonstrate how easy it is to load and visualize data. I can tell you emphatically that it took me longer to come up with the fake company names than it did the create my GDS report. My data includes a combination of data types: categorical strings, integers, and dates. Below you can see me connecting this data to GDS and create a table within seconds:

There is nothing more to it than that. I will say, you should ALWAYS check the semantic configuration of your data before manipulating, which I did after I uploaded the data.

Step 2: Visualize

Let me show you how mind-bogglingly easy it is to visualize in GDS. Here’s me adding a stacked bar chart to my report and then changing the dimensions so we can see the breakdown of contact methods per account.

It’s as simple as that! Every chart pulls data from columns (fields in green) while aggregating over metrics (fields in blue). If you want to sum over a a numeric column, just drag it into the metrics and select sum (or average, median, max etc.). For object columns, you have the option of count or count distinct. Finally, there is a specific dimension specifically for date ranges, and that’s where GDS gets really fun.

Step 3: Filters

Anyone can give their team a powerpoint with ready-made graphs, but GDS goes one further. The program is designed to be fully interactive for all team members and that interaction takes the form of filters. In most programs, if you had 3 sales teams, you’d need to make a different report for each team, but in GDS you can create filters to change the report at the click of a button. Filters can control any column in your data.

3A. Filter Control Boxes

You can see the basic interactive filters available on the left. I typically use Drop Down for almost everything, I encourage you to play around and see what works. For every filter, you pick which column it will control and then the metric you want displayed to show what differentiates the column. Let’s say I want to only see accounts for a sales team and the metric I want to see is how many different accounts are in that category. All you have to do is select ‘Account Name’ as your metric and then set aggregation to Count Distinct. And that’s it! Here’s a quick how to:

You can even control which filters affect which charts by grouping report elements. Without grouping, a filter will affect anything with a shared data source, but with grouping you can create complex reports that are still user friendly. You also have the option to apply fixed filters on any report object, with the option to apply said filters to entire groups en masse.

3B. Date Controls

One of the most useful elements of GDS is its live-reporting capabilities. When you connect a Google Analytics data source, it will pull live data directly from your website and aggregate it based on your report. Granted, some data connectors are far faster than others (CSV file uploads tend to be the fastest in my experience, but then data updates are manual). That said, if you’re keeping an eye on longitudinal data, date ranges might be important. GDS is very adept at parsing date columns automatically, but, even if you’re having trouble with a column, you can always create a custom field using the PARSEDATE() function. I won’t get into custom fields now, that will be for another post, but rest assured, if you pass either a datetime column or an object column in the format ‘YYYY-MM-DD’ GDS will translate this just fine, even if there are nulls present.

Date controls affect a specific data field in every chart and field. No matter what object you place on your report, they all have a ‘Date Range’ option. You can also apply date range to groups as well. From there, you can set default date ranges that can be as simple as the last week or as complicated as 47 days before last Wednesday.

3C. Apply Filter

My last little filter pitch is a genius button labeled ‘Apply Filter’ on all chart objects. By switching on this button, you can click a row on a table or a column on a chart and the report will automatically filter based on that object. Check it out:

It’s that easy! Hopefully you can see now how many tools are at your disposal in this free, out-of-the-box package.

CONCLUSION

Google Data Studio has changed my life as the sole data guy on my team, but it certainly has its limitations. I plan on writing a lot more about both the strengths and drawbacks, as well as some helpful how-tos that I discovered while blending data. Here is a quick overview of what makes GDS so powerful:

PROS

  • Incredibly easy to use (even for non-data people)
  • A wealth of data connectors with solid documentation
  • FREE
  • Excellent for private data sharing within teams
  • Interactive reports that are very customizable
  • Uses same field functions as Google Sheets/Excel and provides docstrings for easy reference
  • Beautiful dynamic charts
  • The filters are worth the price of admission alone
  • No coding necessary

CONS

  • You have to know your data before you force it into a report
  • There are limitations on upload sizes without upgrading to Google 360 (which costs $150k a year!)
  • No technical support outside of documentation and forums
  • Merging data sources has EXTREME limitations (i.e. you cannot join multiple data sources using multiple join keys. ALL tables must have a common join column)
  • Sometimes it just breaks and the error messages are cryptic at best

Overall, Google Data Studio feels like Data Science-Lite, but outside of Streamlit for teams (which is still in beta) I haven’t found a better, easier option for sharing beautiful, complex, interactive reports with non-data teammates.

I’ll post more how-tos in the future such as the following:

  • Building a Data Pipeline using the Google Sheets api
  • Best Data Communication Practices for non-Data Teammates
  • Data Blending

Thanks, friends! Leave comments if you have questions or thoughts and I’ll do my best to answer.

--

--

Burgeoning Data Scientist and writer with a passion for good data and bad stock photos