Data Studio with BigQuery: 2018's best practices

For years I’ve used BigQuery and public datasets to publish interesting articles — and now with Data Studio I can make them interactive, right here on Medium. Find here the best practices we’ve developed so far to make BigQuery and Data Studio work together.

Felipe Hoffa
3 min readDec 6, 2018

--

IMPORTANT 2019 UPDATE: You should use BI Engine between BigQuery and Data Studio

Google Data Studio is totally free — and you can use it too. It’s really easy to connect it to many different sources, including BigQuery. So that’s what I do now: I analyze data with BigQuery, I connect the results to Data Studio, I make my dashboards interactive, and I publish them on Medium.

Interactive dashboard — click to play with it: World Cup goals

But there are some challenges. As you know, BigQuery allows you to analyze huge amounts of data in very few seconds — and it only charges you for each query you run. We even have a free monthly tier for queries, so you might end up paying nothing at all. But if you connect BigQuery to a popular dashboard, then your users might end up generating a lot of queries on your behalf — and that means you can end up with a huge BigQuery bill. We don’t want that.

At Google Next 18 the Developer Advocate for Data Studio — Minhaz Kazi— and I gave a talk on the best practices we’ve developed to make our favorite products work together.

For example, the easiest way to avoid pushing new queries from Data Studio into BigQuery is to materialize the results of your queries into Data Studio. That’s right, once you have the BigQuery results or table you want to visualize, you can use the extract button to do so, and Data Studio will offer you an easy way to keep and refresh a copy of this data.

But right now that feature is still in development, and it will be subject to certain limitations — so we also shared some other options to create an inexpensive layer between Data Studio and BigQuery.

For example, you can use Google Apps Script to schedule queries, and have the results pushed to Google Cloud Storage as .csv file. You can then connect Data Studio to that file, which will be refreshed based on your schedule, and your users will be able to quickly explore the results.

Predicting when will Stack Overflow reply

But this works only up to certain size of results. In other cases, it might make a lot of sense to use a database that supports lookups by key. Once you know that your visualization will only require key lookups to retrieve data, instead of scanning a table in BigQuery, you can connect Data Studio to a traditional database and just get the individual rows out of it. In that case, you can exported the BigQuery results to a MySQL database hosted on Cloud SQL, add the required indexes, and get quick inexpensive results.

Reddit kept getting slower throughout 2018. Source: Crux.

But sometimes you might have more complex needs — for example when you want to make a data source available to many users, and let them build their own customized visualizations. In that case you can create a Data Studio community connector with Apps Script, and design your own caching logic within it.

In the full talk you can see this in action with Minhaz’s example — where he created a connector that anyone can use with just a couple clicks, and they get a custom and customizable visualization — with fresh data hosted in BigQuery — without any need of coding — SQL or otherwise.

In summary — if you want to connect BigQuery and Data Studio to create an interactive dashboard — and you want to make this dashboard available for everyone — remember to think about what kind of layer you want to have between BigQuery and Data Studio. And stayed tuned, because at Google we are always thinking about how to make this connection even more powerful and awesome.

IMPORTANT 2019 UPDATE: You should use BI Engine between BigQuery and Data Studio

--

--

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.