
Data Patterns in a Multi-cloud Future

Multi-cloud is becoming the norm. What does this mean for data science and data engineering when your data is spread across multiple clouds…

Photo by Franki Chamaki on Unsplash

The most important asset for any organization is its data, but it is also the most challenging asset to manage. I’ve written about how cloud computing has evolved to keep pace with the changing dynamics of world economics and trade, and why it’s imperative for every organization to have a multi-cloud strategy. Keeping pace with these changes is important for businesses to thrive and grow, but it also brings a new set of challenges.

This is especially true when you bring in multiple cloud platforms and your enterprise data is spread across different clouds. How does that change the data science equation? Can you still leverage the data from all the clouds and from on-premises systems, connect the dots, and extract meaningful analytics?

It’s not surprising that Gartner predicted,

By 2021, over 75% of midsize and large organizations will have adopted a multicloud and/or hybrid IT strategy.

We at Google already see this prediction unfolding as we partner with some of the largest enterprises on their multi-cloud adoption and their multi-cloud data analytics journeys.

In this post and the next, we’ll discuss different data patterns you can adopt for data science, data engineering and BI tasks when your data is spread across multiple clouds.

Business Intelligence by connecting data across multiple clouds

Image by author

Looker is Google’s multi-cloud business intelligence and data analytics platform. Its power comes from its ability to connect to multiple data sources across different clouds or on-premises and build reports and dashboards on top of them. Together with its LookML modeling layer, Looker provides a rich set of APIs that let you embed data, visualizations and insights into your websites, integrate with workflows, and build applications on top of Looker itself.
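As an illustration of building on the Looker API, here is a minimal sketch of the two HTTP requests an application might make: logging in with API client credentials and running a saved Look to get its results. It uses only the Python standard library and assumes the Looker API 4.0 endpoints `POST /api/4.0/login` and `GET /api/4.0/looks/{look_id}/run/{result_format}`. The host name, credentials and Look ID are hypothetical placeholders, and in practice the official Looker SDK wraps these calls for you.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "4.0"  # assumed Looker API version


def build_login_request(base_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """POST /login exchanges API client credentials for a short-lived access token."""
    body = urllib.parse.urlencode(
        {"client_id": client_id, "client_secret": client_secret}
    ).encode()
    return urllib.request.Request(
        f"{base_url}/api/{API_VERSION}/login",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_run_look_request(
    base_url: str, access_token: str, look_id: str, result_format: str = "json"
) -> urllib.request.Request:
    """GET /looks/{id}/run/{format} runs a saved Look and returns its results."""
    look = urllib.parse.quote(look_id, safe="")
    return urllib.request.Request(
        f"{base_url}/api/{API_VERSION}/looks/{look}/run/{result_format}",
        method="GET",
        headers={"Authorization": f"token {access_token}"},
    )


def fetch_look_rows(base_url: str, client_id: str, client_secret: str, look_id: str) -> list:
    """Log in, run the Look, and return the parsed JSON rows (performs network calls)."""
    with urllib.request.urlopen(build_login_request(base_url, client_id, client_secret)) as resp:
        token = json.load(resp)["access_token"]
    with urllib.request.urlopen(build_run_look_request(base_url, token, look_id)) as resp:
        return json.load(resp)
```

Keeping request construction separate from `urlopen` makes it easy to inspect or test the requests without a live Looker instance; `fetch_look_rows("https://example.looker.com", ...)` would then return the rows an embedded app could render.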

Looker can be deployed in various locations; it doesn’t have to run on Google Cloud. If most of the data displayed on the dashboards is located in another cloud or on-premises, Looker is best deployed close to that primary data source to keep query latency low.
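The placement rule above can be expressed as a small heuristic: add up how much dashboard data lives in each location and deploy Looker next to the largest one. This is a minimal sketch that assumes data volume is a reasonable proxy for where dashboard queries spend their time; the source names, location labels and sizes are hypothetical, and a real decision would also weigh latency and network costs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    location: str        # hypothetical label: a cloud region or an on-premises data center
    dashboard_gb: float  # volume of data the dashboards query from this source


def primary_location(sources: list[DataSource]) -> str:
    """Return the location that holds the largest share of dashboard data."""
    totals: dict[str, float] = defaultdict(float)
    for source in sources:
        totals[source.location] += source.dashboard_gb
    # The location with the highest total is the candidate for deploying Looker.
    return max(totals, key=totals.get)


# Hypothetical inventory of dashboard data sources.
sources = [
    DataSource("orders", "aws-us-east-1", 800.0),
    DataSource("clickstream", "aws-us-east-1", 200.0),
    DataSource("marketing", "gcp-us-central1", 300.0),
    DataSource("crm", "on-prem-dc1", 150.0),
]
print(primary_location(sources))  # aws-us-east-1
```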

Using BigQuery Omni to Access Data in Multiple Clouds

Image by author

Another interesting pattern is to use Looker together with BigQuery Omni, which brings data that resides in AWS under the same control mechanism that applies to querying BigQuery managed storage.

How does this diagram differ from the previous one? Rather than connecting Looker to Athena, which is needed to query the data in S3 buckets, Looker connects to BigQuery. Technically, it connects to the BigQuery control plane, which hands the query to the data plane running in AWS; the data plane then processes the query against the data in the S3 bucket.
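For the control plane to reach the S3 data, a table is typically defined over the bucket through a BigQuery connection to AWS. The minimal sketch below only assembles such a statement as a string; it assumes BigQuery’s `CREATE EXTERNAL TABLE ... WITH CONNECTION` DDL form and a dataset located in the same AWS region as the connection. The dataset, table, connection and bucket names are hypothetical, so check the current BigQuery Omni documentation for the options your region supports.

```python
def build_s3_table_ddl(
    dataset: str,
    table: str,
    aws_location: str,
    connection_id: str,
    uris: list[str],
    file_format: str = "PARQUET",
) -> str:
    """Assemble DDL for a BigQuery table whose data stays in S3.

    Assumes the dataset lives in the same AWS location as the connection.
    """
    quoted_uris = ", ".join(f'"{uri}"' for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        f"WITH CONNECTION `{aws_location}.{connection_id}`\n"
        f'OPTIONS (format = "{file_format}", uris = [{quoted_uris}])'
    )


# Hypothetical names for illustration only.
ddl = build_s3_table_ddl(
    dataset="sales_aws",
    table="orders",
    aws_location="aws-us-east-1",
    connection_id="s3_read_conn",
    uris=["s3://example-bucket/orders/*"],
)
print(ddl)
```

The data itself is never copied: the table definition only records where the files live and which connection the AWS data plane should use to read them.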

The advantage of this pattern is that the same control mechanism that applies to querying BigQuery managed storage also applies to the data that resides in AWS, which makes the dashboard setup simpler and easier to govern. From Looker’s perspective, connecting to BigQuery Omni is exactly the same as connecting to BigQuery tables in managed storage or to federated data sources on Google Cloud. This is also handy when data scientists or BI analysts don’t have access to Athena or another SQL interface for querying the data in S3 buckets: BigQuery Omni becomes that SQL engine, with the convenience and security of controlling it from the familiar Google Cloud platform.
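Because BigQuery Omni sits behind the same BigQuery API, a client-side query looks the same whether the table is in managed storage or in S3; what changes is the location the job runs in. Here is a minimal sketch with the `google-cloud-bigquery` Python client, assuming the library is installed and credentials are configured. The project, dataset and table names are hypothetical, and `aws-us-east-1` is used only as an example Omni location.

```python
def build_query_args(table_ref: str, location: str, limit: int = 10) -> dict:
    """Return keyword arguments for bigquery.Client.query().

    Only `location` differs between a managed-storage table (e.g. "US")
    and an Omni table in AWS (e.g. "aws-us-east-1").
    """
    sql = f"SELECT * FROM `{table_ref}` LIMIT {int(limit)}"
    return {"query": sql, "location": location}


def run_query(table_ref: str, location: str, limit: int = 10) -> list:
    """Submit the query through the regular BigQuery client and return the rows."""
    # Imported lazily so build_query_args can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query(**build_query_args(table_ref, location, limit))
    return list(job.result())


# Hypothetical tables: one in BigQuery managed storage, one defined over S3.
managed = build_query_args("my-project.sales_gcp.orders", "US")
omni = build_query_args("my-project.sales_aws.orders", "aws-us-east-1")
```

A BI tool like Looker works at this same level: it talks to one BigQuery endpoint, and the location determines whether Google Cloud or the AWS data plane does the work.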

The rest of the previous pattern remains in place: you can still connect to other data sources, including BigQuery data on Google Cloud and an on-premises data warehouse, visualize them on the same dashboard, and build the same types of apps and workflows.

In the next post, we’ll discuss a few more advanced architecture patterns that let you run analytics across multiple clouds.

