The world’s leading publication for data science, AI, and ML professionals.

One of the most underrated Applications for Data Engineers and Scientists

What you should know about Googles Data Transfer Service

Photo by Nick Gardner on Unsplash
Photo by Nick Gardner on Unsplash

Bigquery Data Transfer Service automates data transfer in BigQuery on a scheduled, managed basis. The data engineering and analytics teams can lay the foundation for a BigQuery data warehouse without writing a single line of code [1]. Personally, I see two wide fields of application which can you save additional tools and costs.

Data Integration

Sources that can be integrated by the Data Transfer Service are:

SaaS (Software as a Service) applications from Google like:

  • Campaign Manager
  • Cloud Storage
  • Google Ad Manager
  • Google Ads
  • Google Merchant Center (Beta)
  • Google Play
  • Search Ads 360 (Beta)
  • YouTube – Channel Reports
  • YouTube – Rights Holder Reports

External Cloud Storage Providers:

  • Amazon S3

Data Warehouses:

  • Teradata
  • Amazon Redshift

Furthermore, additional third-party transfers are available in the Google Cloud Marketplace [2]. The advantage here is clearly that only SaaS services, i.e. BigQuery and the DTS, are used – without building any infrastructure. In addition, only one service provider is used, namely Google. This saves a lot of construction and maintenance effort. As a result,you only pay for computing power, which of course also accrues with other solutions.

ELT Processing

Besides transferring data, which is already in the name, the service can also be used for data transformations within your Data Warehouse or Data Lake. Here, I would use the modern approach ELT instead of ETL. A short summary of ELT and ETL is, that the ETL process is loading the data into the target system after the transformation takes place within the data integration tool, the ELT approach firstly loads the data into the target system before transforming the data. Often the ELT process is nowadays preferred over the traditional ETL process because it’s simpler to realize and loads the data faster [3].

Data Integration and Transformation via DTS— Image from Author
Data Integration and Transformation via DTS— Image from Author

A possible solution architecture would be the above, where the Data Transfer Service loads the data into the Data Lake or the staging area and then takes over the transformation of the data using simple SQL.

Scheduling a Query via Data Transfer Service - Image by Author
Scheduling a Query via Data Transfer Service – Image by Author

An example for a transformation task in BigQuery is to join two tables, doing some transformation on column data types or values and saving them as a new data object for later analytic purposes. After writing the SQL and clicking on "Schedule query" you can automate it by the popup configuration manager (right side of the picture above).

DTS will give you the possibility to automate the transformation task by scheduling the query e.g. daily on a certain time (similar to batch processing and OLAP operation over night) – but much faster thanks to the scalability within the GCP.

Conclusion

This approach brings certain advantages:

  • You only have to use GCP native services (No big tool set is needed.) [2]
  • Transformation can be done by SQL (Even less technical experienced staff can implement the data preparation logic.)
  • DTS and BigQuery are full SaaS technologies (No worries about infrastructure and scaling.)
  • DTS can handle Big Data very well

Since cloud platforms provide us endless scalability and lots of services, enabled just per mouse click, you can quickly build up a Data Warehouse and the needed data pipelines. This example shows you how to set all up by only using two Google services and SQL.

Sources and Further Readings

[1] Google, What is BigQuery Data Transfer Service? (2021)

[2] Google, Drittanbieter-Übertragungen verwenden(2021)

[3] Fivetran, https://fivetran.com/_/api/blog/elt-vs-etl/elt-vs-etl.png


Related Articles