Data lake on GCP using Terraform

Use Terraform to set up infrastructure-as-code for a Data Lake on Google Cloud Platform.

Tuan Nguyen
Towards Data Science
9 min readAug 15, 2020

--

A summarize of what we will be building in this project (image by author)

Back in the old days, dealing with physical infrastructure is a huge burden, which not only requires teams of experts to manage but also is time-consuming. In the modern cloud computing era, however, you can deploy hundreds of computers instantly to solve your problems with the click of a button. Well, to be realistic, most day to day problems that we are trying to solve won’t require that much computing power.

What is infrastructure-as-code (IaC)

Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.

Wikipedia

So instead of placing a physical server on a rack, setting up all the cables, configuring the network, installing all the required operating system, you can write codes to do all that. To use IaC, you typically would use a source code control repository, write code to configure your…

--

--