Snowflake S3 integration

Access files stored in S3 as a regular table in Snowflake

Amit Singh Rathore
Towards Data Science
4 min read · Aug 23, 2021


Snowflake integration objects enable us to connect to external systems from Snowflake. In my last blog, we went through how to create an API integration to invoke an AWS Lambda. In this blog, we will see how to integrate AWS S3 so that we can access and query data stored there from Snowflake.

Image by author

Implementation Details

AWS IAM Role

For Snowflake to be able to talk to our AWS account, we need to add a cross-account role. This role will be assumed by an IAM identity in the Snowflake account to perform actions in our AWS account.

  • Create a cross-account IAM Role in your AWS account. Put a dummy account ID in the trust policy for now; we will update it later.
  • Attach an inline policy to the role created above.

The above policy allows the role to access the S3 bucket under a given prefix. As a security practice, grant only the minimum access Snowflake needs.
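As a rough sketch, such an inline policy could look like the following (the bucket name `my-bucket` and prefix `data/` are placeholders, not from the original setup):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::my-bucket/data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket",
      "Condition": { "StringLike": { "s3:prefix": ["data/*"] } }
    }
  ]
}
```

Scoping `ListBucket` with the `s3:prefix` condition keeps the role from listing objects outside the prefix Snowflake actually needs.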

Snowflake Integration Object

  • Create a Storage integration object
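A minimal version of this command might look like the sketch below; the integration name, role ARN, and bucket location are placeholders you would replace with your own values:

```sql
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  -- the cross-account role created earlier in AWS
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access-role'
  -- restrict the integration to the bucket/prefix Snowflake may touch
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/data/');
```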

The above command, when run in Snowflake, creates an S3 storage integration object. This object creates an IAM identity in the Snowflake account. All API actions in our AWS account will be performed by this user by assuming the role we specified in storage_aws_role_arn.

As mentioned, the integration object creates a user and an external identity. We need those to add a trust relationship between the AWS and Snowflake accounts. To get them, we describe the integration object just created and note down the values of STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID.
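Retrieving those values is a single command (assuming the integration was named `s3_int` as in the sketch above):

```sql
-- lists properties of the integration, including
-- STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID
DESC INTEGRATION s3_int;
```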

OUTPUT:

Image by author

Once we have the user and external identity, we update the trust relationship of the role created in the first step: add the value of STORAGE_AWS_IAM_USER_ARN as the Principal and STORAGE_AWS_EXTERNAL_ID under the Condition block.
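The updated trust policy on the AWS role would then look roughly like this, with the two placeholders replaced by the values from the DESCRIBE output:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<STORAGE_AWS_IAM_USER_ARN>" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>" }
      }
    }
  ]
}
```

The external ID condition ensures that only this specific Snowflake integration, and not some other caller from the Snowflake account, can assume the role.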

We need to tell Snowflake about any special formatting in the files stored in the S3 bucket. This is only needed when the default Snowflake-provided file formats are not sufficient.

  • (Optional) Create file format
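For example, a file format for CSV files with a header row might be sketched as follows (the format name and options are illustrative assumptions):

```sql
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1              -- skip the header row in each file
  NULL_IF = ('NULL', 'null')   -- treat these strings as NULL
  EMPTY_FIELD_AS_NULL = TRUE;
```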

Snowflake External Stage

Once the access and format setup is done, we create the stage. A stage stores the metadata of the external files, in our case in S3. It is used to find the data that needs to be loaded into the Snowflake table(s). We create a simple stage here; you can also look at other options such as encryption.

  • Create External stage
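A simple stage tying together the integration, the S3 location, and the file format could be sketched as (names reuse the placeholders from the earlier steps):

```sql
CREATE OR REPLACE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_int        -- the integration created above
  URL = 's3://my-bucket/data/'        -- must be within STORAGE_ALLOWED_LOCATIONS
  FILE_FORMAT = my_csv_format;
```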

The above command creates a mapping between Snowflake and the S3 file prefix. It also tells Snowflake to use a file format suitable for the data stored in S3.

  • Create External table
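As a sketch, an external table over the stage might look like the following; the table name and the two columns are assumptions about the CSV layout, pulled out of the semi-structured VALUE column that external tables expose:

```sql
CREATE OR REPLACE EXTERNAL TABLE my_ext_table (
  id   INT     AS (VALUE:c1::INT),      -- first CSV column
  name VARCHAR AS (VALUE:c2::VARCHAR)   -- second CSV column
)
WITH LOCATION = @my_s3_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
```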

Similar to how the stage stores information on where to find the data, external tables store file-level metadata, such as the filename, a version identifier, and related properties. This adds an abstraction that allows us to query the data as if it were present in Snowflake.

Checking Results

Now we are ready to query the data from S3 in Snowflake. We issue a select statement on the external table we created.
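Continuing with the assumed names from the sketches above, the query is an ordinary SELECT:

```sql
-- the external table can be queried like any regular table
SELECT id, name
FROM my_ext_table
LIMIT 10;
```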

And voilà, we get a result that matches the content of the S3 data files.

In this blog, we saw how we can access and query data stored in S3 from Snowflake.

Happy Snowflaking!!
