
Snowflake integration objects enable us to connect with external systems from Snowflake. In my last blog, we went through how to create an API integration to invoke an AWS Lambda. In this blog, we will see how to integrate AWS S3 so that we can access and query data stored in S3 from Snowflake.

Implementation Details
AWS IAM Role
For Snowflake to be able to talk to our AWS account, we need to add a cross-account role. This role will be assumed by an IAM identity in Snowflake's AWS account to perform actions in our AWS account.
- Create a cross-account IAM role in your AWS account. Put a dummy account ID in the trust policy for now; we will update it later.
- Attach an inline policy to the role created above.
Version: '2012-10-17'
Statement:
  - Effect: Allow
    Action:
      - s3:GetObject
      - s3:GetObjectVersion
    Resource: arn:aws:s3:::my-snowflake-data-bucket/dev/gold/*
  - Effect: Allow
    Action: s3:ListBucket
    Resource: arn:aws:s3:::my-snowflake-data-bucket
The above policy allows the role to access objects in the S3 bucket under a specific prefix. As a security best practice, grant only the minimum access Snowflake needs.
Snowflake Integration Object
- Create a Storage integration object
create or replace storage integration my_s3_int_01
type = external_stage
storage_provider = s3
enabled = true
storage_aws_role_arn = '<IAM_ROLE_ARN>'
storage_allowed_locations = ('s3://my-snowflake-data-bucket/dev/gold/');
The above command, when run in Snowflake, creates an S3 storage integration object. Behind the scenes, Snowflake provisions an IAM identity in its own account for this integration. All API actions in our AWS account will be performed by that identity by assuming the role we specified in storage_aws_role_arn.
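If stages will be created by a role other than the one that owns the integration, that role also needs usage on the integration. A minimal sketch, assuming a hypothetical role named analyst_role:
-- Allow another role to reference the integration when creating stages
grant usage on integration my_s3_int_01 to role analyst_role;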
As mentioned, the integration object creates an IAM user and an external ID. We need both to add the trust relationship between the AWS and Snowflake accounts. To get them, we describe the integration object just created and note down the values of STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID.
describe integration my_s3_int_01;
The output is a table of the integration's properties, including STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID.
Once we have the user ARN and external ID, we update the trust relationship of the role created in the first step. Add the value of STORAGE_AWS_IAM_USER_ARN under Principal and STORAGE_AWS_EXTERNAL_ID under Condition:
Version: '2012-10-17'
Statement:
  - Effect: Allow
    Principal:
      AWS: "<STORAGE_AWS_IAM_USER_ARN>"
    Action: sts:AssumeRole
    Condition:
      ForAnyValue:StringLike:
        sts:ExternalId:
          - "<STORAGE_AWS_EXTERNAL_ID>"
We need to tell Snowflake about any special formatting in the files stored in the S3 bucket. This is only needed if the default file format options provided by Snowflake are not enough.
- (Optional) Create file format
CREATE OR REPLACE FILE FORMAT my_csv_format_01
TYPE = CSV
FIELD_OPTIONALLY_ENCLOSED_BY='"';
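If your CSV files carry a header row, the format can also be told to skip it; for example (an optional variation, not assumed by the rest of this walkthrough):
CREATE OR REPLACE FILE FORMAT my_csv_format_01
TYPE = CSV
FIELD_OPTIONALLY_ENCLOSED_BY='"'
SKIP_HEADER = 1;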
Snowflake External Stage
Once the access and format setup is done, we create the stage. A stage stores the metadata of the external files, in our case files in S3. It is used to find the data that needs to be loaded into the Snowflake table(s). We create a simple stage here; you can also look at other options like encryption.
- Create External stage
create or replace stage my_s3_stage_01
storage_integration = my_s3_int_01
url = 's3://my-snowflake-data-bucket/dev/gold/'
file_format = my_csv_format_01;
The above command creates a mapping between Snowflake and the S3 prefix. It also tells Snowflake which file format is suitable for the data stored in S3.
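Before moving on, it is worth sanity-checking that the stage can actually see the files under the prefix:
-- List the files visible through the stage (uses the storage integration's access)
list @my_s3_stage_01;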
- Create External table
create or replace external table ext_ccfinput_test_01
with location = @my_s3_stage_01/
auto_refresh = true
file_format = (type = CSV)
pattern='.*ganalytics.*[.]csv';
describe stage my_s3_stage_01;
Similar to how the stage stores information about where to find the data, external tables store file-level metadata, such as the filename, a version identifier, and related properties. This adds an abstraction that allows us to query the data as if it were stored in Snowflake.
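Since we did not define explicit columns above, the external table exposes each row through a VARIANT column named VALUE (with the CSV fields as c1, c2, ...) plus pseudocolumns such as METADATA$FILENAME. A quick sketch of querying it, where the column mapping is an assumption about the sample data:
-- Query the external table; c1, c2 are positional CSV fields inside VALUE
select metadata$filename,
       value:c1::string as first_col,
       value:c2::string as second_col
from ext_ccfinput_test_01
limit 10;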
Checking Results
Now we are ready to query the data from S3 in Snowflake. Here we issue a select statement directly against the stage, referencing the columns by position:
select t.$1, t.$2, t.$3, t.$4, t.$5, t.$6 from @my_s3_stage_01 as t;
And voilà, we get results that match the contents of the S3 data files.
In this blog, we saw how to access and query data stored in S3 from Snowflake.
Happy Snowflaking!!