
How to Connect Azure AD Managed Identities to AWS Resources

Set up secret-less access from Azure Data Factory to AWS S3

Image by Susan Q Yin on Unsplash

1. Introduction

A common challenge for developers is managing the credentials that secure communication between services. Azure Managed Identities eliminate the need for developers to manage these credentials: applications can use a managed identity to obtain Azure AD tokens and access resources in Azure. This blog explains how an Azure Data Factory Managed Identity can be used to access AWS S3, see also the overview below.

  1. overview – Azure AD managed identity connecting with AWS

The remainder of this blog explains how an Azure AD tenant is registered as an Identity Provider in AWS so that managed identities can access S3. Then, as an example, a Data Factory Managed Identity is used to copy data from AWS S3 to Azure Storage. This tutorial relies heavily on this blog of Uday Hegde, in which Azure AD – AWS access is discussed in more detail.

2. Set up Azure AD tenant as AWS Identity Provider

In this chapter, the Azure AD tenant is set up as an AWS Identity Provider. The following steps are executed:

  • 2.1 Create App registration in Azure
  • 2.2 Create Azure AD tenant as Identity Provider (IdP) in AWS
  • 2.3 Add role to IdP and grant access to S3

2.1 Create App registration in Azure

In this paragraph, an app registration is created. Log in to the Azure portal, select Azure Active Directory, select App registrations and create a new App registration. After the App registration is created, an Application ID URI needs to be specified. As URI, api://aws_azure_federate can be used, see also image below.

2.1 App registration using URI scheme api://

2.2 Create Azure AD tenant as Identity Provider (IdP) in AWS

In this paragraph, your Azure AD tenant is registered as an Identity Provider (IdP) in AWS. Log in to the AWS console, select IAM and then choose to add an Identity Provider. Use OpenID Connect as provider type, use https://sts.windows.net/<<your Azure AD tenant id>>/ (don’t forget the trailing /) as URL and use the Application ID URI from 2.1 as audience, see also image below.

2.2 Create Azure AD tenant as Identity Provider in AWS
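
For readers who prefer scripting over the console, the registration in 2.2 can be sketched in Python. This is a minimal sketch, not the method used in this blog: the helper name `oidc_provider_params` and the all-zero tenant id are hypothetical placeholders, and the commented-out boto3 call assumes boto3 is installed and AWS credentials are configured (older boto3 versions may additionally require a `ThumbprintList` argument).

```python
def oidc_provider_params(tenant_id: str, audience: str) -> dict:
    """Build the arguments for IAM's CreateOpenIDConnectProvider call."""
    # The issuer URL ends with a slash, as required in step 2.2.
    return {
        "Url": f"https://sts.windows.net/{tenant_id.strip()}/",
        "ClientIDList": [audience],  # the Application ID URI from step 2.1
    }


# Placeholder tenant id; replace with your own Azure AD tenant id.
params = oidc_provider_params(
    "00000000-0000-0000-0000-000000000000", "api://aws_azure_federate"
)
print(params["Url"])

# Assumption: with boto3 and AWS credentials available, the provider
# could then be created roughly like this:
#   import boto3
#   boto3.client("iam").create_open_id_connect_provider(**params)
```

Keeping the parameter construction separate from the AWS call makes it easy to check the trailing slash before anything is created in IAM.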

2.3 Add role to IdP and grant access to S3

In this paragraph, a role is created for the IdP and that role is granted access to S3 buckets. Go to your newly created IdP in AWS, select create role, choose web identity and select your app registration as audience. As permissions, select AmazonS3FullAccess (in production, create a more fine-grained policy). Finally, name your role AzureADWebidentity3 and create it, see also image below.

2.3 Role in IdP using app reg as audience, having full access to S3
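
To make the role from 2.3 more concrete, the sketch below builds the two JSON policy documents involved: the trust policy that lets tokens from the IdP assume the role, and an example of a more fine-grained, read-only permission policy for a single bucket. The function names, the account id `123456789012`, the tenant id and the bucket name `my-example-bucket` are all hypothetical; the policy that the AWS console generates for you may differ in detail.

```python
import json


def web_identity_trust_policy(account_id: str, tenant_id: str, audience: str) -> dict:
    """Trust policy: allow tokens issued by the Azure AD tenant to assume the role."""
    issuer = f"sts.windows.net/{tenant_id}/"  # provider URL without https://
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            # Only accept tokens whose audience is the app registration.
            "Condition": {"StringEquals": {f"{issuer}:aud": audience}},
        }],
    }


def s3_read_only_policy(bucket: str) -> dict:
    """Permission policy: list one bucket and read its objects, nothing more."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{bucket}"},
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


# Placeholder values, for illustration only.
trust = web_identity_trust_policy(
    "123456789012", "00000000-0000-0000-0000-000000000000", "api://aws_azure_federate"
)
read_only = s3_read_only_policy("my-example-bucket")
print(json.dumps(trust, indent=2))
print(json.dumps(read_only, indent=2))
```

A policy like `read_only` could replace AmazonS3FullAccess when the role only needs to copy data out of one bucket.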

After this step, your Azure AD tenant is registered as an Identity Provider in AWS. In the next chapter, the Azure Data Factory Managed Identity is leveraged to copy files from S3 to Azure Storage.

3. Example: Use Data Factory MI to connect to S3

In this chapter, a Data Factory pipeline is used to copy data from AWS S3 to Azure Storage. The following steps are executed:

  • 3.1 Create Azure Data Factory, Azure Storage Account and AWS S3
  • 3.2 Deploy Data Factory Pipeline
  • 3.3 Run Data Factory Pipeline

3.1 Create Azure Data Factory, Azure Storage Account and AWS S3

In this paragraph, the required resources are created.

  • Follow this link to create an Azure Data Factory instance
  • Follow this link to create an Azure Storage account. After the Storage account is created, make sure that the ADF Managed Identity has the Storage Blob Data Contributor role on the storage account. Also create a file system on the storage account
  • Follow this documentation to create an S3 bucket and add some files to the bucket

3.2 Deploy Data Factory Pipeline

In this paragraph, the pipelines are created in your Azure Data Factory instance. The pipelines can be found in the git repo below:

https://github.com/rebremer/data-factory-managed-identity-connection-aws-s3

There are multiple ways to add the pipelines to your own ADF instance, for example:

  • Fork the git repo above to your own repo and add the repo to your own ADF instance, see this link
  • Use Azure CLI to deploy the ARM template in the git repo, see this link how to do this
  • (Quick and dirty) Create two empty pipelines named [1-adf-mi-s3-connection-noakv-nofunc-pl](https://github.com/rebremer/data-factory-managed-identity-connection-aws-s3/blob/main/pipeline/1-adf-mi-s3-connection-noakv-nofunc-pl.json) and [2-adf-mi-s3-connection-noakv-func-pl2](https://github.com/rebremer/data-factory-managed-identity-connection-aws-s3/blob/main/pipeline/2-adf-mi-s3-connection-noakv-func-pl2.json), and a linked service called [AmazonS3_linkedservice.json](https://github.com/rebremer/data-factory-managed-identity-connection-aws-s3/blob/main/linkedService/AmazonS3_linkedservice.json). Once created, use the corresponding JSON files in the git repo to fill the pipelines and linked service

Finally, go to the parameters of the first 2 pipelines and fill them with your variables (the other pipelines won’t be used and are only there for reference), see also image below.

3.2 Successful pipeline creation, filling in parameters

3.3 Run Data Factory Pipeline

In this paragraph, the first 2 pipelines are run. Both pipelines use the ADF MI to get a temporary AWS S3 access token that is used to copy data from AWS S3 to Azure Storage. The pipelines can be described as follows:

  • Pipeline 1: Try to access S3 bucket using ADF MI, parse error message to get ADF MI bearer token, use ADF MI bearer token to get temporary AWS S3 tokens, copy data from S3 to Azure Storage
  • Pipeline 2: Same as pipeline 1, but now an Azure Function is used to bounce the ADF MI bearer token rather than an S3 error message.
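
The token exchange that both pipelines perform (ADF MI bearer token in, temporary AWS credentials out) corresponds to the AWS STS `AssumeRoleWithWebIdentity` action. The sketch below only builds the request URL and does not call AWS; the function name, the session name `adf-mi-session`, the account id and the dummy token are hypothetical, and the actual pipelines in the git repo may construct the request differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

STS_ENDPOINT = "https://sts.amazonaws.com/"


def build_sts_request_url(role_arn: str, authorization_header: str,
                          session_name: str = "adf-mi-session") -> str:
    """Build an unsigned STS request that trades a web identity token for AWS credentials."""
    token = authorization_header.strip()
    # An Authorization header has the form "Bearer <token>"; STS wants only the token.
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    query = urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    })
    return f"{STS_ENDPOINT}?{query}"


# Placeholder account id and a dummy token, for illustration only.
url = build_sts_request_url(
    "arn:aws:iam::123456789012:role/AzureADWebidentity3",
    "Bearer header.payload.signature",
)
sent = parse_qs(urlsplit(url).query)
print(sent["Action"][0], sent["RoleArn"][0])
```

A successful response from STS contains an access key id, a secret access key and a session token, which together form the temporary credentials used by the copy activity.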

Obviously, pipeline 2 is more enterprise ready, but requires an Azure Function with 3 lines of code as follows:

# Code of Azure Function: echo the caller's Authorization header
# (the ADF MI bearer token) back in the response body
import azure.functions as func
def main(req: func.HttpRequest) -> func.HttpResponse:
    return func.HttpResponse(req.headers['Authorization'])
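
When the token exchange with AWS fails, it can help to inspect the bearer token that was bounced back. The following is a debugging sketch, assuming the token is a standard JWT: it decodes the claims without verifying the signature, so the result must never be used for authorization decisions. The helper names and the demo token are hypothetical; the `iss` and `aud` claims should match the provider URL and audience registered in 2.2.

```python
import base64
import json


def decode_jwt_claims(authorization_header: str) -> dict:
    """Return the unverified claims of a JWT given '<token>' or 'Bearer <token>'."""
    token = authorization_header.split()[-1]  # drop an optional "Bearer" prefix
    payload_b64 = token.split(".")[1]         # a JWT is header.payload.signature
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (demo helper only)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Build a fake, unsigned token so the example runs without Azure.
demo_header = "Bearer " + ".".join([
    _b64url({"alg": "none"}),
    _b64url({
        "iss": "https://sts.windows.net/00000000-0000-0000-0000-000000000000/",
        "aud": "api://aws_azure_federate",
    }),
    "signature",
])
claims = decode_jwt_claims(demo_header)
print(claims["iss"], claims["aud"])
```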

After the pipelines have run successfully, the data has been copied from S3 to Azure Storage using the copy activity, see also image below.

3.3 Successful pipeline run, copying files from S3 to Azure Storage without credentials

4. Conclusion

A common challenge for developers is managing the credentials that secure communication between services. Azure managed identities eliminate the need for developers to manage these credentials. Managed identities can also be used to access resources in AWS from Azure AD. This blog explained how an Azure Data Factory Managed Identity can be used to access AWS S3, see also the overview below.

  1. overview – Azure AD managed identity connecting with AWS

