
Utilizing Azure File Share to share datasets between multiple machines without the need to…

And also creating a robust pipeline to move data from AWS S3 into Azure File Share using Azure Data Factory

Photo by Lorenzo Herrera on Unsplash

Motivation:

A recurring problem in machine learning is training on multiple VMs: every VM needs its own copy of the dataset, so all the files have to be downloaded onto each one, and each VM needs a large disk attached just to hold the same data. Azure File Share solves this by sharing a single storage drive across multiple VMs over the industry-standard SMB protocol. I will also show how to move data from AWS S3 directly into the Azure File Share. So without further ado, let’s get started.

Pitfall 1: Azure Blob Storage also supports the NFS protocol, but at the time of writing (2021) that support is in preview, which means you shouldn’t use it in production, although you are free to use it for testing.


Prerequisite:

  1. An Azure account with access to services such as Azure Storage Accounts and Azure Virtual Machines.
  2. (Optional) The Azure Data Factory service, and an AWS account with the S3 service.

Azure File Share and VM:

Every resource in Azure must be created inside a logical container called a resource group. Resource groups keep related resources together, which makes them easier to find and manage later. So, let’s create a resource group: search for ‘Resource groups’ in the portal and click on it.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 1. 2021. JPEG file.

After clicking the ‘New’ button, you will be taken to a new page. Enter the basic information and click ‘Review + create’.
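For readers who prefer the command line, the same resource group can be created with the Azure CLI command `az group create`. The minimal Python sketch below only builds and prints the command so it can be checked before running; the group name `ml-fileshare-rg` and the region `eastus` are placeholder assumptions, not the values used in the screenshots.

```python
import shlex
import subprocess


def resource_group_cmd(name: str, location: str) -> list[str]:
    """Build the Azure CLI call that creates a resource group."""
    return ["az", "group", "create", "--name", name, "--location", location]


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Print the command; execute it only when dry_run is False."""
    printable = shlex.join(cmd)
    print(printable)
    if not dry_run:
        # Requires the Azure CLI to be installed and an active `az login` session.
        subprocess.run(cmd, check=True)
    return printable


if __name__ == "__main__":
    # Placeholder values: substitute your own group name and region.
    run(resource_group_cmd("ml-fileshare-rg", "eastus"))
```

Calling `run(cmd, dry_run=False)` actually executes the command, which assumes the Azure CLI is installed and you are already signed in.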

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 2. 2021. JPEG file.

You have created a resource group. Now it’s time to create an Azure Storage Account, which will hold the Azure File Share, and two Azure Virtual Machines. Search for ‘Storage accounts’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 3. 2021. JPEG file.

After clicking on it, fill in the basic details and hit ‘Review + create’.
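Storage account names are a frequent stumbling block: they must be 3 to 24 characters long, use only lowercase letters and digits, and be globally unique. The sketch below checks the first two rules locally (uniqueness can only be checked by Azure) and then assembles an `az storage account create` call. The account name and the `Standard_LRS` SKU are illustrative assumptions; a general-purpose v2 (`StorageV2`) account supports Azure Files.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
# Global uniqueness can only be verified by Azure itself.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    return _ACCOUNT_NAME.fullmatch(name) is not None


def storage_account_cmd(name: str, resource_group: str, location: str,
                        sku: str = "Standard_LRS") -> list[str]:
    """Build an `az storage account create` call after a local name check."""
    if not is_valid_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    # General-purpose v2 (StorageV2) accounts support Azure Files.
    return ["az", "storage", "account", "create",
            "--name", name, "--resource-group", resource_group,
            "--location", location, "--sku", sku, "--kind", "StorageV2"]
```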

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 4. 2021. JPEG file.

Do the same for Virtual Machines.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 5. 2021. JPEG file.

Repeat this process twice if you want to create two VMs and test sharing between them; a single VM is also fine. Once you have entered the basic details, hit ‘Review + create’.
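Creating the two VMs can also be scripted. This sketch, assuming Ubuntu VMs with SSH key authentication, builds one `az vm create` call per VM. The name prefix `train-vm`, the admin user `azureuser`, and the `Ubuntu2204` image alias are hypothetical choices; image aliases vary between CLI versions, so check `az vm image list` for the ones yours accepts.

```python
import shlex


def vm_create_cmds(resource_group: str, count: int = 2, prefix: str = "train-vm",
                   image: str = "Ubuntu2204",
                   admin_user: str = "azureuser") -> list[list[str]]:
    """Return one `az vm create` argument list per VM (train-vm-1, train-vm-2, ...)."""
    cmds = []
    for i in range(1, count + 1):
        cmds.append([
            "az", "vm", "create",
            "--resource-group", resource_group,
            "--name", f"{prefix}-{i}",
            "--image", image,
            "--admin-username", admin_user,
            # Uses existing SSH keys in ~/.ssh, or generates them if missing.
            "--generate-ssh-keys",
        ])
    return cmds


if __name__ == "__main__":
    for cmd in vm_create_cmds("ml-fileshare-rg"):
        print(shlex.join(cmd))
```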

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 6. 2021. JPEG file.
Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 7. 2021. JPEG file.

Now go to your storage account and select ‘File shares’. Create a file share by entering the basic details shown below and hit ‘Create’.
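File share names follow different rules from the storage account: 3 to 63 characters of lowercase letters, digits, and hyphens, starting with a letter or digit, with every hyphen followed by a letter or digit. The sketch below validates a name against those rules and builds an `az storage share-rm create` call; the 100 GiB quota is an arbitrary placeholder.

```python
import re

# 3-63 chars of lowercase letters, digits and hyphens; must start with a letter
# or digit, and every hyphen must be followed by a letter or digit
# (so no "--" and no trailing hyphen).
_SHARE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_share_name(name: str) -> bool:
    return _SHARE_NAME.fullmatch(name) is not None


def file_share_cmd(share: str, account: str, resource_group: str,
                   quota_gib: int = 100) -> list[str]:
    """Build an `az storage share-rm create` call (quota in GiB)."""
    if not is_valid_share_name(share):
        raise ValueError(f"invalid file share name: {share!r}")
    # share-rm works through Azure Resource Manager, so it relies on the
    # `az login` session rather than on an account key.
    return ["az", "storage", "share-rm", "create",
            "--resource-group", resource_group,
            "--storage-account", account,
            "--name", share,
            "--quota", str(quota_gib)]
```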

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 8. 2021. JPEG file.

Add one file to the file share for testing purposes. SMB works over TCP port 445, so that port needs to be open for our VMs. Go to each of the VMs you just created and select ‘Networking’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 9. 2021. JPEG file.

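Add an inbound port rule for port 445 on both of your VMs by clicking ‘Add inbound port rule’. (Strictly speaking, mounting a share only requires outbound traffic on port 445, which Azure’s default outbound rules already allow; if you add the inbound rule anyway, restrict its source rather than exposing port 445 to the whole internet.)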

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 10. 2021. JPEG file.

Fill in the details as above and hit ‘Add’. Now connect to your VM, because it’s time to mount the file share.
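Before mounting, it can save time to confirm that the VM can actually open a TCP connection to the storage account on port 445. The following is a small sketch using only Python’s standard library; the host name in the usage note is hypothetical and follows the `<account>.file.core.windows.net` pattern of Azure file endpoints.

```python
import socket


def can_reach(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, `can_reach("mldatasets2021.file.core.windows.net")` run on the VM should return `True`; `False` usually points to a network rule blocking port 445 or a mistyped account name.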

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 11. 2021. JPEG file.

Now enter these commands into your VM.

The ‘az login’ line authenticates the Azure CLI on your VM so that it can access the storage account. There are several ways to authenticate, but ‘az login’ is the quickest. After you enter the command, a prompt like the following will show up:

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 13. 2021. JPEG file.

In the browser where you are logged in to the portal, open the link and enter the code.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 14. 2021. JPEG file.

Once you do that, you will be authenticated and can run the rest of the commands.
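The exact commands used in this walkthrough are not reproduced in the text. As a hedged reconstruction based on Microsoft’s guide to mounting Azure Files on Linux, the remaining steps look up the account’s file endpoint (`az storage account show --query primaryEndpoints.file`) and key (`az storage account keys list --query [0].value`), then call `mount -t cifs`, which is provided by the `cifs-utils` package. The Python sketch below only derives the strings involved; the account, share, mount point, and credentials-file path are all hypothetical, and it keeps the key in a credentials file instead of passing it on the command line.

```python
from urllib.parse import urlparse


def smb_path(file_endpoint: str, share: str) -> str:
    """Convert a file endpoint such as https://acct.file.core.windows.net/
    into the path mount.cifs expects: //acct.file.core.windows.net/<share>."""
    host = urlparse(file_endpoint).netloc
    return f"//{host}/{share}"


def credentials_file_text(account: str, key: str) -> str:
    """Contents of a cifs credentials file; store it with chmod 600."""
    return f"username={account}\npassword={key}\n"


def mount_cmd(file_endpoint: str, share: str, mount_point: str,
              cred_file: str) -> list[str]:
    """Build the `mount -t cifs` call (SMB 3.0) that reads the key from cred_file."""
    options = f"vers=3.0,credentials={cred_file},serverino"
    return ["sudo", "mount", "-t", "cifs",
            smb_path(file_endpoint, share), mount_point, "-o", options]
```

Write the credentials text to its file with permissions 600 before running the mount command, and create the mount point first (for example with `sudo mkdir -p`). If non-root users need write access, `mount.cifs` options such as `uid=`, `gid=`, `file_mode=` and `dir_mode=` can be appended to the option string.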

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 15. 2021. JPEG file.

As you can see, the files in the share are now visible on the VM. To test the other direction, create a file on the VM and check for it in the storage account. I will create a CSV file called ‘ability_name.csv’ on the VM.
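The same round-trip check can be scripted from Python on the VM. This sketch writes a small CSV into a directory (the mounted share, in practice), reads it back, and returns what it read; the rows are hypothetical placeholder content, and the demo uses a temporary directory so it runs without a mounted share.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Optional


def write_test_csv(mount_dir: str, name: str = "ability_name.csv",
                   rows: Optional[List[List[str]]] = None) -> List[List[str]]:
    """Write rows to mount_dir/name, then read the file back and return its contents."""
    if rows is None:
        rows = [["id", "name"], ["1", "example"]]  # hypothetical placeholder rows
    path = Path(mount_dir) / name
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    with path.open(newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # In practice pass the share's mount point; a temp dir keeps the demo runnable anywhere.
    with tempfile.TemporaryDirectory() as d:
        print(write_test_csv(d))
```

If the returned rows match what was written, and the file also appears under the share in the portal, both directions of the mount are working.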

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 16. 2021. JPEG file.

As you can see, the file has been created and appears in the storage account. We have done it: we created a file share and shared it between two of our VMs.


Azure Data Factory and AWS S3 (Optional):

A common situation is that your data lives in S3, which is inexpensive and widely used, while your production workload runs on Azure. Azure Data Factory is well suited to building a pipeline that moves data between the two services. So, let’s jump in by searching for ‘Data factories’ and creating the service.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 17. 2021. JPEG file.

Add these basic details and hit ‘Review + create’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 18. 2021. JPEG file.

Go to the newly created service and click ‘Author & Monitor’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 19. 2021. JPEG file.

Click ‘Copy Data’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 20. 2021. JPEG file.

Give it a name and leave the remaining settings at their defaults.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 21. 2021. JPEG file.

Click ‘Create new connection’ and then select ‘Amazon S3’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 22. 2021. JPEG file.

Now, let’s head over to AWS. Open the AWS Management Console in a separate tab, go to IAM > Users, and add a user.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 23. 2021. JPEG file.

Give it a name and allow programmatic access.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 24. 2021. JPEG file.

Choose ‘Attach existing policies directly’, select ‘AmazonS3FullAccess’, and create the user.
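`AmazonS3FullAccess` works, but it allows writing and deleting objects in every bucket in the account. If this user exists only for the copy job, a read-only policy scoped to one bucket is a reasonable alternative. The permissions below follow my reading of the Azure Data Factory Amazon S3 connector documentation (`s3:GetObject` and `s3:GetObjectVersion` to copy, plus `s3:ListAllMyBuckets`, `s3:ListBucket` and `s3:GetBucketLocation` for testing the connection and browsing in the UI); treat it as a starting point and check it against the current docs.

```python
import json


def s3_read_policy(bucket: str) -> dict:
    """Read-only IAM policy for one bucket, sized for an ADF copy from S3."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Used by the ADF UI to test the connection and browse buckets.
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {   # List the bucket's contents and look up its region.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Read the objects being copied.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_read_policy("copytofileshare"), indent=2))
```

The printed JSON can be pasted into the JSON editor when creating a custom policy in the IAM console and attached to the user in place of `AmazonS3FullAccess`.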

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 25. 2021. JPEG file.

You will then be taken to a page showing your Access key ID and Secret access key. Store both in a safe place: this is the only time the secret access key will be shown.

Next, pick the bucket whose files you want to copy into the file share. I already had a bucket called ‘copytofileshare’ containing multiple CSV files; this is the bucket we will copy into the file share.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 26. 2021. JPEG file.

Go back to the Azure tab, enter the access key ID and secret access key in the corresponding fields, and hit ‘Test connection’. The connection to your S3 account should succeed.
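Behind the form, Data Factory stores the connection as a JSON linked-service definition. The sketch below builds a definition of roughly that shape for an `AmazonS3` linked service with access-key authentication; the linked-service name and the placeholder credentials are hypothetical, and what the UI actually saves may include extra fields.

```python
import json


def s3_linked_service(name: str, access_key_id: str, secret_access_key: str) -> dict:
    """Sketch of an ADF linked service for Amazon S3 using access-key authentication."""
    return {
        "name": name,
        "properties": {
            "type": "AmazonS3",
            "typeProperties": {
                "accessKeyId": access_key_id,
                # SecureString marks the value as a secret in ADF; an Azure Key Vault
                # reference is the sturdier option for production.
                "secretAccessKey": {"type": "SecureString", "value": secret_access_key},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder credentials; never commit real keys.
    print(json.dumps(s3_linked_service("AmazonS3LinkedService",
                                       "<access-key-id>", "<secret>"), indent=2))
```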

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 27. 2021. JPEG file.

Once the test succeeds, hit ‘Create’. On the next screen, select the bucket you want to copy.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 28. 2021. JPEG file.

Now we have to create the destination. Select ‘Azure File Storage’ and then hit ‘Continue’.

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 29. 2021. JPEG file.

Select your subscription and storage account, then test the connection.
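The destination is stored as a linked service too. As a sketch, an `AzureFileStorage` linked service can reference the share through a storage connection string plus a `fileShare` property; the account, key, share and linked-service names here are placeholders, and the UI’s subscription picker may save an equivalent definition in a different form.

```python
import json


def storage_connection_string(account: str, key: str,
                              suffix: str = "core.windows.net") -> str:
    """Standard Azure Storage connection string for the public Azure cloud."""
    return (f"DefaultEndpointsProtocol=https;AccountName={account};"
            f"AccountKey={key};EndpointSuffix={suffix}")


def file_share_linked_service(name: str, account: str, key: str, share: str) -> dict:
    """Sketch of an ADF linked service pointing at one Azure file share."""
    return {
        "name": name,
        "properties": {
            "type": "AzureFileStorage",
            "typeProperties": {
                "connectionString": {
                    "type": "SecureString",
                    "value": storage_connection_string(account, key),
                },
                "fileShare": share,
            },
        },
    }


if __name__ == "__main__":
    ls = file_share_linked_service("AzureFileStorageLinkedService",
                                   "mldatasets2021", "<account-key>", "datasets")
    print(json.dumps(ls, indent=2))
```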

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 30. 2021. JPEG file.

Click ‘Next’ through the remaining screens, leaving the default values, and the pipeline will run immediately.
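The Copy Data tool generates a pipeline containing a Copy activity. For a plain file-for-file transfer, one common configuration pairs a Binary source reading from S3 with a Binary sink writing to the file share, sketched below. The pipeline, activity and dataset names are hypothetical, both Binary datasets are assumed to exist already, and the tool may choose other dataset types (for example DelimitedText for CSV files).

```python
import json


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Pipeline with one Copy activity: Binary from S3 to Binary on an Azure file share."""
    return {
        "name": name,
        "properties": {
            "activities": [{
                "name": "CopyS3ToFileShare",
                "type": "Copy",
                "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        # recursive=True also copies objects under nested prefixes.
                        "storeSettings": {"type": "AmazonS3ReadSettings", "recursive": True},
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {"type": "AzureFileStorageWriteSettings"},
                    },
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(copy_pipeline("CopyPipeline", "S3BinaryDataset",
                                   "FileShareBinaryDataset"), indent=2))
```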

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 31. 2021. JPEG file.

Now check both your storage account and your VM to confirm that the files were copied.
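On the VM side, a quick way to confirm the copy is to list the CSV files under the mount point and count their rows. The sketch below does this with the standard library only; the mount path is whatever you used when mounting (a temporary directory stands in for it in the demo), and it assumes each CSV has a single header row.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def summarize_csvs(mount_dir: str) -> Dict[str, int]:
    """Map each CSV under mount_dir (searched recursively) to its data-row count,
    assuming one header row per file."""
    summary = {}
    for path in sorted(Path(mount_dir).rglob("*.csv")):
        with path.open(newline="") as f:
            n_rows = sum(1 for _ in csv.reader(f))
        summary[str(path.relative_to(mount_dir))] = max(n_rows - 1, 0)
    return summary


if __name__ == "__main__":
    # Replace the temp dir with your mount point, e.g. the path passed to `mount -t cifs`.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "sample.csv").write_text("a,b\n1,2\n")
        print(summarize_csvs(d))  # {'sample.csv': 1}
```

Running the same function on both VMs should produce identical output, since both are reading the same share rather than local copies.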

Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 32. 2021. JPEG file.
Shrestha, Sulabh. Utilizing Azure File Share to share datasets between multiple machines without the need to download for each VM 33. 2021. JPEG file.

Conclusion:

We have done it. We successfully moved data from AWS S3 into an Azure File Share and used that share to provide the files to our Azure VMs. Those VMs can now use the CSV data for training without each of them downloading its own copy from Azure File Share or AWS S3.

This is just one example of loading files from S3 into a file share. Data Factory offers connectors for many other sources from which you can load data into the file share, so the choices are wide and it’s up to you what you build with it. If you encounter any problems or have difficulty following the steps, comment below this post or message me at [email protected]. You can also connect with me on LinkedIn and GitHub.


Resources:

[1] Mounting File Share: https://docs.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-linux

[2] Azure Data Factory: https://docs.microsoft.com/en-us/azure/data-factory/

