The world’s leading publication for data science, AI, and ML professionals.

Exploratory Data Analysis with Azure Synapse Analytics

Build a Power BI Report on Parquet files with the Serverless SQL Pool

Photo by Clay Banks on Unsplash
Photo by Clay Banks on Unsplash

Last week, 3th of December, Azure Synapse Analytics was announced to be General Available. Another reason to have some fun with a few of the new features in the Azure Synapse Workspace. Like building a Power BI report with T-SQL without provisioning a ‘traditional’ SQL Server instance and building a data pipeline.

In this blog, we will be using the Serverless SQL Pool included in the workspace, to create just that. Some of the benefits of the Serverless SQL Pool:

  1. Exploratory Data Analysis with T-SQL on your storage account
  2. Connectivity with tools like SSMS and ADS
  3. You are only charged for the data processed

To know more about the Serverless SQL Pool, be sure to check the official documentation. A link is in the references at the end of this blog.

In the end, I will also provide an example of the ‘create external table as select‘ statement. This is extremely useful if you need to reuse the results of your data analysis. It allows you to export the data to your storage account in the desired format.


Overview of actions

So how did I do this? What steps are needed to build the report:

  1. Provision a Synapse workspace
  2. Have data in a storage account
  3. Create views in the database
  4. Import the views in Power BI desktop

Maybe this is a little blunt, but I promise it won’t be difficult. We will go through the steps to take!


Provision a Synapse workspace

In this case, I used the Azure portal to create a new workspace and used all of the standard settings for creation.

  1. Navigate to the Azure Portal
  2. Click ‘Create a resource’
  3. Type ‘Azure Synapse Analytics’
  4. Fill in the ‘Basics’ tab and click ‘Review and Create’
Create an Azure Synapse Workspace (Image by Author)
Create an Azure Synapse Workspace (Image by Author)

Have data in a storage account

This is could be the hardest part. In the examples, I use an export of the AdventureWorksLT sample database in Parquet format. If you do not have parquet files, CSV will do the job as well.

You could place your files in the storage account that is created with the Synapse workspace. However, if you would like to use an existing storage account from the same tenant, that is also possible. Just make sure you assign the right roles to the storage account:

Connect your Storage Account to the Synapse Workspace

In case you are not using the created ADLS Account to read data from, a ‘Linked Service’ needs to be configured to the Storage Account. One way to do this is by navigating to the ‘Manage’ tab on the left side. Choose ‘Linked Services’ and click ‘New’. Choose the appropriate storage type and fill out the details.

Create Linked Service (Image by Author)
Create Linked Service (Image by Author)

Authentication via Managed Identity is recommended.

  1. Assign the Synapse Workspace (system assigned) Managed Identity at least the ‘Storage Blob Data Reader’ role on the storage account via the ‘Access control (IAM)’ tab
  2. Assign your own Azure Active Directory user at least the ‘Storage Blob Data Reader’ role.

Being ‘Owner’ or ‘Contributor’ on the storage account is not sufficient. Not having this specific ‘Storage Blob Data’ role will result in permission errors when trying to query the files.

Storage Account Role (Image by Author)
Storage Account Role (Image by Author)

Create views in the database

Before we can create views we need a database to create the views on. To do this, open the Synapse Studio via the Azure Portal

Azure portal Synapse Resource (Image by Author)
Azure portal Synapse Resource (Image by Author)

Navigate to the ‘Data’ tab. Click ‘Add new resource’ and choose ‘SQL Database’. In the options pick ‘Serverless’ for pool type and fill out a fitting name for your database.

Create SQL Database (Image by Author)
Create SQL Database (Image by Author)

Now we are ready to create a view in the database. In the example below I wrote a join between ‘SalesOrderHeader’ and ‘SalesOrderDetail’.

Navigate to the ‘Develop’ tab. Click ‘Add new resource’ and choose ‘SQL Script’. Add and execute your script. To verify if the view works navigate to the ‘Data’ tab, select your create database, views and under ‘Actions’ choose for ‘Select Top 100 rows’.

Select TOp 100 rows (Image by Author)
Select TOp 100 rows (Image by Author)

Import the views in Power BI desktop

Almost there! Open Power BI Desktop and choose ‘SQL Server’ as your data source. The server name can be found in the Azure Portal. It will likely be

_-ondemand.sql.azuresynapse.ne_t

Choose your created view and load the data!

Load data in Power Bi Desktop (Image by Author)
Load data in Power Bi Desktop (Image by Author)

Big datasets

Be aware of the limitations of Import and Direct Query mode. In another example, I made a view of the NYC Taxi Open dataset. And loaded it to Power Bi with the Direct Query mode to prevent the .pbix file from blowing up with the 1.572.000.000 rows.

Power BI Desktop with NYC Taxi Dataset (Image by Author)
Power BI Desktop with NYC Taxi Dataset (Image by Author)

Create External Table As

Another great feature of the Serverless SQL Pool is the use of the ‘Create External Table As Select’ statement. Basically, it does two things:

  1. It exports the results to Hadoop or Azure Blob Storage
  2. And in parallel, it creates an External Table in your database

To be able to use this statement you need to have a few things in place:

  • A database master key for encryption
  • A database scoped credential for authentication
  • An external data source to store and read the data
  • An external file format to determine how to store the file


Wrap Up

So my goal was to show you that it is fairly easy to set up an Azure Synapse Workspace and use the Serverless SQL Pool to explore data in the Data Lake. The advantages are many, among them:

  • The ability the use T-SQL
  • Connectivity from familiar tools as SSMS and ADS
  • Very cost-efficient as you pay per TB of processed data

References:

Microsoft documentation:

Azure Synapse Analytics | Microsoft Azure

New version of SSMS and ADS tools for serverless SQL pools in Azure Synapse Analytics | Azure updates | Microsoft Azure

Serverless SQL pool – Azure Synapse Analytics | Microsoft Docs

Set or Change the Column Collation – SQL Server | Microsoft Docs

CREATE EXTERNAL TABLE AS SELECT (Transact-SQL) – SQL Server | Microsoft Docs

Learning links:

Tutorial: Use serverless SQL pool to analyze Azure Open Datasets in Azure Synapse Studio – Azure Synapse Analytics | Microsoft Docs

Realize Integrated Analytical Solutions with Azure Synapse Analytics – Learn | Microsoft Docs


Related Articles