In our previous Power BI automation blog, we utilized the Power Platform in combination with Azure SQL DB to store refresh history of our datasets leveraging the Power BI REST API when on a Power BI shared capacity. A solution to manage Power BI dataset refreshes is particularly useful when data source refresh times are variable and end users need to know when dataset refreshes are successful. However, Power Automate and Azure SQL create two failure points. This solution also requires additional licensing cost in Power Automate to leverage premium data connectors to Azure SQL. So, how can we consolidate the number of tools we use and at the same time leverage the security features and access controls that Azure provides?
Azure Synapse Analytics Has Entered the Chat
Azure Synapse Analytics provides a unified framework for analytics, data integration, storage, and security all in one user interface. In this section of the blog, we will document how we can leverage pipelines in Synapse, Logic Apps for emailing our user base that refreshes have started and completed, and Azure Key Vault to securely store all our Power BI App Client ID and secrets.
Key Tools
- Azure Active Directory
- Power BI App Registration with a Client ID/Secret
- Security Group to Add App Registration Created
- Azure Synapse Workspace
- To build our pipelines and create integration datasets to populate results to our data lake storage account
- Azure Logic Apps
- To send emails to our business users that refreshes have started and completed
- Azure Key Vault
- Securely store client ID and secrets
- Ensure your Synapse Workspace has a linked service and connection to Key Vault
- O365 Service Account
- Dedicated Service Account to perform automation of emails in Logic Apps
- Power BI Tenant Settings
- Enable Service Principals to use Power BI REST API
Figure 1: Tenant Settings + Security Group Option
Note: App Registration must be an Admin to all the Power BI Workspaces that we will be getting refresh data from.
- Other
- Knowledge of API testing and sending GET/POST requests
Logic Apps Workflow UI
We will create a base template for Logic Apps. This template will be called by Synapse via the HTTP Web request and parameters will be sent via Synapse to fill the user emails, subject line, and message.
Figure 2: Logic Apps Template
Azure Synapse: Integration Datasets + Data Lake Architecture
Before we review the pipelines set up, we need to configure the following integration datasets and linked services.
- Linked Services Needed
- Key Vault
- REST API
- Integration Datasets
- ds_PowerBIResponse: To leverage the Power BI REST API
- ds_Parameters: To access our csv file of our parameters that we will be looping over to refresh
- ds_RefreshHistory: To provide a sink point to take our results and store in our data lake
Figure 3: Integration Datasets
Our Data Lake folder set up will have the following items.
- Our Parameter Table with our Power BI Datasets and Workspace Information (GUIDs)
- A Refresh History Table where we will drop our storage information
- Will have a nested folder structure based on time of data refresh
Figure 4: Synapse Storage Setup
Azure Synapse: Master Pipeline Overview
This pipeline below is the master pipeline that will kick start our Power BI Refreshes and checking refresh status. In order, it does the following:
- Grabs our secret information from Azure Key Vault
- Gets an AAD Token by calling the Power BI REST API
- Gets our dataset GUID parameters from our flat file that houses the datasets we will need to refresh
- From here, we will start the refresh process for all the datasets and trigger the Logic Apps email to let us know the refreshes have started
- Datasets will be refreshed in parallel via the for loop. The check status pipeline will be called within this loop to ensure proper timings before sending refresh completion
Figure 5: Main Master Pipeline
Figure 6: Loop Step to Call Power BI Refresh and Call Pipeline to Check Status
Figure 7: Email Trigger that Refreshes Have Started
Azure Synapse: Check Status Pipeline Overview
Below is the check status pipeline. At a high level, this pipeline is called multiple times per dataset that is needed to be refresh and will run until all datasets either complete or fail. Some things to note are the following:
- The Reset Token step provides a failsafe when a refresh takes more than a certain amount of time to complete. The Power BI REST API token by default, is only available for 1 hour and this step allows us to get a new token once we are close to that threshold
- The Until Loop will stop only when the dataset refresh status something that is not in progress. Once it is not in progress, we will log the results as a parquet file in our data lake storage account
Figure 8: Check Status Pipeline Main Level
Figure 9: Check Status Pipeline Until Loop
Figure 10: Refresh History Logging in ALDS Gen 2
Once all the dataset refreshes are done, we will get an email that tells us the refresh is completed from our master pipeline.
Figure 11: Complete Email from Logic Apps
Conclusion and Extensions
Using Azure Synapse Analytics and Logic Apps, we were able to consolidate and reduce the number of services required with Azure Synapse’s unified framework. Data storage, and pipelining was done entirely in the Synapse environment, and we leveraged Logic Apps to orchestrate sending emails to our end users. Azure storage provides an inexpensive alternative to leveraging a standalone database by storing all our refresh history in our default Azure Data Lake Gen 2 Storage provided by Synapse.
Some extensions are also possible and are outlined below
- Pulling history information from other artifacts like dataflows
- Pulling and logging other information from the PowerBI REST API
- Adding this pipeline to a broader ETL pipeline in Synapse. Once ETL is done this pipeline can be used to refresh datasets downstream
- Creating auditing reports to monitor PowerBI refresh performance
Connect with us!
Please reach out to us today to find out more about how BlueGranite’s talented team can help you bring clarity and efficiency to your Modern Analytics endeavors today!
BlueGranite offers a variety of resources and free training events to help you learn how you can leverage Modern Data Analytics.
Please visit our website and YouTube Channel to learn more, or contact us directly to see how we can help you explore your about modern data analytics options and accelerate your business value.
About the Authors
Nithin Sankar
Nithin is a Solution Consultant at 3Cloud with experience working with stakeholders in the healthcare, hospitality, and tourism industries to design and develop robust data-driven solutions across different service lines and business units. Nithin holds a B.S. in Industrial & Systems Engineering from the University of Florida and is currently pursuing a M.S. in Analytics from the Georgia Institute of Technology. Check him out on LinkedIn!
Callen Smerker
Callen is a Solution Architect at 3Cloud who, since 2014, has had a variety of experience in database and business intelligence solutions. This includes leading large, complex projects as both Data Architect and Project Management Professional. He enjoys designing solutions within the Microsoft BI stack (SQL Server, SSIS, SSAS, and PowerBI) in combination with Azure DevOps CI/CD pipelines to deliver enterprise grade analytics solutions. Callen has experience working in a variety of industries including healthcare, life science, pharmaceutical, and logistics. Check him out on LinkedIn!