Organizations are amassing more data than ever, yet it is getting harder for their employees to find that data and use it with confidence. What if there were a solution that not only told us what data sources we have, but also how those data sources should be used and who the stewards and producers of that data are? What if it allowed us to classify our data and gave us insight into what our entire data estate looks like? It might sound like data nirvana, but it just might be possible with the newest Platform as a Service (PaaS) offering from Microsoft, Azure Purview.
This series of articles will take a deep dive into each of the building blocks that make up Azure Purview. This initial article gives an overview of what Purview is and what it can do for your data estate.
Purview is a data governance cloud offering that isn’t just one thing; it’s a platform made up of several building blocks. There is the Data Map, which is the foundation for data discovery. Think of this as the engine and storage unit that makes everything else possible. There is the Scanning and Classification engine, which allows you to connect to your data sources, scan them, and then classify them. Finally, there is the Data Catalog, which stores the metadata about your data sources in a searchable format for end users. Depending on which level of Catalog you choose, you can also get a business glossary, lineage visualization, catalog insights, and sensitive data identification insights.
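To make the relationship between these three building blocks concrete, here is a minimal toy model in Python. It is only a sketch of the idea and not Purview's API: the class and function names (DataMap, scan_and_classify, search_catalog), the classification labels, and the column-name rules are all hypothetical and were invented for this illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Metadata about one scanned table (hypothetical shape, not Purview's)."""
    name: str
    source: str
    columns: list
    classifications: set = field(default_factory=set)


# Hypothetical classification rules: label -> pattern matched against column names.
RULES = {
    "Email Address": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "Phone Number": re.compile(r"phone", re.IGNORECASE),
}


class DataMap:
    """Plays the role of the Data Map: the store that everything else builds on."""

    def __init__(self):
        self.assets = {}

    def register(self, asset):
        self.assets[f"{asset.source}/{asset.name}"] = asset


def scan_and_classify(data_map, source, tables):
    """Plays the role of Scanning and Classification: read schemas, apply rules."""
    for table_name, columns in tables.items():
        asset = Asset(table_name, source, list(columns))
        for label, pattern in RULES.items():
            if any(pattern.search(column) for column in columns):
                asset.classifications.add(label)
        data_map.register(asset)


def search_catalog(data_map, term):
    """Plays the role of the Data Catalog: search the stored metadata."""
    term = term.lower()
    return sorted(
        key
        for key, asset in data_map.assets.items()
        if term in asset.name.lower()
        or any(term in column.lower() for column in asset.columns)
        or any(term in label.lower() for label in asset.classifications)
    )


data_map = DataMap()
scan_and_classify(
    data_map,
    "sales_db",
    {"customers": ["id", "email", "phone"], "orders": ["id", "total"]},
)
print(search_catalog(data_map, "email"))  # ['sales_db/customers']
```

The point of the sketch is the direction of the dependencies: scanning writes into the map, and the catalog only reads what scanning has already stored there.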
Purview is based on the Apache Atlas Open API ecosystem, with some enhancements and additions by Microsoft. If you are interested in the details of the relationship between Purview and the Apache Atlas Open API, there’s an excellent article about this from the folks at Microsoft.
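For readers who have not seen Apache Atlas before, the sketch below shows roughly what an entity and a basic search request look like in the open-source Atlas v2 REST API. It is an illustration under assumptions, not a Purview example: the host atlas.example.com, the helper names make_entity and basic_search_url, and the sample table are hypothetical, and this article does not say which parts of the Atlas API Purview exposes or how they are authenticated.

```python
import json
from urllib.parse import urlencode


def make_entity(type_name, qualified_name, name, classifications=()):
    """Build an Atlas-style entity: a type, its attributes, and its classifications."""
    return {
        "typeName": type_name,
        "attributes": {"qualifiedName": qualified_name, "name": name},
        "classifications": [{"typeName": label} for label in classifications],
    }


def basic_search_url(base_url, query, type_name=None, limit=10):
    """Build the URL for an Atlas v2 basic search (no request is sent here)."""
    params = {"query": query, "limit": limit}
    if type_name:
        params["typeName"] = type_name
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{urlencode(params)}"


# Hypothetical table and classification label, used only for illustration.
entity = make_entity("hive_table", "sales.customers@cluster1", "customers", ["PII"])
print(json.dumps(entity, indent=2))

# Hypothetical Atlas host; substitute your own endpoint.
print(basic_search_url("http://atlas.example.com:21000", "customers", "hive_table"))
```

Nothing is sent over the network in this sketch; it only shows the shape of the data and of the request URL.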
Being a PaaS offering means that Purview is only available in certain regions. As of this writing, those regions are Canada Central, East US, East US 2, South Central US, UK South, Australia East, and Central India, with West Central US and West US coming soon (Q3 2021).
Azure Purview is currently in Public Preview, which means everything is subject to change until the product reaches general availability. However, if history tells us anything, once Microsoft releases a product for Public Preview, significant changes are unlikely.
Note: Azure Purview has been Generally Available since September 28, 2021.
Let’s Talk Pricing
Pricing for Purview is significantly higher than what users of Azure Data Catalog v1 are used to. However, you are getting much more than just a data catalog with Purview, so it’s a bit like comparing apples and oranges. Because pricing is always subject to change, we won’t talk specifics and will instead refer you to the Azure Purview Pricing page.
Purview allows you to have multiple accounts per subscription, per tenant, with a current limit of three per tenant (all subscriptions combined). This way you can separate your data assets based on business functionality or even environment if you so choose. This is a huge change from Azure Data Catalog v1, where you could only have one Data Catalog per tenant. While you can have multiple accounts per subscription and per tenant, you cannot scan another tenant’s data source. You will need to create separate accounts for each tenant.
Setting up Purview is a significant effort and requires some administrative privileges within your Azure tenant. Microsoft has done a great job of creating a starter kit and tutorial to get folks started. Be sure to check the pricing first, though: as soon as you create your Data Map and start scanning, you will start to incur costs. Unlike some other Azure PaaS offerings, Purview cannot be paused, so you will always incur the cost of the Data Map.
That’s all for our overview of Purview. Stay tuned for the next article in the series, where we dive deeper into the Data Catalog portion of Purview.
Please visit our website to learn more, or contact us directly to see how we can help you explore your modern data analytics options and accelerate your business value.