The Data Catalog portion of Purview is where most people will spend their time. It presents information about your organization's data assets in a searchable format. Depending on which level of Data Catalog you choose, you can also access a business glossary, lineage visualization, catalog insights, and sensitive data identification insights. This article focuses on the three levels available within Data Catalog and offers scenarios demonstrating when you would use each one.
Data Catalog is available in three levels: Free, C1, and D0. Each level builds on the one before it: C1 includes everything in the Free level, and D0 includes everything in the C1 level.
|Level|Features added|Pricing|
|---|---|---|
|Free|Search and browse data assets|Included with Data Map|
|C1|Business glossary, lineage visualization, and catalog insights|Free in Preview|
|D0|Sensitive data identification insights|Free in Preview|
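To make the "each level builds on the previous one" rule concrete, here is a minimal Python sketch. The level names and feature lists come from the table above; the `features_for` helper is purely illustrative and is not part of any Purview API.

```python
# Illustrative only: level names and features mirror the table above.
LEVELS = ["Free", "C1", "D0"]  # ordered from lowest to highest

# Features that each level adds on top of the level below it.
ADDED_FEATURES = {
    "Free": ["search and browse data assets"],
    "C1": ["business glossary", "lineage visualization", "catalog insights"],
    "D0": ["sensitive data identification insights"],
}


def features_for(level: str) -> list[str]:
    """Return every feature available at `level`, including inherited ones."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    features: list[str] = []
    # Walk from the lowest level up to and including the requested one.
    for name in LEVELS[: LEVELS.index(level) + 1]:
        features.extend(ADDED_FEATURES[name])
    return features
```

Calling `features_for("C1")` returns the Free feature followed by the three C1 additions, which matches the cumulative model described above.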
The Free level of Data Catalog comes with search and browse capabilities within your data asset library. This enables you to search for any of the data assets that have been appropriately scanned and registered. The search functionality within Purview is extremely quick and comprehensive:
- Your data assets are indexed not only by name, but by all the metadata associated with them, all the way down to the column level (if available).
- You can use keywords to search, filter search results, and see previous search keywords. The search also provides live suggestions based on what you are typing (think autocomplete).
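To illustrate what indexing down to the column level and live suggestions mean in practice, here is a small in-memory sketch in Python. It is a toy model, not how Purview implements search: the `Asset` class, the `search` and `suggest` functions, and the sample assets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical catalog entry; not a Purview type."""
    name: str
    source_type: str
    columns: list[str] = field(default_factory=list)

    def terms(self) -> set[str]:
        """Lowercased searchable terms: name, source type, and column names."""
        return {self.name.lower(), self.source_type.lower(),
                *(c.lower() for c in self.columns)}


def search(assets, keyword, source_type=None):
    """Return assets where any term contains the keyword, optionally filtered."""
    kw = keyword.lower()
    hits = [a for a in assets if any(kw in t for t in a.terms())]
    if source_type is not None:
        hits = [a for a in hits if a.source_type == source_type]
    return hits


def suggest(assets, prefix, limit=5):
    """Return up to `limit` indexed terms that start with the typed prefix."""
    p = prefix.lower()
    all_terms = set().union(*(a.terms() for a in assets))
    return sorted(t for t in all_terms if t.startswith(p))[:limit]


# Sample data (made up for this example).
assets = [
    Asset("sales_orders", "Azure SQL Database", ["order_id", "customer_email"]),
    Asset("customers.csv", "Azure Blob Storage", ["customer_id", "customer_email"]),
]
```

Here `search(assets, "email")` matches both assets through a column name rather than the asset name, and passing `source_type="Azure Blob Storage"` narrows the result to the CSV file. `suggest(assets, "cust")` returns the indexed terms that begin with that prefix, which is the idea behind autocomplete.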
If you need more features than just search capabilities, then you can move up a level to C1. With C1, you gain access to a business glossary, data lineage visualization, and catalog insights.
The Business Glossary provides vocabulary definitions for business users. This comes in handy when different teams in your organization use the same term in different ways. The glossary allows you to define terms and then map them to data assets, so everyone shares the same understanding of each term and confusion is minimized. Purview supports the following out-of-the-box attributes for any glossary term:
- Data Stewards
- Data Experts
- Related Terms
You can also add your own custom attributes to complement and enrich your glossary terms.
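As a rough mental model, a glossary term can be pictured as a record that carries the attributes listed above, any custom attributes, and the assets it is mapped to. The Python sketch below is hypothetical: the `GlossaryTerm` class, its `name` and `definition` fields, and the sample values are assumptions made for illustration and do not reflect Purview's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """Hypothetical model of a glossary term; not Purview's schema."""
    name: str                      # assumed field
    definition: str                # assumed field
    data_stewards: list[str] = field(default_factory=list)
    data_experts: list[str] = field(default_factory=list)
    related_terms: list[str] = field(default_factory=list)
    custom_attributes: dict[str, str] = field(default_factory=dict)
    assigned_assets: set[str] = field(default_factory=set)

    def assign_to(self, asset_name: str) -> None:
        """Map this term to a data asset (idempotent)."""
        self.assigned_assets.add(asset_name)


# Sample term (made up for this example).
term = GlossaryTerm(
    name="Active Customer",
    definition="A customer with at least one order in the last 12 months.",
    data_stewards=["steward@example.com"],
    related_terms=["Customer"],
    custom_attributes={"Review cycle": "Quarterly"},
)
term.assign_to("sales_orders")
term.assign_to("sales_orders")  # assigning twice has no extra effect
```

Storing the mapping on the term means anyone looking at `sales_orders` can be pointed to one agreed definition of "Active Customer".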
Data Lineage Visualization
Data lineage is important for many reasons, including impact analysis, troubleshooting incorrect report results, and root cause analysis. But sometimes it’s hard to see how all the pieces fit together without an end-to-end picture. This is where lineage visualization comes in. Purview assembles those pieces into a visual that helps users understand data lineage more quickly, from raw data staged in different platforms, to the transformations performed in your ETL/ELT tools, to the data visualizations in your reports. Purview does this by capturing metadata about your data sources and transformation tools at the highest available degree of granularity.
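Impact analysis and root cause analysis both amount to walking a lineage graph, downstream in the first case and upstream in the second. The Python sketch below shows the idea with a breadth-first traversal over a small hand-written graph. The node names and the `downstream` and `upstream` helpers are hypothetical and are not drawn from Purview itself.

```python
from collections import deque

# Hypothetical lineage: each key feeds the nodes in its list.
LINEAGE = {
    "raw_sales.csv": ["copy_sales_pipeline"],
    "copy_sales_pipeline": ["sales_staging"],
    "sales_staging": ["transform_sales"],
    "transform_sales": ["sales_mart"],
    "sales_mart": ["revenue_report", "forecast_report"],
}


def downstream(graph, start):
    """Breadth-first walk: everything affected if `start` changes."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(graph, target):
    """Everything `target` depends on, found by walking the reversed graph."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return downstream(reverse, target)
```

For example, `downstream(LINEAGE, "sales_staging")` lists the transformation, the mart, and both reports that a staging change could affect, while `upstream(LINEAGE, "revenue_report")` traces a suspicious report back to `raw_sales.csv`.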
We all know a picture is worth a thousand words, so Microsoft included some great visualizations to help enhance the Data Catalog to provide a “single pane of glass” view of your data estate. Purview does this through several different Catalog Insights, including:
- Asset Insights give you a better understanding of what types of assets you have and how they are distributed across your data estate. You can see how many assets you have and how they break down by source type, size, and classification.
- Scan Insights provide administrators with the tools to understand the overall health of the scans that are performed.
- Glossary Insights provide business users valuable information about what areas are being assigned terms and what areas may need more attention.
- Classification Insights show you where your classified data lives, enabling security administrators to do their jobs more effectively and efficiently.
- File Extension Insights show you how many different file extensions are found during scans. This can be extremely helpful in identifying “dark data sources” that are not under IT control but are mission critical.
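Several of the insights above are, at heart, counts over scan results grouped by one attribute. The Python sketch below shows that kind of aggregation on made-up data. The record layout, the sample paths, and the function names are assumptions for illustration, not Purview's data model.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical scan results for file-based assets.
scanned = [
    {"path": "finance/budget.xlsx", "source_type": "Azure Blob Storage",
     "classifications": ["Credit Card Number"]},
    {"path": "finance/forecast.xlsx", "source_type": "Azure Blob Storage",
     "classifications": []},
    {"path": "hr/staff.csv", "source_type": "Azure Data Lake Storage Gen2",
     "classifications": ["Email Address"]},
    {"path": "ops/legacy.mdb", "source_type": "Azure Blob Storage",
     "classifications": ["Email Address"]},
]


def count_by_source_type(assets):
    """Asset-style view: how many assets each source type holds."""
    return Counter(a["source_type"] for a in assets)


def count_by_extension(assets):
    """File-extension view: how many files carry each extension."""
    return Counter(
        PurePosixPath(a["path"]).suffix.lower() or "(none)" for a in assets
    )


def count_by_classification(assets):
    """Classification view: how many assets carry each classification."""
    return Counter(c for a in assets for c in a["classifications"])
```

In this sample, `count_by_extension(scanned)` surfaces a single `.mdb` file, the sort of unexpected extension that can hint at a data source outside IT control.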
Data Sensitivity is a key concern when talking about data governance. Microsoft already has Sensitivity Labels in Microsoft 365, and those same sensitivity labels can be easily extended into Purview when Information Protection for Azure Purview is turned on in the Microsoft 365 compliance center. Once enabled, you can define what is sensitive in your organization and apply those labels to your data assets.
All these features can be accessed via the Azure Purview Studio in any of the leading web browsers:
- Microsoft Edge
- Safari (latest version, Mac only)
- Chrome (latest version)
- Firefox (latest version)
No additional applications need to be installed or maintained.
Note: Azure Purview is now Generally Available as of 9/28/21.
That’s it for the Data Catalog installment of our Azure Purview series! If you missed the other posts in this series, they can be found here:
Azure Purview Series – Part 1: An Overview
Azure Purview Series – Part 3: Scanning & Classification
Azure Purview Series – Part 4: Data Map