Welcome to the final installment of our Azure Purview Series, focusing on Purview Data Map. The Purview Data Map is the foundation for data discovery and serves as the jumping-off point for all things data source related. It is the cloud-native platform-as-a-service (PaaS) that collects the metadata about your data sources and runs the scans to keep your metadata up to date.
You can interact with the Data Map visually through the Purview Portal or programmatically through the Apache Atlas Open APIs. The following narrative is based on Purview Portal interaction.
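For readers curious about the programmatic route, here is a minimal sketch of what an Atlas call might look like. It only builds the HTTP request and does not send it. The account name and token are placeholders, and the endpoint layout is an assumption, so confirm the Atlas endpoint shown for your own Purview account before using it.

```python
import urllib.request

# Hypothetical values: substitute your own account name and a real Azure AD bearer token.
ACCOUNT_NAME = "my-purview-account"
ACCESS_TOKEN = "<bearer-token>"


def build_typedefs_request(account_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Atlas type definitions."""
    # Assumed endpoint layout; check the Atlas endpoint listed for your account.
    url = f"https://{account_name}.purview.azure.com/catalog/api/atlas/v2/types/typedefs"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


request = build_typedefs_request(ACCOUNT_NAME, ACCESS_TOKEN)
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) requires a valid token for an identity that has access to the Purview account.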
In the Data Map, you can see all your data sources in one place, edit your data sources, schedule scans of your data sources, and organize your data sources visually into collections, as in the example below, presented in “Map view”:
If you prefer a list view of your data sources, you can switch to “Table view” using the slider, and your Data Map will look like this:
The advantage of the table view is that you can see additional information that you can’t see on the map view, like the number of scans per data source and the date the data source was registered.
Data Source Scans
You can schedule scans of your data sources via either the map or table view. You can create scans that run once, or on a recurring basis to ensure the metadata about your data sources is always up to date.
In the map view, the process to start a scan is pretty obvious: you simply click on the “New scan” icon.
However, in the table view, you need to hover over the row for the data source to get the “New scan” icon to appear.
Data Source Collections
You can also create new Collections in the Data Map. Think of Collections as imaginary borders that allow you to visually separate your data sources. You can create your Collections based on business area, environment, source type, region, or whatever criteria you choose. You can also create hierarchies within your Collections to make organizing your data sources even easier. You can assign data sources to a Collection at the time the data source is registered or anytime after the data source has been registered. You can do this using the visual interface of the Purview Portal or programmatically via the Apache Atlas Open APIs.
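To make the idea of a Collection hierarchy concrete, here is a small conceptual model in plain Python. It is not the Purview API. The collection and data source names are made up, and the sketch assumes each data source sits in a single collection.

```python
from typing import Dict, List, Optional

# Hypothetical collections; each maps to its parent (None marks the root).
COLLECTION_PARENTS: Dict[str, Optional[str]] = {
    "Contoso": None,
    "Finance": "Contoso",
    "Finance-Prod": "Finance",
    "Finance-Dev": "Finance",
    "Marketing": "Contoso",
}

# Hypothetical data sources and the collection each one is assigned to.
SOURCE_COLLECTION: Dict[str, str] = {
    "sql-ledger": "Finance-Prod",
    "adls-campaigns": "Marketing",
}


def collection_path(name: str) -> List[str]:
    """Return the chain of collections from the root down to `name`."""
    path: List[str] = []
    current: Optional[str] = name
    while current is not None:
        path.append(current)
        current = COLLECTION_PARENTS[current]
    return list(reversed(path))


def sources_under(collection: str) -> List[str]:
    """List data sources assigned to `collection` or any of its descendants."""
    return sorted(
        source
        for source, assigned in SOURCE_COLLECTION.items()
        if collection in collection_path(assigned)
    )


print(" / ".join(collection_path("Finance-Prod")))  # Contoso / Finance / Finance-Prod
print(sources_under("Finance"))  # ['sql-ledger']
```

Reassigning a data source after registration corresponds to changing its entry in SOURCE_COLLECTION; the hierarchy itself stays untouched.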
The final point about the Data Map is cost. This is where most of your costs will come from when using Purview. Unlike some other Azure services, the Purview Data Map cannot be paused to help control costs when not in use. All Purview accounts are created with a default Data Map size of 1 capacity unit (CU), where 1 CU supports up to 25 data map operations per second and includes up to 2 GB of storage for your metadata. The Data Map is elastic, which means it scales automatically with the request load, up to a maximum of 100 CUs. By default, scaling is configured not to exceed 10 times the steady-state capacity in order to control costs. For more detailed information, see the Microsoft Elastic Data Map article. During the Public Preview all CUs are free; however, once Purview goes into General Availability (GA), you may be in for a surprise. See the current Azure Purview Pricing page for more information.
Note: Azure Purview has been Generally Available since September 28, 2021.
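As a rough worked example of the capacity figures above, the sketch below estimates how many CUs a given load would need. It assumes that whichever demand is larger (throughput or storage) drives the CU count. The constants are the figures quoted in this post, so check the current pricing page before relying on them.

```python
import math

# Figures quoted in this post; they may differ from current Azure pricing.
OPS_PER_CU = 25        # data map operations per second per capacity unit
STORAGE_GB_PER_CU = 2  # GB of metadata storage per capacity unit
MAX_CU = 100           # upper bound of elastic scaling


def required_capacity_units(ops_per_second: float, storage_gb: float) -> int:
    """Estimate CUs needed, assuming the larger of the two demands drives scaling."""
    by_throughput = math.ceil(ops_per_second / OPS_PER_CU)
    by_storage = math.ceil(storage_gb / STORAGE_GB_PER_CU)
    needed = max(1, by_throughput, by_storage)  # an account never drops below 1 CU
    return min(needed, MAX_CU)


print(required_capacity_units(10, 1))    # 1
print(required_capacity_units(60, 1))    # 3
print(required_capacity_units(10, 7))    # 4
print(required_capacity_units(5000, 1))  # 100 (capped)
```

Because the Data Map cannot be paused, the 1 CU floor applies even when nothing is being scanned or queried.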
Well, that’s it for our Azure Purview series. If you missed the first three posts in this series, they can be found here:
Azure Purview Series – Part 1: An Overview
Azure Purview Series – Part 2: Data Catalog
Azure Purview Series – Part 3: Scanning & Classification