Do your analysts typically know where to find the data they need? Once they find a data source, do they know exactly how its attributes and metrics are defined? Based on our experience working with customers, the answer to both questions is often no. Domain knowledge is typically gained over time through trial, error, and experience. And it is all too easy to lose this tribal knowledge when an experienced staff member leaves the company.
One of the best things we can do for our BI and Analytics programs is to introduce a data catalog. A data catalog can help with self-service BI, operational BI, corporate BI, analytics, and data science efforts in the following ways:
- Enable data discovery. When a data source is registered in the data catalog, analysts don’t have to waste time asking around or searching for what data is available and where. Visibility of a given source can be restricted to specific groups of users to enforce security requirements.
- Single catalog for all enterprise data sources. Most organizations have multiple data platforms that have evolved over time to handle structured and unstructured data, both on-premises and in the cloud. By registering data in one catalog that exposes sources from a variety of locations, we can save analysts significant time and effort hunting for data.
- Reduce duplication of effort. When existing data sources are registered and discoverable, an analyst is less likely to recreate data assets that already exist. Standardization not only saves time, but also reduces the risk of error and the need for reconciliations.
- Repository for common definitions, tips, and advice. Metadata is how we refer to “data about data” and can be exposed via a metadata repository. When terms are ambiguous (such as defining customer churn), documenting and annotating data definitions can significantly assist analysts with understanding the data and creating accurate reports.
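To make these ideas concrete, here is a minimal sketch of a catalog as a small in-memory Python model. It is a toy illustration only, not the Azure Data Catalog API: the `CatalogEntry` and `DataCatalog` names, the group-based visibility check, and the sample sources and churn definition are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One registered data source plus its descriptive metadata (hypothetical model)."""
    name: str
    location: str
    description: str
    allowed_groups: set[str]
    definitions: dict[str, str] = field(default_factory=dict)


class DataCatalog:
    """Toy in-memory catalog; not the Azure Data Catalog API."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refusing duplicate names nudges analysts to reuse what already exists.
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, keyword: str, user_groups: set[str]) -> list[CatalogEntry]:
        # Return only sources the user may see whose name or description matches.
        keyword = keyword.lower()
        return [
            entry
            for entry in self._entries.values()
            if entry.allowed_groups & user_groups
            and (keyword in entry.name.lower() or keyword in entry.description.lower())
        ]

    def annotate(self, name: str, term: str, definition: str) -> None:
        # Attach a shared definition so an ambiguous term is documented once.
        self._entries[name].definitions[term] = definition


# Example usage with made-up sources from two different platforms.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="sales.customers",
    location="onprem/SalesDW/dbo.Customers",  # hypothetical location
    description="Customer master data",
    allowed_groups={"finance"},
))
catalog.register(CatalogEntry(
    name="web.clicks",
    location="cloud/lake/clickstream",  # hypothetical location
    description="Raw clickstream events",
    allowed_groups={"marketing"},
))
catalog.annotate("sales.customers", "churn", "No purchase in the last 12 months")

for entry in catalog.search("customer", user_groups={"finance"}):
    print(entry.name, entry.location, entry.definitions)
```

In this sketch, one `search` call covers sources from every platform, the `allowed_groups` check stands in for constrained visibility, and `annotate` plays the role of the shared definitions repository. A real catalog product would add persistence, richer search, and proper authentication.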
We often like to say that BI initiatives are dependent not only upon technology, but on people and processes as well. Introducing a data catalog is highly dependent on all three components: Technology > People > Process. In this short video, we focus on the Technology portion with a brief tour of the Azure Data Catalog.
If you liked this article, check out our post on Azure Data Factory: Connecting Data to Insights by Mike Cornell, or contact us if you have any questions about Azure Data Catalog and how it fits into your overall BI and Analytics program.