If you are like many data professionals, you may have years of experience building and supporting on-premises data warehouse implementations, but have become overwhelmed by the blisteringly fast rate of change in the industry over the past few years. Business Intelligence tools and platforms have never required more agility and adaptability than they do today, and never have the choices and technologies been so plentiful. With this constant change can come much anxiety and confusion, but also great innovation and opportunity.
You Got This!
My hope with this post is twofold. Primarily, I’d like to set your mind at ease. The more things change, the more they stay the same. You may not know how to program and build an autonomous highway vehicle, but you are not as far away from BI relevance as you might fear. I also intend to introduce you to a modern Business Intelligence architecture stack that should feel familiar to you as you explore the vast array of Microsoft Cloud offerings. To that end, I would like to share one possible arrangement of Azure tools to help you start sketching a roadmap for moving your data warehouse into the cloud.
Let’s start with an overview of some key Azure services:
Azure SQL Database
Azure SQL Database (DB) can simply be your traditional SQL Server database, except it’s in the cloud. That’s the simplest way to think about it. Developers and administrators with an on-premises SQL Server background will find a low learning curve when moving to Azure SQL DB. Your data will have a comfy new home in the cloud, and you will reap the performance benefits of managed services. Along with great flexibility and scalability, you will pick up many of the features previously available only by ponying up for an Enterprise license. With two deployment options, Azure SQL DB can be a straightforward cloud replacement for on-premises databases, but it can also extend your traditional relational data with built-in support for JSON, XML, spatial data, and more.
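To show just how familiar this feels, here is a minimal sketch using the pyodbc library; the server, database, table, and column names are all hypothetical placeholders. The T-SQL is exactly what you would write on-premises, with JSON_VALUE pulling a value out of a JSON column.

```python
import pyodbc

# Hypothetical connection details -- substitute your own server, database, and credentials.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=SalesDb;"
    "Uid=myuser;Pwd=mypassword;Encrypt=yes;"
)

# Familiar T-SQL, plus built-in JSON support via JSON_VALUE.
sql = """
    SELECT OrderId,
           JSON_VALUE(OrderDetails, '$.customer.region') AS Region
    FROM dbo.Orders
    WHERE OrderDate >= '2018-01-01';
"""

cursor = conn.cursor()
for order_id, region in cursor.execute(sql):
    print(order_id, region)

conn.close()
```

Aside from the connection string pointing at a .database.windows.net address, nothing here would look out of place against an on-premises SQL Server.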
Azure SQL Data Warehouse
Azure SQL Data Warehouse is the Azure service for processing and storing massive volumes of data, and for consolidating disparate data into a single location using simple SQL constructs. It offers flexibility and cost savings by separating compute activities from storage, so you only pay for what you are using. Pause your compute capacity when you don’t need it; your data stays in storage, ready when you resume.
SQL DW scales easily by distributing work across compute nodes, which is the heart of this service’s performance advantage: Massively Parallel Processing (MPP). For the sake of keeping this article digestible, we’ll forgo the technical details under the hood of MPP. Just know that it is the secret sauce that gives Azure SQL DW the ability to ingest and analyze petabytes of data with unbelievable performance. Azure SQL DW is not the service for processing small datasets. If you are comfortable with relational data warehouse structures, think bigger, more diverse, and more performant, and you’ll begin to see the power of Azure SQL DW.
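To make “distributing the work” a little more concrete, here is a minimal sketch of the kind of DDL you would run against Azure SQL DW, again via pyodbc. The fact table, server, and credentials are hypothetical; the one real difference from an on-premises CREATE TABLE is telling the service how to spread rows across its distributions.

```python
import pyodbc

# Hypothetical Azure SQL DW connection -- substitute your own server and credentials.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:mydwserver.database.windows.net,1433;"
    "Database=MyDataWarehouse;"
    "Uid=myuser;Pwd=mypassword;Encrypt=yes;"
)

# Hash-distribute the fact table so MPP can fan the work out across compute nodes.
ddl = """
    CREATE TABLE dbo.FactSensorReading
    (
        MachineKey   INT          NOT NULL,
        ReadingTime  DATETIME2    NOT NULL,
        Temperature  DECIMAL(9,2) NULL,
        RotationRpm  INT          NULL
    )
    WITH (DISTRIBUTION = HASH(MachineKey), CLUSTERED COLUMNSTORE INDEX);
"""

conn.execute(ddl)   # pyodbc allows executing directly on the connection
conn.commit()
conn.close()
```

Choosing a good distribution column (here, a hypothetical MachineKey) is what lets joins and aggregations run in parallel without shuffling data between nodes.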
Azure Databricks
Azure Databricks is an interactive platform built to run and manage Apache Spark in the Azure cloud. Spark is a data processing engine designed to be open and versatile, and to serve the needs of everyone from data engineers to data scientists. Use it for data movement, for collecting and processing live data streams, and even for IoT sensor feeds. Additionally, Azure Databricks supports a variety of languages, including R, Python, SQL, and Scala.
Azure Databricks provides a friendly workspace for managing Spark clusters and for manipulating and visualizing data. The marketing material really gravitates to the phrase “unified analytics platform” which, once you really dig in and start to understand it, becomes less marketing noise and more “Yeah, that actually describes it quite nicely.” There is not much of a parallel to draw between Azure Databricks and something you are already familiar with in a traditional data warehouse environment. There is a steeper learning curve with Azure Databricks, but this service offers an excellent opportunity for innovation.
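To give a flavor of that workspace, here is a minimal PySpark sketch of a cell you might run in a Databricks notebook. The mount point and column names are hypothetical; the `spark` session and the `display` helper are provided by the Databricks notebook environment.

```python
# A minimal PySpark sketch for a Databricks notebook cell.
# Assumes sensor JSON files live under a hypothetical mount point /mnt/sensors.
from pyspark.sql import functions as F

readings = spark.read.json("/mnt/sensors/raw/")

summary = (
    readings
    .groupBy("machine_id")
    .agg(
        F.avg("temperature").alias("avg_temperature"),
        F.max("rotation_rpm").alias("peak_rpm"),
        F.count("*").alias("reading_count"),
    )
)

display(summary)   # Databricks' built-in helper for tabulating and charting a DataFrame
```

The same few lines scale from a laptop-sized sample to terabytes of sensor history simply by resizing the cluster behind the notebook.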
Azure Data Factory V2
You can use Azure Data Factory (ADF) V2 as the cloud incarnation of SQL Server Integration Services. As with most of the comparisons in this post, that statement massively understates what ADF can do. It provides a hybrid “pipeline model” so you can group logically related units of work and seamlessly span your cloud and on-premises data sources while leveraging advanced compute services (think HDInsight, Hadoop, Spark, etc.). ADF provides 75+ connectors and can be used to lift and shift your existing SSIS packages from on-premises into your Azure environment. You can also automate and schedule workflows with data-driven activities. If you are familiar with visual methods of data integration, you’ll take to ADF like a fish to water.
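As a rough sketch of that pipeline model, the snippet below uses the azure-mgmt-datafactory Python SDK to define a single copy activity from blob storage into the warehouse. The subscription, resource group, factory, and dataset names are hypothetical, and the exact model classes and credential types vary between SDK versions, so treat this as an illustration of the shape of a pipeline rather than a recipe (in practice, many people build the same thing visually in the ADF authoring UI).

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, SqlDWSink
)

# Hypothetical names -- substitute your own subscription, resource group, and factory.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# One logically related unit of work: copy staged sensor files into the warehouse.
copy_step = CopyActivity(
    name="CopySensorDataToDW",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SensorBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SensorDwDataset")],
    source=BlobSource(),
    sink=SqlDWSink(),
)

pipeline = PipelineResource(activities=[copy_step])
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "LoadSensorData", pipeline
)
```

Triggers (schedule- or event-driven) can then run the pipeline on whatever cadence your batch sources demand.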
Azure Analysis Services
Perhaps the lowest learning curve in this transition is Azure Analysis Services (AAS). This Platform as a Service (PaaS) is your analytics engine and all-important semantic layer for your data warehouse. You can still access your on-premises data through the gateway while taking advantage of the scalability benefits of managed services, or you can move your data assets to the cloud using the previously mentioned services to provide an end-to-end cloud data solution. AAS is fully compatible with the tools you already know and trust (SSMS, SSDT, Power BI, etc.), so you’ll be comfortable working with it from day one.
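Day-to-day operations feel just as approachable. As one hedged sketch, the snippet below kicks off a model refresh through the AAS asynchronous refresh REST API; the region, server, and model names are placeholders, and how you obtain the Azure AD access token (for example, via the azure-identity library or a service principal) is left out for brevity.

```python
import requests

# Hypothetical server and model names; the bearer token must come from Azure AD.
token = "<azure-ad-access-token>"
url = (
    "https://westus.asazure.windows.net/"
    "servers/myaasserver/models/SalesModel/refreshes"
)

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"Type": "Full"},   # request a full refresh of the model
)
response.raise_for_status()
print("Refresh accepted:", response.headers.get("Location"))  # URL to poll for status
```

You could just as easily trigger the same refresh from SSMS or an automation tool; the point is that the tabular model you already know is sitting behind a managed, scriptable service.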
Putting It All Together (batch marries stream)
Now that we’ve covered some basic components, let’s pull them all together into a possible use case to envision our “Modern BI Architecture.” Let’s suppose you have IoT sensors capturing measurements from a manufacturing machine. This could be in the form of temperature, rotation speed, tension, amperage, etc. Those sensors could produce a staggering amount of data, but how do you measure the efficiency of that machine, or of the machine’s task at hand? You may need real-time, on-the-ground operator feedback. You may need to turn that data into efficiency metrics monitored not only by line operators and foremen, but by managers and corporate decision-makers who all have a different stake in, and opinion of, the timing of inbound data.

Further suppose that you have finished-goods quality data which is generated and stored separately by independent systems, but that these systems are subject to less frequent batch processing. Your data scientist may need to pull data from both your batch and your streaming data sources in order to make meaningful prescriptive and predictive recommendations and to train machine learning models. Land your sensor data and your batch data in a semi-structured blob store, then use Azure Databricks and Azure Data Factory to stage it and eventually load it into Azure SQL Data Warehouse. Once there, it will be ready to be picked up by Azure Analysis Services for presentation.
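To make the streaming half of that scenario concrete, here is a minimal Structured Streaming sketch you might run in Databricks, assuming sensor events arrive as JSON files under a hypothetical mount point /mnt/iot. It computes five-minute per-machine aggregates and stages them as Parquet, ready for the warehouse load.

```python
# Minimal Structured Streaming sketch for the IoT half of the scenario.
# Assumes JSON sensor events land under a hypothetical mount point /mnt/iot/raw.
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

schema = StructType([
    StructField("machine_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("temperature", DoubleType()),
    StructField("rotation_rpm", DoubleType()),
])

events = spark.readStream.schema(schema).json("/mnt/iot/raw/")

# Five-minute efficiency-style metrics per machine, tolerating 10 minutes of late data.
metrics = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "machine_id")
    .agg(
        F.avg("temperature").alias("avg_temperature"),
        F.avg("rotation_rpm").alias("avg_rpm"),
    )
)

(
    metrics.writeStream
    .format("parquet")
    .option("checkpointLocation", "/mnt/iot/checkpoints/machine_metrics")
    .outputMode("append")
    .start("/mnt/iot/staged/machine_metrics")
)
```

From there, an Azure Data Factory pipeline can pick up the staged files on a schedule and load them into Azure SQL Data Warehouse alongside the less frequent batch quality data.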
(Diagram: Azure reference architecture for the modern BI scenario described above)
Maybe you have never dealt with any of these services or technologies. I made a claim earlier in this post: the more things change, the more they stay the same. When you boil the above scenario down to its simplest form, you are left with a familiar formula. Regardless of the source or the target, you must extract data and load data. Somewhere in between your source and target, you may need to do some transformations – extract, transform, load (ETL). If you are familiar with data warehouse modeling, SQL, and the concepts of ETL or ELT, you should be right at home with modern BI architecture.
If you’re looking to advance your organization’s Business Intelligence skills, contact us today. We offer hands-on, instructor-led training in the latest analytics technologies, and design custom solutions that transform your enterprise through data.