Some years ago, in the heyday of on-premises computing, the Microsoft SQL stack was a one-stop shop for BI solutions. It offered everything needed for a data solution. SQL Server hosted a data warehouse. SSIS transferred and transformed the data from source systems. SSAS served analytic queries. SSRS and Excel teamed up to deliver the data to end users. It provided an all-encompassing toolset in one package. Life was happy, idyllic, and stable.
Then the cloud happened.
Seemingly overnight, Amazon, Google, and Microsoft were all vying to migrate customers from on-prem products to cloud solutions. The computing giants were busy inventing and reinventing products to be cloud native. Computing suddenly became “as a service,” from infrastructure to platforms to software. Microsoft introduced their “-aaS” acronyms: IaaS, PaaS, SaaS. Then they launched the Azure cloud, providing early services like virtual machines, Azure SQL Databases and Azure Data Factory.
Coinciding with the rise of the cloud, clustered computing and massive storage engines took off. Spearheaded by open-source technologies like Hadoop, they used the combined power of many networked computers to tackle previously impossible workloads. However, these clusters require significant administration to maintain the computers and their networks. With its focus on providing computing infrastructure as a service, the nascent cloud was the perfect fit. Cloud platforms made these cluster technologies available for a wide audience, not just large enterprises.
These two revolutions turned the Microsoft BI world upside down. It went from a stable BI and analytics stack to something resembling anarchy. A fixed set of tools gave way to options at every turn. with on-prem, when you wanted a database, you only needed to choose which edition of SQL Server you wanted. In Azure, you could choose that same SQL Server on a VM or an Azure SQL Database. You could run a SQL Managed Instance. Maybe, instead, you wanted to check out other database engines hosted on Azure, like MySQL or PostgreSQL. If you wanted to try a clustered engine, you could check out HDInsight, Databricks, or Azure SQL Data Warehouse. Everywhere you looked, you saw an explosion of options for storage, ETL, streaming, machine learning and reporting. Even experts like us were having our heads spinning when architecting a solution.
Then Microsoft introduced Azure Synapse Analytics.
It started off humbly enough. It began by rebranding Azure SQL Data Warehouse and bundling that with Data Factory’s pipelines. At first glance, it didn’t seem to offer anything new. I initially dismissed it as an irrelevant marketing exercise. But it did offer something new. It was what I had been longing for since the Azure platform was introduced.
What did it offer? It offered cohesion. It offered stability. It offered a vision. In its inception, it wasn’t much, but it was a start. In Azure, I wanted one place where I could architect and implement an analytics solution. I wanted all the necessary technologies to snap together seamlessly with little to no effort on my part. Synapse aims to be that workspace.
It has since made major strides to become that one-stop shop. Microsoft has slowly been integrating new technologies under the Synapse banner, making it a very appealing platform for analytics moving forward. It began with SQL pools and data pipelines but has added spark clusters and Power BI integration. Data explorer and other major new features are being introduced as well. On the operational side, Azure Key Vault and Azure DevOps/Github integration help cement Azure Synapse as the primary analytics platform in Azure. With those new tools in place, it is now entirely possible to engineer and administer a modern data and analytics solution all in one place.
SQL Pools, formerly Azure SQL Data Warehouse, is a massively parallel processing engine. Essentially, it operates as a SQL database over Azure Storage-backed data. SQL Pools split the difference between big data platforms and legacy T-SQL familiarity.
The Synapse pipelines offer the integration and orchestration platform. It offers robust support for the cloud ELT pattern, extracting data from source systems, transforming with data flows and storing in a cloud analytics architecture. Its scheduling and triggers provide the orchestration element of a Synapse workspace.
Spark pools give Synapse solutions much more flexibility and power. Spark brings a rich set of streaming, machine learning and data lake-oriented features. With a variety of programming languages and their library ecosystems, Spark allows you to handle many new types of data and implement complex logic. Its inclusion really brought Synapse into the big time as a modern data platform solution.
Power BI integration was the next big step in Synapse’s evolution. The integration allows the breadth of Power BI assets to be managed from the Synapse workspace. Power BI datasets and reports can be developed and published directly from the Synapse environment. Linking a Power BI workspace into Azure Synapse workspace lifted Synapse from a modern data platform tool to a comprehensive data and analytics platform.
Key Vault is Azure’s cloud secret management store. Whether you’re using pipelines or spark to interface with outside systems, either can leverage Key Vault to securely store secrets or certificates with minimal configuration. Whether it’s usernames, passwords, connection strings or environment variables, Key Vault improves security and abstracts solution code from their environments.
Azure DevOps/Github integration is the other major operational key. It provides two important features for real-world solutions: source control and CI/CD automation. All pipelines, all notebooks and all saved queries can be backed by git-style source control, with robust support for branching baked right into the Synapse workspace. When you integrate Synapse with the CI/CD processes of DevOps or Github, changes to your analytics solution can be deployed with the press of a button.
Coming up soon, Synapse Link will provide real time replication of source system data into your Synapse environment. The public preview supports Azure SQL Database, SQL Server 2022, Cosmos DB, and Microsoft Dataverse. Other sources are sure to follow. With real-time replication, your analytic store can be always up to date, enabling up-to-the-minute insights.
With all these features in mind, we come to a clearer understanding of Microsoft’s vision with Synapse. Ingestion can be handled and orchestrated by the pipelines. Transformation, no matter the format, can be handled with some combination of SQL pools, Spark pools and data flows. Those tools can walk the data from their raw state to operational data stores to dimensions and facts. Power BI can pick up from there and turn that data into reports and dashboards. Spark can receive streaming data and send it to Power BI. Finally, with KeyVault and DevOps integration, it is poised to work in a production environment. In short, it can provide an entire enterprise analytics solution. In the long run, it aims to be nothing less than THE analytics toolset for the cloud.
There are still rough edges. But it has the vision, and as long as it executes on that vision, it’s the place I want to be. If you want help with your vision of a data and analytics future, call on 3Cloud to be your vantage point for the horizons ahead.