Microsoft recently announced the general availability of Azure Databricks. This service is the result of close collaboration between Microsoft and Databricks to make the already-popular Databricks Unified Analytics Platform a first-class service on Microsoft Azure. The platform brings the power of Apache Spark into a secure, scalable workspace environment where data scientists, data engineers, and business users can explore and analyze data, build and schedule ETL pipelines, and build and deploy complex machine learning models.
Azure Databricks has the potential to be a game changer in the Microsoft Data and AI Platform. Few, if any, other Azure data services serve as many kinds of users, connect to as wide a range of data sources, or handle as broad a set of the compute tasks needed for ETL and AI use cases as Azure Databricks does.
Single Service for a Variety of Users
First, Azure Databricks provides a single workspace for a variety of users. It provides a place for data engineers to transform data and create and schedule batch and streaming ETL jobs. Data scientists can also use Azure Databricks to explore data, create machine learning models, and perform other advanced analytics tasks. Even business analysts can write SQL queries and analyze and visualize data in Databricks notebooks. They can also use tools like Power BI or Tableau to connect to Azure Databricks tables for analysis and self-service BI.
Universal Connectivity to Azure Storage Services
Second, Azure Databricks connects to a wide range of Azure storage options. These include file-based storage, such as Blob storage and Azure Data Lake Store; relational data stores, such as Azure SQL Database and Azure SQL Data Warehouse; and NoSQL data stores, such as Azure Cosmos DB, all of which it can read from and write to. It also connects to streaming or event data sources in Azure, such as Event Hubs or Apache Kafka on HDInsight.
Single Service for a Variety of Compute Tasks
Finally, Azure Databricks provides a single service for many of the common compute tasks in ETL and AI use cases. The Microsoft Data and AI Platform already offers a collection of specialized services that work together to satisfy specific data processing use cases. For example, a client might use Azure Data Lake Analytics (U-SQL) for batch processing of big data in a data lake, Stream Analytics for processing streaming event data, and Azure Machine Learning for creating and operationalizing predictive models. With Azure Databricks, all of these use cases and compute tasks can be developed and implemented in a single workspace. Batch ETL jobs can be developed in the workspace and then scheduled via the Databricks scheduler or with Azure Data Factory; jobs for processing streaming data can be developed and deployed to dedicated clusters within the workspace; and machine learning models can be created and deployed in the workspace to make predictions on new data in batch or streaming jobs.
BlueGranite has recognized Azure Databricks as a critical part of the Microsoft Data and AI Platform. To position ourselves to be the best at helping clients implement this new service, BlueGranite is teaming up with Databricks to become an official Databricks consulting partner. This partnership, paired with our existing relationship with Microsoft, gives BlueGranite access to tools, education, and technical and sales support from both vendors, so we can help clients do more with their data using Azure Databricks.
If you are interested in learning more about Azure Databricks and how you can get started, check out BlueGranite’s 1-day workshop and consulting offer!