Whether you call it streaming analytics, complex event processing or the Internet of Things, capturing and analyzing large volumes of data in real time has always been a challenge. Streaming analysis solutions are required to “drink from a fire hose” of data without losing a drop. They must capture every message, ensure that no message is duplicated and ensure that messages are processed in the right order. This type of solution can be extremely complex and can significantly tax computer and network resources. It may require expensive, specialized infrastructure, and managing scale and fault tolerance is difficult.
Microsoft’s Azure cloud platform provides many features and services that make building complex applications easy, affordable and scalable. Azure is like a collection of building blocks that allow architects to assemble complex solutions quickly, experiment with them and tear them down and rearrange them in new ways.
Many organizations use Google Analytics to understand how their web properties are being used. Google Analytics offers reports and APIs that allow subscribers to perform a wide variety of analyses of their web traffic. I built a solution that demonstrates how to use the components of Azure to take data from the Google Analytics API and analyze which pages are being viewed on BlueGranite’s video blog site in real time.
The specific components of Azure used in this solution include:
- Azure Worker Role – a worker role is a piece of published code that runs in an Azure Virtual Machine. Worker Roles are specifically designed to run background processing tasks and don’t have a web interface. The virtual machine instances for a worker role are managed automatically by Azure and worker roles can be scaled out to meet any processing needs. In this example, the worker role reads data from the Google Analytics Real Time Reporting API every few seconds and sends it to an Azure Service Bus Event Hub.
- Azure Service Bus Event Hub – Service Bus is a generic, cloud-based messaging system for connecting applications, services and devices. Event Hub is a managed service within Service Bus that provides a foundation for large-scale data ingestion across a broad variety of scenarios. In this solution, Google Analytics data is serialized to JSON in the worker role and sent to Event Hub as individual messages.
- Azure Stream Analytics – Stream Analytics is an event processing engine with a SQL-like query language that supports real-time queries over millions of streaming events per second. I used Stream Analytics to analyze and aggregate the data coming from Google Analytics and send the result to an Azure SQL Database.
- Azure SQL Database – Azure offers SQL Server database capabilities in a Platform as a Service (PaaS) model. In this solution, the output data from Stream Analytics is stored in an Azure SQL Database to be consumed later by a user in Excel or another business intelligence tool.
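To make the flow through these components more concrete, here is a minimal Python sketch of the two core ideas: serializing each reading to a JSON message (the worker role's job) and summing readings per page over fixed time windows (the kind of aggregation Stream Analytics performs). It is not the code from the solution and it calls no Azure or Google APIs. The field names `pagePath`, `activeUsers` and `observedAt`, the 10-second window and both function names are assumptions made purely for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime


def to_event(page_path, active_users, observed_at):
    """Serialize one reading to a JSON message, as the worker role would
    before sending it to Event Hub. Field names are hypothetical."""
    return json.dumps({
        "pagePath": page_path,
        "activeUsers": active_users,
        "observedAt": observed_at.isoformat(),
    })


def aggregate(messages, window_seconds=10):
    """Sum activeUsers per (window start, page) over fixed, non-overlapping
    windows. This only imitates a windowed GROUP BY; the real aggregation
    would be written as a query in Stream Analytics."""
    totals = defaultdict(int)
    for raw in messages:
        event = json.loads(raw)
        # Convert the ISO timestamp back to whole seconds since the epoch.
        epoch = int(datetime.fromisoformat(event["observedAt"]).timestamp())
        # Round down to the start of the window that contains this event.
        window_start = epoch - epoch % window_seconds
        totals[(window_start, event["pagePath"])] += event["activeUsers"]
    return dict(totals)
```

Each key in the result is a (window start, page) pair, which maps naturally onto one row in the SQL table that Excel or another tool would later read.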
In this new video, I show you how this solution was built from beginning to end and how the data can be used for analysis in Microsoft Excel 2013.
Need help developing and executing a strategy for real-time analytics in your organization? Let us know and we’ll set up a call with you and one of our specialists.