October 8th, 2015
In my last post, I introduced Microsoft’s Cortana Analytics Suite, a collection of Azure-based PaaS and SaaS offerings bundled to provide a comprehensive toolkit for building Advanced Analytics and IoT solutions. These cloud-based resources perform a variety of functions, from collecting telemetry from remote devices, to predicting diabetes-related hospital readmissions, to showing real-time race car metrics to a pit crew. In most Big Data or IoT (Internet of Things) solutions, data and event ingest is a critical step, and Cortana Analytics has several options to streamline and support it. This post summarizes those offerings and the functionality they provide.
For event ingestion, Azure Event Hubs is a managed event ingestor with broad-based compatibility and elastic scale, capable of logging millions of events per second. Event Hubs sit in the Azure Service Bus, basically at the same level as Topics and Queues, but they provide a different sort of functionality than typical message queuing.
An Azure Event Hub can capture events from Event Publishers (such as incoming device telemetry from wearable devices, continuous feedback from mobile apps, traffic metadata from large-scale web farms, etc.) using AMQP and HTTP as its primary API interfaces. Downstream applications subscribe to the Event Hub through Consumer Groups. Each Consumer Group has its own view of the event stream, so each application can process events at its own pace, either with real-time analytics tools (e.g. Azure Stream Analytics and Power BI) or by storing the data via adapters (e.g. to Azure Blob Storage; more on that later).
Essentially, Azure Event Hubs provide your solution architecture with a scalable and elastic means of decoupling event ingest from event consumption or processing.
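To make the decoupling idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Azure SDK: the `ToyEventHub` class, its `publish` and `read` methods, and the group names are all hypothetical, invented purely for illustration. It assumes the simplest possible picture of an Event Hub, namely an append-only log with one independent read position per Consumer Group.

```python
from collections import defaultdict


class ToyEventHub:
    """Hypothetical in-memory stand-in for an event hub (not the Azure API).

    It keeps an append-only event log plus one read offset per consumer group.
    """

    def __init__(self):
        self.events = []                  # append-only event log
        self.offsets = defaultdict(int)   # consumer group -> next index to read

    def publish(self, event):
        # Publishers only append; they know nothing about consumers.
        self.events.append(event)

    def read(self, group, max_events=10):
        # Each consumer group advances its own offset independently.
        start = self.offsets[group]
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch


hub = ToyEventHub()
for reading in (71, 74, 90):
    hub.publish({"device": "wearable-1", "heart_rate": reading})

realtime = hub.read("analytics", max_events=2)  # reads the first two events
archive = hub.read("archive")                   # reads all three events
realtime_rest = hub.read("analytics")           # picks up where "analytics" left off
```

The point of the sketch is that the publisher never waits on, or even knows about, the readers, and a slow reader in one group does not hold back another group.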
The other means of conducting data workflows is to use Azure Data Factory. Data Factory is an integration service that allows orchestration and automation of data movement and transformation. I recently posted about ADF as part of Perficient’s “Azure: Did You Know?” series, so I won’t repeat myself here. But suffice it to say that Azure Data Factory ends up being somewhat like the circulatory system of the Cortana Analytics Suite, moving data between various storage types and service points.
Azure Data Factory could be used, for example, to build a process workflow that:
- takes data from an on-premises SQL Server database
- copies it to Azure Blob Storage
- then moves it into an HDInsight Hadoop cluster
- and trains an Azure ML Model with it
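The workflow above can be pictured as a chain of activities, each depending on the one before it. The sketch below models that in plain Python; it is not Data Factory's actual pipeline definition format, and the `Activity` class, the `execution_order` function, and the activity names are hypothetical, chosen only to mirror the four steps listed.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    # Hypothetical model of one pipeline step and the steps it waits on.
    name: str
    depends_on: list = field(default_factory=list)


def execution_order(activities):
    """Return activity names in an order that respects depends_on."""
    by_name = {a.name: a for a in activities}
    ordered, visiting = [], set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dep in by_name[name].depends_on:
            visit(dep)  # schedule prerequisites first
        visiting.discard(name)
        ordered.append(name)

    for activity in activities:
        visit(activity.name)
    return ordered


# Declared out of order on purpose: the dependencies decide the run order.
pipeline = [
    Activity("TrainAzureMLModel", depends_on=["LoadIntoHDInsight"]),
    Activity("LoadIntoHDInsight", depends_on=["CopyToBlobStorage"]),
    Activity("CopyToBlobStorage", depends_on=["ReadOnPremSqlServer"]),
    Activity("ReadOnPremSqlServer"),
]

order = execution_order(pipeline)
print(order)
```

Declaring dependencies rather than a fixed sequence is the design choice worth noticing: the orchestrator, not the author, works out what runs when.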
Azure Data Factory can also subscribe to an Azure Event Hub, allowing ADF to process incoming event data directly into a pipeline.
With either of these services, once event or data ingest has occurred, you’re going to want to store that data. For this, Azure provides a number of solid storage options depending on your needs: Azure Table Storage, Azure Blob Storage, DocumentDB, Azure SQL Database, Azure Data Lake, etc. Some of these are quite cheap, some are SQL-oriented, and some are NoSQL-oriented. Additionally, Microsoft has a new Azure service that will help catalog registered data sources and make them discoverable. We’ll talk about what happens AFTER ingest (Storage and Information Management) next time.