
Understanding queues, Pub/Sub, and events


In this section, we will review two key concepts, queues and the Publish/Subscribe model, followed by event-based messaging models.

Queues

A queue implements one-way communication: the sender places a message on the queue and a receiver collects the message asynchronously. Features such as dead-letter queues, paired namespaces, active/passive replication, and auto-forwarding (which chains a queue to another queue or topic in the same namespace) provide a rich feature set for flowing messages between applications and for building a highly available solution.

A queue consists of three key elements:

  • Sender: Sends the message to the receiver through a durable entity.
  • Durable entity: Stores received messages and offers persistence; messages are retained until they are collected by the receiver.
  • Receiver: The final recipient of the message.

The key advantages of a queue are as follows:

  • Queues operate on the principle of first in, first out (FIFO): messages are put on the queue at one end and received at the other end in the same order. Azure Service Bus queues, for example, implement the FIFO pattern (a short code sketch follows this list).
  • Point-to-point: Queues are fundamentally point-to-point messaging; even though there may be multiple senders of messages, there is only one receiver of the messages.
  • Asynchronous communication: Senders and receivers are not connected by endpoint address; instead, a static structure may exist where they communicate through named channels (queues). Asynchronous communication helps with building a decoupled architecture and allows messages to be added and processed even when either the publisher or the consumer of messages has downtime, giving higher resilience.
  • Security: Because senders and receivers have mutual knowledge of each other, senders know where the data will land, and it is easier to enforce security policies.
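
To make the point-to-point pattern concrete, the following is a minimal sketch using the azure-servicebus Python SDK, assumed here purely for illustration; the connection string and queue name are placeholders rather than the book's official sample:

    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    CONNECTION_STR = "<your Service Bus connection string>"  # placeholder
    QUEUE_NAME = "orders"                                     # placeholder queue name

    with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
        # Sender: places messages on the queue (one-way, asynchronous hand-off).
        with client.get_queue_sender(queue_name=QUEUE_NAME) as sender:
            sender.send_messages([ServiceBusMessage(f"order-{i}") for i in range(3)])

        # Receiver: the single consumer collects the messages in FIFO order.
        with client.get_queue_receiver(queue_name=QUEUE_NAME, max_wait_time=5) as receiver:
            for message in receiver:
                print("received:", str(message))
                receiver.complete_message(message)  # removes the message from the queue

The sender and receiver never address each other directly; the queue (the durable entity) sits between them, which is exactly the decoupling described above.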

The following figure illustrates the preceding concept:

Publish and Subscribe model

Publish/Subscribe is a communication paradigm for large-scale systems. It enables loose coupling between mutually anonymous components and supports many-to-many communication.

The core concept of the Publish/Subscribe model is very simple. A publisher publishes information on a topic, and anyone interested in that information can receive it simply by subscribing to the topic. A well-known example of this pattern is news feeds: end users who are interested subscribe to the types of feeds they want to follow. Let's review the key components of the Publish/Subscribe paradigm (a short code sketch follows the list):

  • Publisher (message sender):
    • Publishers connect to the Publish/Subscribe middleware in order to communicate
    • Publishers produce events without any dependence on subscribers
    • Publishers advertise the events they are prepared to publish
    • The publisher announces an event without having any knowledge of the potential subscribers
  • Topic:
    • A topic is conceptually similar to a queue, but a topic can forward a copy of a given message to multiple subscriptions
    • Topics and subscriptions provide a one-to-many form of communication based on the Publish/Subscribe pattern
  • Subscriber (receiver):
    • Subscribers register their interest in receiving events through a subscription that the middleware handles
    • The subscriber can subscribe and unsubscribe to events
    • The subscriber expresses interest in one or more events and receives only events related to that interest, without any knowledge of which publishers provide a given event
  • Subscription:
    • Once an event is received and consumed by a subscriber, the same event cannot be replayed, and new subscribers will not see it; this eliminates duplicate processing of events
    • A subscription is similar to a virtual queue that receives copies of the messages sent to the topic; you can optionally define filter rules for a topic on a per-subscription basis, which allows you to filter the messages each subscription receives, as illustrated in the following figure.
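
In code, the topic and subscription flow can be sketched as follows, again using the azure-servicebus Python SDK as an assumed example; the connection string, topic name, and subscription name are placeholders:

    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    CONNECTION_STR = "<your Service Bus connection string>"  # placeholder
    TOPIC_NAME = "price-updates"                              # placeholder topic
    SUBSCRIPTION_NAME = "dashboard"                           # placeholder subscription

    with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
        # Publisher: sends to the topic with no knowledge of the subscribers.
        with client.get_topic_sender(topic_name=TOPIC_NAME) as sender:
            sender.send_messages(ServiceBusMessage("MSFT,331.50"))

        # Subscriber: receives its own copy of the message through a subscription.
        with client.get_subscription_receiver(
            topic_name=TOPIC_NAME,
            subscription_name=SUBSCRIPTION_NAME,
            max_wait_time=5,
        ) as receiver:
            for message in receiver:
                print("received:", str(message))
                receiver.complete_message(message)

Every subscription on the topic receives its own copy of the message, which is what gives the model its one-to-many, decoupled character.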

Key benefits of the Pub/Sub model are as follows:

  • Decoupling (loose coupling): The model decouples publishers and subscribers in space, time, and synchronization:
    • Space: The publisher and subscriber don't need to know each other either by name or IP address, for instance.
    • Time: The publisher and subscriber don't need to run at the same time.
    • Synchronization: Operations can continue asynchronously at both ends (publishing and receiving).
  • Highly parallel:
    • The model is highly parallel in that subscribers can process events and, at the same time, the publisher can keep publishing events.
  • Scalability:
    • Due to the decoupling nature and parallel nature of the model, the Pub/Sub model is highly scalable.
    • To achieve higher throughput, events can be cached and smarter routing to subscribers can be configured as you scale.
    • That said, the key challenge with the Pub/Sub model is scaling to millions of publishers and subscribers.

In addition to the preceding, there are two key challenges with the Pub/Sub model:

  • No guarantee of message delivery because of the decoupled nature of the model
  • For applications that depend entirely on guaranteed message delivery, the Pub/Sub model will not fit; a queue-based model is the better choice

Real-world implementations of the Publish/Subscribe model

TIBCO was one of the pioneers of the Publish/Subscribe model back when centralized batch processing was the norm. The TIBCO approach changed the paradigm on the stock trading floor.

RSS feeds use the Pub/Sub model: whether you subscribe to an RSS feed, subscribe to one or more forums on a discussion platform, or follow someone on Twitter, in each case there is one publisher and multiple subscribers involved.

Companies such as IBM have built protocols like Message Queue Telemetry Transport (MQTT) around this pattern. Some examples of products that are built on the Pub/Sub model are:

  • IBM MQ is one of the early deployers of the Pub/Sub model
  • WebSphere
  • Wormhole Pub/Sub system from Facebook
  • Google Cloud Pub/Sub

In the next section, we will look at how Azure implements queues and Pub/Sub models.

Azure implementation of queues and Publish/Subscribe models

Azure supports two types of queuing mechanism:

  • Storage queues
  • Service bus queues

Storage queues are part of the Azure Storage infrastructure and offer a simple REST-based GET/PUT/PEEK interface. They provide reliable, persistent messaging within and between services.
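
As a rough illustration of that interface, the following sketch uses the azure-storage-queue Python SDK (assumed here as an example); the connection string and queue name are placeholders, and the PUT/PEEK/GET operations map to send_message, peek_messages, and receive_messages:

    from azure.storage.queue import QueueClient

    CONNECTION_STR = "<your storage account connection string>"  # placeholder
    queue = QueueClient.from_connection_string(CONNECTION_STR, queue_name="tasks")

    queue.send_message("process-order-42")         # PUT: add a message to the queue

    for peeked in queue.peek_messages(max_messages=5):
        print("peeked:", peeked.content)           # PEEK: look without dequeuing

    for msg in queue.receive_messages():
        print("processing:", msg.content)          # GET: dequeue for processing
        queue.delete_message(msg)                  # remove once successfully processed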

Service Bus queues are part of the broader Azure messaging infrastructure, an enterprise offering that includes queues, Pub/Sub, and advanced integration patterns.

Both of these queuing technologies exist in parallel to cater to different types of use cases. Storage queues were offered first, as a service built on top of the Azure Storage service. Service Bus queues came later and support a wider range of use cases and scenarios. For example, if your components span multiple communication protocols, data contracts, trust domains, and network environments, Azure Service Bus queues are the ideal solution.

There are a couple of key technical differences between Azure Storage queues and Azure Service Bus queues.

You should consider Azure Storage queues when:

  1. Your application needs to store over 80 GB of messages in a queue, and those messages have a lifetime shorter than seven days.
  2. You are building on Azure worker roles and want to preserve messages across worker role crashes.
  3. Server-side logs are required for all transactions executed against your queues.

You should consider Azure Service Bus queues:

  1. When you want to be guaranteed FIFO ordered delivery.
  2. When your solution requires duplicate detection.
  3. When your message time to live (TTL) can exceed seven days and your total queue size will not grow larger than 80 GB (a configuration sketch follows this list).
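
Duplicate detection and a long TTL are properties you set when the queue is created. The following sketch uses the management client from the azure-servicebus Python package; the parameter names reflect that SDK and the queue name is a placeholder, so treat this as an assumed illustration rather than a definitive recipe:

    from datetime import timedelta
    from azure.servicebus.management import ServiceBusAdministrationClient

    CONNECTION_STR = "<your Service Bus connection string>"  # placeholder

    with ServiceBusAdministrationClient.from_connection_string(CONNECTION_STR) as admin:
        admin.create_queue(
            "orders",
            requires_duplicate_detection=True,                        # drop repeated sends
            duplicate_detection_history_time_window=timedelta(minutes=10),
            default_message_time_to_live=timedelta(days=14),          # TTL beyond seven days
        )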

If you would like to understand the detailed differences between Azure Storage queues versus Azure Service Bus queues, visit https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted.

Azure Service Bus messaging is an implementation of the message queuing concept, delivered in Microsoft Azure as a Platform as a Service (PaaS) offering. All Azure PaaS services are built for high resiliency and high availability.

In this section, we briefly reviewed how Azure implements queues and Pub/Sub models. Since the focus of this book is stream event processing, we will explore Azure Events in more depth in the next section.