Building Data Streaming Applications with Apache Kafka

By: Chanchal Singh, Manish Kumar

Overview of this book

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur. This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security. By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.

Peeking into a point-to-point messaging system

This section focuses on the point-to-point (PTP) messaging model. In a PTP messaging model, message producers are called senders and consumers are called receivers. They exchange messages by means of a destination called a queue. Senders produce messages to a queue and receivers consume messages from this queue. What distinguishes point-to-point messaging is that a message can be consumed by only one consumer.

Point-to-point messaging is generally used when a single message should be received by only one message consumer. There may be multiple consumers listening on the queue for the same message, but only one of them will receive it. Note that there can be multiple producers as well. They all send messages to the same queue, but each message is still received by only one receiver.

A PTP model is based on the concept of sending a message to a named destination. This named destination is the message queue's endpoint, which listens for incoming messages over a port.

Typically, in the PTP model, a receiver requests a message that a sender sends to the queue, rather than subscribing to a channel and receiving all messages sent on a particular queue.

You can think of queues supporting PTP messaging models as FIFO queues. In such queues, messages are stored in the order in which they were received, and as they are consumed, they are removed from the head of the queue. Systems such as Kafka maintain message offsets instead. Rather than deleting consumed messages, they advance the offset for the receiver. Offset-based models provide better support for replaying messages, because earlier messages remain available to be read again.
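The difference between the two approaches can be sketched in a few lines of Python. This is a minimal, in-memory illustration and not Kafka's actual API: the class names `DestructiveQueue` and `OffsetLog`, the `seek` method, and the receiver ID `"r1"` are all hypothetical names chosen for the example.

```python
from collections import deque


class DestructiveQueue:
    """FIFO queue that removes each message from the head when it is consumed."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Once handed to a receiver, the message is gone from the queue.
        return self._messages.popleft() if self._messages else None


class OffsetLog:
    """Append-only log with one read offset per receiver (hypothetical, simplified)."""

    def __init__(self):
        self._messages = []
        self._offsets = {}

    def send(self, message):
        self._messages.append(message)

    def receive(self, receiver_id):
        offset = self._offsets.get(receiver_id, 0)
        if offset >= len(self._messages):
            return None
        # Nothing is deleted; only this receiver's offset moves forward.
        self._offsets[receiver_id] = offset + 1
        return self._messages[offset]

    def seek(self, receiver_id, offset):
        # Rewinding the offset lets the receiver replay earlier messages.
        self._offsets[receiver_id] = offset


log = OffsetLog()
for text in ("m1", "m2", "m3"):
    log.send(text)

first_pass = [log.receive("r1") for _ in range(3)]
log.seek("r1", 0)  # rewind to the start of the log
replayed = [log.receive("r1") for _ in range(3)]
```

With the destructive queue, a second read attempt after the messages are consumed returns nothing. With the offset log, rewinding the offset yields the same three messages again.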

The following figure shows an example model of PTP. Suppose there are two senders, S1 and S2, that each send a message to a queue, Q1. On the other side, there are two receivers, R1 and R2, that each receive a message from Q1. In this case, R1 will consume the message from S2 and R2 will consume the message from S1:

A graphical representation of how a point-to-point messaging model works

You can deduce the following important points about a PTP messaging system from the preceding figure:

  • More than one sender can produce and send messages to a queue. Senders can share a connection or use different connections, but they can all access the same queue.
  • More than one receiver can consume messages from a queue, but each message can be consumed by only one receiver. Thus, Message 1, Message 2, and Message 3 are consumed by different receivers. (This is a message queue extension.)
  • Receivers can share a connection or use different connections, but they can all access the same queue. (This is a message queue extension.)
  • Senders and receivers have no timing dependencies; the receiver can consume a message whether or not it was running when the sender produced and sent the message.
  • Messages are placed in a queue in the order they are produced, but the order in which they are consumed depends on factors such as message expiration date, message priority, whether a selector is used in consuming messages, and the relative message processing rate of the consumers.
  • Senders and receivers can be added and deleted dynamically at runtime, thus allowing the messaging system to expand or contract as needed.
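The points above can be illustrated with a small sketch that uses Python's standard `queue` and `threading` modules as a stand-in for a real message broker. The function name `run_demo`, the message format, and the sentinel-based shutdown are assumptions made for this example only.

```python
import queue
import threading


def run_demo(messages_per_sender=3):
    """Run two senders and two receivers against one shared queue (Q1)."""
    q1 = queue.Queue()
    consumed = []              # (receiver name, message) pairs
    lock = threading.Lock()
    stop = object()            # sentinel that tells a receiver to exit

    def sender(name):
        for i in range(messages_per_sender):
            q1.put(f"{name}-msg{i}")

    def receiver(name):
        while True:
            message = q1.get()  # each get() hands the message to one receiver only
            if message is stop:
                break
            with lock:
                consumed.append((name, message))

    receivers = [threading.Thread(target=receiver, args=(n,)) for n in ("R1", "R2")]
    senders = [threading.Thread(target=sender, args=(n,)) for n in ("S1", "S2")]
    for thread in receivers + senders:
        thread.start()
    for thread in senders:
        thread.join()
    # One sentinel per receiver, queued only after every real message.
    for _ in receivers:
        q1.put(stop)
    for thread in receivers:
        thread.join()
    return consumed


if __name__ == "__main__":
    for receiver_name, message in run_demo():
        print(receiver_name, "consumed", message)
```

Which receiver ends up with which message varies from run to run, since it depends on the relative processing rate of the receivers. What does not vary is that every message is consumed exactly once.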

The PTP messaging model can be further categorized into two types:

  • Fire-and-forget model
  • Request/reply model

In fire-and-forget processing, the producer sends a message to a centralized queue and does not wait for an acknowledgment. It can be used in a scenario where you want to send a signal to the receiver to trigger some action that does not require a response. For example, you may want to use this method to send a message to a logging system, to alert a system to generate a report, or to trigger an action in some other system. The following figure represents a fire-and-forget PTP messaging model:

Fire-and-forget message model
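As a rough sketch of the fire-and-forget model, the following Python example uses an in-process `queue.Queue` in place of a real broker and a logging-style receiver. The names `fire_and_forget`, `log_writer`, and `written`, and the use of `None` as a shutdown signal, are assumptions for illustration.

```python
import queue
import threading

log_queue = queue.Queue()
written = []   # stands in for the log file the receiver writes to


def log_writer():
    # Receiver: drains the queue and records each entry; it never replies.
    while True:
        entry = log_queue.get()
        if entry is None:      # shutdown signal used only by this demo
            break
        written.append(entry)


def fire_and_forget(entry):
    # Sender: enqueue and return at once, with no acknowledgment or reply queue.
    log_queue.put(entry)


worker = threading.Thread(target=log_writer)
worker.start()
fire_and_forget("user logged in")
fire_and_forget("report requested")
log_queue.put(None)
worker.join()
```

The sender's only responsibility is to hand the message to the queue. Whether and when the receiver acts on it is invisible to the sender.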

With a request/reply PTP model, the message sender sends a message on one queue and then does a blocking wait on a reply queue for the response from the receiver. The request/reply model provides a high degree of decoupling between the sender and receiver, allowing the message producer and consumer components to be written in different languages or run on different platforms. The following figure represents a request/reply PTP messaging model:

Request/reply message model
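The request/reply model can be sketched with two in-process queues, one for requests and one for replies. This is an illustration under stated assumptions rather than a real broker client: the correlation ID used to match a reply to its request, the `send_request` helper, the timeout value, and the uppercase "processing" step are all hypothetical.

```python
import queue
import threading
import uuid

request_queue = queue.Queue()
reply_queue = queue.Queue()


def receiver():
    # Handles one request, then posts the response on the reply queue.
    correlation_id, payload = request_queue.get()
    reply_queue.put((correlation_id, payload.upper()))


def send_request(payload, timeout=5.0):
    correlation_id = str(uuid.uuid4())
    request_queue.put((correlation_id, payload))
    # Blocking wait on the reply queue; raises queue.Empty if no reply arrives in time.
    reply_id, response = reply_queue.get(timeout=timeout)
    if reply_id != correlation_id:
        raise RuntimeError("reply does not match the request")
    return response


worker = threading.Thread(target=receiver)
worker.start()
reply = send_request("analyze tweet")
worker.join()
```

The sender and receiver share only the two queues and the message format, which is what allows them to be implemented on different platforms.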

Before concluding this section, it is important for you to understand where you can use the PTP model of messaging. It is used when you want one receiver to process any given message once and only once. This is perhaps the most critical difference: only one consumer will process a given message.

Another use case for point-to-point messaging is when you need synchronous communication between components that are built on different technology platforms or written in different programming languages. For example, you may have an application written in one language, say PHP, that needs to communicate with a Twitter application written in Java to process tweets for analysis. In this scenario, a point-to-point messaging system helps provide interoperability between these cross-platform applications.