Building Data Streaming Applications with Apache Kafka

By: Chanchal Singh, Manish Kumar

Overview of this book

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur. This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. The book first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security. By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.

Peeking into a point-to-point messaging system

This section focuses on the point-to-point (PTP) messaging model. In a PTP messaging model, message producers are called senders and consumers are called receivers. They exchange messages by means of a destination called a queue. Senders produce messages to a queue and receivers consume messages from this queue. What distinguishes point-to-point messaging is that a message can be consumed by only one consumer.

Point-to-point messaging is generally used when a single message must be received by only one message consumer. There may be multiple consumers listening on the queue for the same message, but only one of them will receive it. Note that there can be multiple producers as well. They all send messages to the same queue, but each message is still received by only one receiver.
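The competing-receivers behavior can be sketched in a few lines. This is only an in-process illustration built on Python's standard `queue` and `threading` modules, not a real message broker; the function name `run_point_to_point` and the receiver names are assumptions made for the example.

```python
import queue
import threading


def run_point_to_point(messages, receiver_names):
    """Deliver each message to exactly one of several competing receivers."""
    q = queue.Queue()
    received = {name: [] for name in receiver_names}
    stop = object()  # sentinel that tells a receiver to shut down

    def receiver(name):
        while True:
            msg = q.get()  # taking a message removes it from the queue
            if msg is stop:
                break
            received[name].append(msg)

    threads = [threading.Thread(target=receiver, args=(n,)) for n in receiver_names]
    for t in threads:
        t.start()
    for m in messages:  # senders place messages on the shared queue
        q.put(m)
    for _ in threads:  # one stop signal per receiver
        q.put(stop)
    for t in threads:
        t.join()
    return received
```

Calling `run_point_to_point(["m1", "m2", "m3"], ["R1", "R2"])` returns a dictionary of the messages each receiver got. Which receiver gets which message can vary between runs, but no message ever appears under two receivers.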

A PTP model is based on the concept of sending a message to a named destination. This named destination is the message queue's endpoint, which listens for incoming messages over a port.

Typically, in the PTP model, a receiver requests a message that a sender sends to the queue, rather than subscribing to a channel and receiving all messages sent on a particular queue.

You can think of queues supporting PTP messaging models as FIFO queues. In such queues, messages are kept in the order in which they were received, and as they are consumed, they are removed from the head of the queue. Systems such as Kafka maintain message offsets instead. Rather than deleting consumed messages, they advance the receiver's offset. Offset-based models provide better support for replaying messages.
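To make the difference concrete, here is a minimal sketch that contrasts a queue that deletes messages on consumption with an offset-based log. Both classes, `DestructiveQueue` and `OffsetLog`, are hypothetical names invented for this illustration; they do not reflect Kafka's actual API or storage format.

```python
from collections import deque


class DestructiveQueue:
    """FIFO queue that removes a message once it is consumed."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        self._items.append(message)

    def receive(self):
        # The message is gone after this call and cannot be read again.
        return self._items.popleft() if self._items else None


class OffsetLog:
    """Append-only log that keeps a read position (offset) per receiver."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def send(self, message):
        self._log.append(message)

    def receive(self, receiver):
        offset = self._offsets.get(receiver, 0)
        if offset >= len(self._log):
            return None  # nothing new for this receiver
        self._offsets[receiver] = offset + 1  # advance the offset, keep the message
        return self._log[offset]

    def seek(self, receiver, offset):
        """Move a receiver back (or forward) so it can replay messages."""
        self._offsets[receiver] = offset
```

With `OffsetLog`, a receiver that has read everything can call `seek("R1", 0)` and read the same messages again, which is not possible once `DestructiveQueue.receive` has removed them.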

The following figure shows an example model of PTP. Suppose there are two senders, S1 and S2, who send a message to a queue, Q1. On the other side, there are two receivers, R1 and R2, who receive a message from Q1. In this case, R1 will consume the message from S2 and R2 will consume the message from S1:

A graphical representation of how a point-to-point messaging model works

You can deduce the following important points about a PTP messaging system from the preceding figure:

More than one sender can produce and send messages to a queue. Senders can share a connection or use different connections, but they can all access the same queue.
More than one receiver can consume messages from a queue, but each message can be consumed by only one receiver. Thus, Message 1, Message 2, and Message 3 are consumed by different receivers. (This is a message queue extension.)
Receivers can share a connection or use different connections, but they can all access the same queue. (This is a message queue extension.)
Senders and receivers have no timing dependencies; the receiver can consume a message whether or not it was running when the sender produced and sent the message.
Messages are placed in a queue in the order they are produced, but the order in which they are consumed depends on factors such as message expiration date, message priority, whether a selector is used in consuming messages, and the relative message processing rate of the consumers.
Senders and receivers can be added and deleted dynamically at runtime, thus allowing the messaging system to expand or contract as needed.
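The point about consumption order can be illustrated with a small sketch in which priority and expiration, rather than arrival order alone, decide what a receiver gets next. The class `PriorityMessageQueue` and its parameters (`priority`, `expires_at`, `now`) are assumptions made for this example; real messaging systems expose these features in their own ways.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """Queue that hands out the highest-priority unexpired message first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order, used to break priority ties

    def send(self, body, priority=0, expires_at=None):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), expires_at, body))

    def receive(self, now):
        while self._heap:
            _, _, expires_at, body = heapq.heappop(self._heap)
            if expires_at is None or now < expires_at:
                return body
            # Otherwise the message has expired and is silently discarded.
        return None
```

Messages with equal priority still come out in the order they were sent, so the queue behaves as plain FIFO when no priorities or expirations are set.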

The PTP messaging model can be further categorized into two types:

Fire-and-forget model
Request/reply model

In fire-and-forget processing, the producer sends a message to a centralized queue and does not wait for an acknowledgment. It can be used when you want to send a signal that triggers some action on the receiver and does not require a response. For example, you may want to use this method to send a message to a logging system, to alert a system to generate a report, or to trigger an action in some other system. The following figure represents a fire-and-forget PTP messaging model:

Fire-and-forget message model
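A minimal sketch of fire-and-forget follows, again using an in-process `queue.Queue` as a stand-in for a real message queue. The names `fire_and_forget` and `start_log_receiver`, and the use of `None` as a shutdown signal, are assumptions made for this illustration.

```python
import queue
import threading


def start_log_receiver(q, sink):
    """Start a receiver thread that appends every message it gets to sink."""

    def run():
        while True:
            msg = q.get()
            if msg is None:  # None is used here as a shutdown signal
                break
            sink.append(msg)

    t = threading.Thread(target=run)
    t.start()
    return t


def fire_and_forget(q, message):
    """Enqueue the message and return immediately; no reply is expected."""
    q.put(message)
```

The sender's call returns as soon as the message is on the queue. It never learns whether, or when, the receiver processed it.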

With the request/reply PTP model, the message sender sends a message on one queue and then does a blocking wait on a reply queue for the response from the receiver. The request/reply model provides a high degree of decoupling between the sender and receiver, allowing the message producer and consumer components to be written in different languages or run on different platforms. The following figure represents a request/reply PTP messaging model:

Request/reply message model
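The request/reply flow can be sketched as follows. The sender creates a reply queue, attaches it to the request along with a correlation ID, and blocks until the reply arrives. The message layout (`correlation_id`, `reply_to`, `body`), the function names, and the upper-casing "work" done by the receiver are all assumptions chosen for this example.

```python
import queue
import threading
import uuid


def receiver_loop(request_queue):
    """Consume requests and post each result to the request's reply queue."""
    while True:
        request = request_queue.get()
        if request is None:  # None is used here as a shutdown signal
            break
        reply = {
            "correlation_id": request["correlation_id"],
            "body": request["body"].upper(),  # placeholder for real processing
        }
        request["reply_to"].put(reply)


def send_request(request_queue, body, timeout=5.0):
    """Send a request, then block on a reply queue until the response arrives."""
    reply_queue = queue.Queue()  # temporary reply queue for this request
    correlation_id = str(uuid.uuid4())
    request_queue.put(
        {"correlation_id": correlation_id, "reply_to": reply_queue, "body": body}
    )
    reply = reply_queue.get(timeout=timeout)  # blocking wait for the reply
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("reply does not match the request")
    return reply["body"]
```

The timeout keeps the sender from waiting forever if the receiver is down; `queue.Empty` is raised when no reply arrives in time.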

Before concluding this section, it is important for you to understand where you can use the PTP model of messaging. Use it when you want exactly one receiver to process any given message, once and only once. This is perhaps the model's most critical distinguishing feature: only one consumer will process a given message.

Another use case for point-to-point messaging is when you need synchronous communication between components that are built on different technology platforms or written in different programming languages. For example, an application written in PHP may need to communicate with a Twitter application written in Java that processes tweets for analysis. In this scenario, a point-to-point messaging system helps provide interoperability between these cross-platform applications.
