Learning Apache Kafka - Second Edition

By: Nishant Garg
Overview of this book

Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper.

Learning Apache Kafka, Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per second from multiple clients. This book teaches you everything you need to know, from setting up Kafka clusters to understanding basic building blocks such as the producer, broker, and consumer. Once you are all set up, you will explore additional settings and configuration changes to achieve ever more complex goals. You will also learn how Kafka is designed internally and which configurations make it more effective. Finally, you will learn how Kafka works with other tools such as Hadoop and Storm.
Table of Contents (14 chapters)

Integration with other tools


This section discusses community contributions that integrate Apache Kafka with other systems for needs such as logging, packaging, cloud integration, and Hadoop integration.

Camus (https://github.com/linkedin/camus) provides a pipeline from Kafka to HDFS. In this project, a single MapReduce job performs the following steps to load data into HDFS in a distributed manner:

  1. It first discovers the latest topics and partition offsets from ZooKeeper.

  2. Each task in the MapReduce job fetches events from the Kafka broker and commits the pulled data along with the audit count to the output folders.

  3. After the completion of the job, final offsets are written to HDFS and can be further consumed by subsequent MapReduce jobs.

  4. Information about the consumed messages is also updated in the Kafka cluster.
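The four steps above can be condensed into a single driver loop. The following Python sketch is illustrative only: the dictionaries standing in for ZooKeeper, the Kafka brokers, and HDFS, and the function name `run_camus_style_job`, are all hypothetical; Camus itself implements this flow as MapReduce tasks written in Java.

```python
# Illustrative sketch of the Camus-style Kafka-to-HDFS load flow.
# All names and data structures are hypothetical stand-ins, not the Camus API.

# Stand-in for ZooKeeper: the last committed offset per (topic, partition).
zookeeper_offsets = {("clicks", 0): 2, ("clicks", 1): 0}

# Stand-in for the Kafka brokers: the event log per (topic, partition).
broker_log = {
    ("clicks", 0): ["e0", "e1", "e2", "e3", "e4"],
    ("clicks", 1): ["f0", "f1"],
}

# Stand-in for HDFS: output folders plus the final-offsets file.
hdfs = {"output": {}, "offsets": {}}

def run_camus_style_job():
    """One 'MapReduce job'; each (topic, partition) acts like one task."""
    for tp, start in zookeeper_offsets.items():       # step 1: discover offsets
        events = broker_log[tp][start:]               # step 2: fetch new events
        hdfs["output"][tp] = {                        # step 2: commit pulled data
            "events": events,                         #         along with the
            "audit_count": len(events),               #         audit count
        }
        hdfs["offsets"][tp] = start + len(events)     # step 3: final offsets to HDFS
    zookeeper_offsets.update(hdfs["offsets"])         # step 4: update consumed state

run_camus_style_job()
print(hdfs["output"][("clicks", 0)]["audit_count"])  # prints 3: e2, e3, e4 were new
```

Because the final offsets land in HDFS (step 3), a subsequent run or a downstream MapReduce job can resume exactly where this one stopped, which is what makes the pipeline incremental.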

Some other useful contributions are: