Apache Kafka

By: Nishant Garg

Overview of this book

Message publishing is a mechanism for connecting heterogeneous applications by routing messages between them, for example through a message broker such as Apache Kafka. Such solutions deal with real-time volumes of information and route it to multiple consumers without letting the information producers know who the final consumers are.

Apache Kafka is a practical, hands-on guide providing you with a series of step-by-step practical implementations, which will help you take advantage of the real power behind Kafka and give you a strong grounding for using it in your publisher-subscriber-based architectures.

Apache Kafka takes you through a number of clear, practical implementations that will help you to take advantage of the power of Apache Kafka, quickly and painlessly. You will learn everything you need to know for setting up Kafka clusters. This book explains how Kafka's basic blocks, such as producers, brokers, and consumers, actually work and fit together. You will then explore additional settings and configuration changes to achieve ever more complex goals. Finally, you will learn how Kafka works with other tools such as Hadoop and Storm.

You will learn everything you need to know to work with Apache Kafka in the right format, as well as how to leverage its power of handling hundreds of megabytes of messages per second from multiple clients.

Integration with other tools


This section discusses community contributions that provide integration of Apache Kafka with other tools for various needs, such as logging, packaging, cloud integration, and Hadoop integration.

Camus (https://github.com/linkedin/camus) is another project from LinkedIn, which provides a pipeline from Kafka to HDFS. In this project, a single MapReduce job performs the following steps to load data into HDFS in a distributed manner (a simplified sketch of the overall Kafka-to-HDFS pattern follows the list):

  1. First, it discovers the latest topics and partition offsets from ZooKeeper.

  2. Each task in the MapReduce job fetches events from the Kafka broker and commits the pulled data along with the audit count to the output folders.

  3. After the completion of the job, final offsets are written to HDFS, which can be further consumed by subsequent MapReduce jobs.

  4. Information about the consumed messages is also updated in the Kafka cluster.
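
To make the Camus flow easier to picture, the following is a minimal, hypothetical Java sketch of the same Kafka-to-HDFS pattern. It is not Camus code: it uses the plain Kafka consumer API and Hadoop's FileSystem API instead of a MapReduce job, and the broker address, topic name, group ID, and HDFS path are placeholder assumptions (it also assumes recent kafka-clients and hadoop-client libraries on the classpath):

    // A minimal, hypothetical sketch of the Kafka-to-HDFS pattern that Camus
    // automates; it is not Camus code. It pulls messages with the plain Kafka
    // consumer API and appends them to a file through Hadoop's FileSystem API.
    // Broker address, topic, group ID, and HDFS path below are placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class KafkaToHdfsSketch {
        public static void main(String[] args) throws Exception {
            // Consumer configuration (placeholder broker and group ID).
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "kafka-to-hdfs-sketch");
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false");
            props.put("auto.offset.reset", "earliest");

            // Target file on the default Hadoop filesystem (placeholder path).
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path output = new Path("/tmp/kafka-dump/events.txt");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 FSDataOutputStream out = fs.create(output, true)) {
                consumer.subscribe(Collections.singletonList("events"));

                // Pull a few batches of messages and append each value to HDFS.
                for (int i = 0; i < 10; i++) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        out.write((record.value() + "\n")
                                .getBytes(StandardCharsets.UTF_8));
                    }
                    // Committing offsets after each batch is the rough analogue of
                    // Camus writing final offsets back after its MapReduce job.
                    consumer.commitSync();
                }
            }
            fs.close();
        }
    }

In Camus itself, the per-task output folders and the offsets written back to HDFS and Kafka play the role of the single output file and the commitSync() call in this sketch, but in a distributed, fault-tolerant way.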

Some other useful contributions are:

  • Automated deployment and configuration of Kafka and ZooKeeper...