Book Image

Apache Kafka 1.0 Cookbook

By : Alexey Zinoviev, Raúl Estrada
Book Image

Apache Kafka 1.0 Cookbook

By: Alexey Zinoviev, Raúl Estrada

Overview of this book

Apache Kafka provides a unified, high-throughput, low-latency platform to handle real-time data feeds. This book will show you how to use Kafka efficiently, and contains practical solutions to the common problems that developers and administrators usually face while working with it. This practical guide contains easy-to-follow recipes to help you set up, configure, and use Apache Kafka in the best possible manner. You will use Apache Kafka Consumers and Producers to build effective real-time streaming applications. The book covers the recently released Kafka version 1.0, the Confluent Platform and Kafka Streams. The programming aspect covered in the book will teach you how to perform important tasks such as message validation, enrichment and composition.Recipes focusing on optimizing the performance of your Kafka cluster, and integrate Kafka with a variety of third-party tools such as Apache Hadoop, Apache Spark, and Elasticsearch will help ease your day to day collaboration with Kafka greatly. Finally, we cover tasks related to monitoring and securing your Apache Kafka cluster using tools such as Ganglia and Graphite. If you're looking to become the go-to person in your organization when it comes to working with Apache Kafka, this book is the only resource you need to have.
Table of Contents (18 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Dedication
Preface

Configuring Kafka brokers


This recipe shows how to deal with the Kafka brokers' basic configuration. For learning and development purposes, one can run Kafka in standalone mode. The real Kafka power is unlocked when it is running with replication in cluster mode and the topics are partitioned accordingly.

There are two main advantages of the cluster mode: parallelism and redundancy. Parallelism is the capacity to run tasks simultaneously among the cluster members. Redundancy warrants that when a Kafka node goes down, the cluster is safe and accessible from the other nodes.

Single node clusters are not recommended for production environments, so this recipe shows how to configure a cluster with several nodes.

Getting ready

Go to the Kafka installation directory (/usr/local/kafka/ for Mac users and /opt/kafka/ for Linux users):

> cd /usr/local/kafka

How to do it...

As already said, a broker is a server's instance. This recipe shows how to start two different servers on one machine. There is a server configuration template called server.properties located in the Kafka installation directory in the config sub-directory:

  1. For each Kafka broker (server) that we want to run, we make a copy of the configuration file template and rename it accordingly. In this example, the cluster is called synergy:
> cp config/server.properties synergy-1.properties
> cp config/server.properties synergy-2.properties
  1. Modify each file according to the plan. If the file is called synergy-1, the broker.id should be 1. Specify the port in which the server should run; the recommendation is 9093 for synergy-1 and 9094 for synergy-2. The port property is not set in the template, so add the line accordingly. Finally, specify the location of the Kafka logs (specific archives to store all the Kafka broker operations); in this case, we use the directory /tmp.

In synergy-1.properties, set:

broker.id=1
port=9093
log.dir=/tmp/synergy-1-logs

In synergy-2.properties, set:

broker.id=2
port=9094
log.dir=/tmp/synergy-2-logs
  1. Start the Kafka brokers using the kafka-server-start.sh command with the corresponding configuration file. Don't forget that Zookeeper must be already running with its corresponding Kafka node and the ports should not be in use by another process:
> bin/kafka-server-start.sh synergy-1.properties &
...
> bin/kafka-server-start.sh synergy-2.properties &

Recall that trailing & is to specify that you want your command line back. If you want to see the broker output, it is recommended that you run each command in its own command-line window.

How it works...

The properties file contains the server configuration. The server.properties file located in the config directory is just a template.

All of the members of the cluster should point to the same Zookeeper cluster. Every broker is identified inside the cluster by the name specified in the broker.id property. If the port property is not specified, Zookeeper will assign the same port number and will overwrite the data. If log.dir is not specified, all the brokers will write to the same default log.dir. If the brokers are going to run on different machines, then port and log.dir might not be specified.

There's more...

Before assigning a port to a server, there is a useful command to see what process is running on a specific port (in this case, the port 9093):

> lsof -i :9093

The output of the previous command is something like this:

COMMAND   PID   USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java   17479 admin   97u IPv6 0xcfbcde96aa59c3bf     0t0 TCP *:9093 (LISTEN)

Try to run this command before starting the Kafka servers and run it after starting to see the change. Also, try to start a broker on a port in use to see how it fails.

To run Kafka nodes on different machines, change the ZooKeeper connection string in the configuration file; its default value is:

zookeeper.connect=localhost:2181

This value is correct only if you are running the Kafka broker on the same machine as Zookeeper. In production, it could not happen. To specify that ZooKeeper is running on different machines (that is, in a ZooKeeper cluster), set:

zookeeper.connect=localhost:2181, 192.168.0.2:2183, 192.168.0.3:2182

The previous line says that Zookeeper is running on the localhost machine on port 2181, on the machine with IP Address 192.168.0.2 on port 2183, and on the machine with IP Address 192.168.0.3 on port 2182. The Zookeeper default port is 2181, so try to run it there.

As an exercise, try to raise a broker with incorrect information about Zookeeper. Also, in combination with the lsof command, try to raise Zookeeper on a port in use.

See also