Apache Kafka Quick Start Guide

By : Raúl Estrada

Overview of this book

Apache Kafka is a great open source platform for handling your real-time data pipeline to ensure high-speed filtering and pattern matching on the fly. In this book, you will learn how to use Apache Kafka for efficient processing of distributed applications and will get familiar with solving everyday problems in fast data and processing pipelines. This book focuses on programming rather than the configuration management of Kafka clusters or DevOps. It starts off with the installation and setup of the development environment, before quickly moving on to performing fundamental messaging operations such as validation and enrichment. Here you will learn about message composition with the pure Kafka API and Kafka Streams. You will look into the transformation of messages in different formats, such as text, binary, XML, JSON, and Avro. Next, you will learn how to expose the schemas contained in Kafka with the Schema Registry. You will then learn how to work with all relevant connectors with Kafka Connect. While working with Kafka Streams, you will perform various interesting operations on streams, such as windowing, joins, and aggregations. Finally, through KSQL, you will learn how to retrieve, insert, modify, and delete data streams, and how to manipulate watermarks and windows.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Configuring Kafka

Kafka in a nutshell

Kafka installation

Running Kafka

Running Confluent Platform

Running Kafka brokers

Running Kafka topics

A command-line message producer

A command-line message consumer

Using kafkacat

Summary

Message Validation

Enterprise service bus in a nutshell

Event modeling

Setting up the project

Reading from Kafka

Writing to Kafka

Running the processing engine

Coding a validator in Java

Running the validation

Summary

Message Enrichment

Extracting the geographic location

Enriching the messages

Extracting the currency price

Enriching with currency price

Running the engine

Extracting the weather data

Summary

Serialization

Kioto, a Kafka IoT company

Running the PlainProducer

Java plain consumer

Java PlainProcessor

Running the PlainProcessor

Custom serializer

Java CustomProducer

Running the CustomProducer

Custom deserializer

Java custom consumer

Java custom processor

Running the custom processor

Summary

Schema Registry

Avro in a nutshell

Defining the schema

Starting the Schema Registry

Using the Schema Registry

Java AvroProducer

Running the AvroProducer

Java AvroConsumer

Java AvroProcessor

Running the AvroProcessor

Summary

Kafka Streams

Kafka Streams in a nutshell

Project setup

Java PlainStreamsProcessor

Running the PlainStreamsProcessor

Scaling out with Kafka Streams

Java CustomStreamsProcessor

Running the CustomStreamsProcessor

Java AvroStreamsProcessor

Running the AvroStreamsProcessor

Late event processing

Basic scenario

Late event generation

Running the EventProducer

Kafka Streams processor

Running the Streams processor

Stream processor analysis

Summary

KSQL

KSQL in a nutshell

Running KSQL

Using the KSQL CLI

Processing data with KSQL

Writing to a topic

Summary

Kafka Connect

Kafka Connect in a nutshell

Project setup

Spark Streaming processor

Reading Kafka from Spark

Data conversion

Data processing

Writing to Kafka from Spark

Running the SparkProcessor

Summary

Kafka installation

There are three ways to install a Kafka environment:

Downloading the executable files
Using brew (in macOS) or yum (in Linux)
Installing Confluent Platform

For all three ways, the first step is to install Java; we need Java 8. Download and install the latest JDK 8 from Oracle's website:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

At the time of writing, the latest Java 8 JDK version is 8u191.

For Linux users, follow these steps:

Change the file mode to executable as follows:

      > chmod +x jdk-8u191-linux-x64.rpm

Go to the directory in which you want to install Java:

      > cd <directory path>

Run the rpm installer with the following command:

      > rpm -ivh jdk-8u191-linux-x64.rpm

Add the JAVA_HOME variable to your environment. The following command writes the JAVA_HOME environment variable to the /etc/profile file:

      > echo "export JAVA_HOME=/usr/java/jdk1.8.0_191" >> /etc/profile

Validate the Java installation as follows:

      > java -version
      java version "1.8.0_191"
      Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
      Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
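
The version string printed above can also be checked in a script. The following is a minimal sketch, not part of the original procedure: the helper names java_major_version and is_java8 are hypothetical, and it assumes the version string (for example, 1.8.0_191) has already been extracted from the java -version output.

```shell
#!/bin/sh
# Hypothetical helper: print the major Java version for a version string.
# Legacy strings look like "1.8.0_191"; newer ones look like "11.0.2".
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;   # legacy "1.x" scheme: major is the second field
        *)   echo "$1" | cut -d. -f1 ;;   # newer scheme: major is the first field
    esac
}

# Hypothetical helper: succeed only when the version string denotes Java 8.
is_java8() {
    [ "$(java_major_version "$1")" = "8" ]
}
```

For example, is_java8 1.8.0_191 succeeds, while is_java8 11.0.2 fails. Note that java -version writes to standard error, so a script that captures its output needs to redirect it with 2>&1.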

At the time of writing, the latest Scala version is 2.12.6. To install Scala in Linux, perform the following steps:

Download the latest Scala binary from http://www.scala-lang.org/download
Extract the downloaded file, scala-2.12.6.tgz, as follows:

      > tar xzf scala-2.12.6.tgz

Add the SCALA_HOME variable to your environment as follows (this assumes that the extracted Scala directory is located at /opt/scala):

      > export SCALA_HOME=/opt/scala

Add the Scala bin directory to your PATH environment variable as follows:

      > export PATH=$PATH:$SCALA_HOME/bin

To validate the Scala installation, do the following:

      > scala -version
      Scala code runner version 2.12.6 -- Copyright 2002-2018,
      LAMP/EPFL and Lightbend, Inc.
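
To confirm that a bin directory really was added to PATH, the variable can be inspected directly. The helper below is a sketch with a hypothetical name, path_contains; it takes the directory and the PATH string as arguments so that it does not depend on the current environment.

```shell
#!/bin/sh
# Hypothetical helper: succeed when directory $1 is an entry of the
# colon-separated path list $2.
path_contains() {
    # Wrapping both sides in colons makes the first and last entries
    # match the same way as the ones in the middle.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

A possible use is path_contains "$SCALA_HOME/bin" "$PATH" && echo "Scala is on the PATH".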

To install Kafka, ensure that your machine has at least 4 GB of RAM. The installation directory will be /usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users. Create the directory that corresponds to your operating system.
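
The choice of installation directory described above can be sketched as a small shell function. This is only an illustration: the function name kafka_install_dir is hypothetical, and its argument is assumed to be the output of uname -s (Darwin on macOS, Linux on Linux).

```shell
#!/bin/sh
# Hypothetical helper: map an operating system name, as printed by
# `uname -s`, to the Kafka installation directory used in this chapter.
kafka_install_dir() {
    case "$1" in
        Darwin) echo "/usr/local/kafka" ;;   # macOS
        Linux)  echo "/opt/kafka" ;;
        *)      echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}
```

The directory could then be created with, for example, mkdir -p "$(kafka_install_dir "$(uname -s)")", run with sufficient privileges.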

Kafka installation on Linux

Open the Apache Kafka download page, http://kafka.apache.org/downloads, as in Figure 1.1:

Figure 1.1: Apache Kafka download page

At the time of writing, the current stable release of Apache Kafka is 2.0.0. Remember that Kafka versions from 0.8.x onward are not backward-compatible with earlier releases, so we cannot replace this version with one prior to 0.8. Once you've downloaded the latest available release, let's proceed with the installation.

Remember that macOS users should replace the directory /opt/ with /usr/local.

Follow these steps to install Kafka in Linux:

Extract the downloaded file, kafka_2.11-2.0.0.tgz, in the /opt/ directory as follows:

      > tar xzf kafka_2.11-2.0.0.tgz

Create the KAFKA_HOME environment variable as follows:

      > export KAFKA_HOME=/opt/kafka_2.11-2.0.0

Add the Kafka bin directory to the PATH variable as follows:

      > export PATH=$PATH:$KAFKA_HOME/bin

Now Java, Scala, and Kafka are installed.
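
Variables set with export last only for the current shell session. As with JAVA_HOME earlier, they can be written to a profile file so that new sessions pick them up. The following sketch assumes a hypothetical helper, persist_export, and leaves the choice of profile file to the reader; it skips lines that are already present, so it can be run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append "export NAME=VALUE" to a profile file
# unless that exact line is already there.
# Arguments: variable name, value, path of the profile file.
persist_export() {
    name=$1
    value=$2
    profile=$3
    line="export $name=$value"
    # -x matches whole lines only, -F treats the line as a fixed string
    # (so "$" in values such as $PATH is not interpreted as a pattern).
    grep -qxF -- "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
}
```

For example, persist_export KAFKA_HOME /opt/kafka_2.11-2.0.0 "$HOME/.profile" followed by persist_export PATH '$PATH:$KAFKA_HOME/bin' "$HOME/.profile". The single quotes keep the variables unexpanded, so they are resolved when the profile is read.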

macOS users can perform all of the previous steps with a single command-line tool called brew (the equivalent in Linux would be yum).

Kafka installation on macOS

To install from the command line in macOS (brew must be installed), perform the following steps:

To install sbt (the Scala build tool) with brew, execute the following:

      > brew install sbt

If you already have it in your environment (downloaded previously), run the following to upgrade it:

      > brew upgrade sbt

The output is similar to that shown in Figure 1.2:

Figure 1.2: The Scala build tool installation output

To install Scala with brew, execute the following:

      > brew install scala

If you already have it in your environment (downloaded previously), to upgrade it, run the following command:

      > brew upgrade scala

The output is similar to that shown in Figure 1.3:

Figure 1.3: The Scala installation output

To install Kafka with brew (which also installs ZooKeeper), do the following:

      > brew install kafka

If you already have it (downloaded in the past), upgrade it as follows:

      > brew upgrade kafka

The output is similar to that shown in Figure 1.4:

Figure 1.4: Kafka installation output

Visit https://brew.sh/ for more about brew.

Confluent Platform installation

The third way to install Kafka is through Confluent Platform. In the rest of this book, we will be using the open source version of Confluent Platform.

Confluent Platform is an integrated platform that includes the following components:

Apache Kafka
REST proxy
Kafka Connect API
Schema Registry
Kafka Streams API
Pre-built connectors
Non-Java clients
KSQL

Note that almost every one of these components has its own chapter in this book.

The commercially licensed Confluent Platform includes, in addition to all of the components of the open source version, the following:

Confluent Control Center (CCC)
Kafka operator (for Kubernetes)
JMS client
Replicator
MQTT proxy
Auto data balancer
Security features

Coverage of the components of the non-open source version is beyond the scope of this book.

Confluent Platform is also available as Docker images, but here we are going to install it locally.

Open the Confluent Platform download page: https://www.confluent.io/download/

At the time of this writing, the current version of Confluent Platform is 5.0.0 as a stable release. Remember that, since the Kafka core runs on Scala, there are two versions: for Scala 2.11 and Scala 2.12.

We could run Confluent Platform from our desktop directory, but following this book's conventions, let's use /opt/ for Linux users and /usr/local for macOS users.

To install Confluent Platform, extract the downloaded file, confluent-5.0.0-2.11.tar.gz, in that directory, as follows:

      > tar xzf confluent-5.0.0-2.11.tar.gz
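
Both download file names used in this chapter encode the Scala version: kafka_2.11-2.0.0.tgz and confluent-5.0.0-2.11.tar.gz. The sketch below builds those names from their parts, which can help when picking the archive for a different Scala version. The naming pattern is inferred only from these two file names, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: Kafka archive name.
# Arguments: Scala version, Kafka version.
kafka_archive_name() {
    echo "kafka_$1-$2.tgz"
}

# Hypothetical helper: Confluent Platform archive name.
# Arguments: Confluent Platform version, Scala version.
confluent_archive_name() {
    echo "confluent-$1-$2.tar.gz"
}
```

For example, confluent_archive_name 5.0.0 2.11 prints the file name extracted above. Note that the two archives place the Scala version in different positions.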
