Spark Cookbook

By: Rishi Yadav

Streaming using Kafka


Kafka is a distributed, partitioned, and replicated commit log service. In simple terms, it is a distributed messaging server. Kafka maintains message feeds in categories called topics. A topic could be, for example, the ticker symbol of a company you would like to get news about, such as CSCO for Cisco.
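
To make the idea of a topic concrete, here is a minimal in-memory sketch in Python. It is not the Kafka API: the `TopicFeed` class, its `publish` and `read` methods, the `ORCL` topic, and the sample messages are all hypothetical names invented for illustration. It only shows that messages are filed under a topic and read back per topic.

```python
from collections import defaultdict


class TopicFeed:
    """Toy message feed that files messages under topics (not the Kafka API)."""

    def __init__(self):
        # Maps a topic name to its messages, kept in arrival order.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the feed of the given topic."""
        self._topics[topic].append(message)

    def read(self, topic):
        """Return a copy of every message published to the topic so far."""
        return list(self._topics.get(topic, []))


feed = TopicFeed()
# Made-up sample messages; "ORCL" is a second, hypothetical topic.
feed.publish("CSCO", "first CSCO news item")
feed.publish("ORCL", "first ORCL news item")
feed.publish("CSCO", "second CSCO news item")

csco_news = feed.read("CSCO")
print(csco_news)
```

A reader interested only in Cisco asks for the `CSCO` topic and never sees the messages filed under other topics.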

Processes that produce messages are called producers, and those that consume messages are called consumers. In traditional messaging, the messaging service has one central messaging server, also called a broker. Since Kafka is a distributed messaging service, it has a cluster of brokers, which functionally act as one Kafka broker, as shown here:

For each topic, Kafka maintains a partitioned log. This partitioned log consists of one or more partitions spread across the cluster, as shown in the following figure:

Kafka borrows many concepts from Hadoop and other big data frameworks. The concept of a partition is very similar to the concept of InputSplit in Hadoop...
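
The partitioned log described above can be sketched as a toy Python model. This is an illustration under stated assumptions, not Kafka's implementation: the `PartitionedLog` class and the broker names are hypothetical, partitions are placed on brokers round-robin, and a CRC32 hash of the message key picks the partition. Kafka's own partitioner and placement logic differ. What the sketch does reflect is that each partition is an append-only sequence and that each message gets a sequential offset within its partition.

```python
import zlib


class PartitionedLog:
    """Toy model of one topic's partitioned log (not Kafka's implementation)."""

    def __init__(self, topic, num_partitions, brokers):
        self.topic = topic
        # Each partition is an append-only list of messages.
        self.partitions = [[] for _ in range(num_partitions)]
        # Assumption: spread partitions across brokers round-robin.
        self.placement = {
            p: brokers[p % len(brokers)] for p in range(num_partitions)
        }

    def partition_for(self, key):
        # Assumption: CRC32 of the key, modulo the partition count.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append value to the key's partition; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        """Return all messages in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


# Hypothetical broker names and sample messages.
log = PartitionedLog("CSCO", num_partitions=3, brokers=["broker-0", "broker-1"])
p1, o1 = log.append("earnings", "first message")
p2, o2 = log.append("earnings", "second message")

print(log.placement)
print(log.read(p1, 0))
```

Because both messages share a key, they land in the same partition with consecutive offsets, so a reader of that partition sees them in the order they were appended.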