Book Image

Building Data Streaming Applications with Apache Kafka

By : Chanchal Singh, Manish Kumar
Book Image

Building Data Streaming Applications with Apache Kafka

By: Chanchal Singh, Manish Kumar

Overview of this book

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur. This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through understanding the type messaging system and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming application using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security. By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.
Table of Contents (14 chapters)

Understanding the principles of messaging systems

Continuing our focus on messaging systems, you may have seen applications where one application uses data that gets processed by other external applications or applications consuming data from one or more data sources. In such scenarios, messaging systems can be used as an integration channel for information exchange between different applications. If you haven't built such an application yet, then don't worry about it. We will build it in upcoming chapters.

In any application integration system design, there are a few important principles that should be kept in mind, such as loose coupling, common interface definitions, latency, and reliability. Let's look into some of these one by one:

  • Loose coupling between applications ensures minimal dependencies on each other. This ensures that any changes in one application do not affect other applications. Tightly coupled applications are coded as per predefined specifications of other applications. Any change in specification would break or change the functionality of other dependent applications.
  • Common interface definitions ensure a common agreed-upon data format for exchange between applications. This not only helps in establishing message exchange standards among applications but also ensures that some of the best practices of information exchange can be enforced easily. For example, you can choose to use the Avro data format to exchange messages. This can be defined as your common interface standard for information exchange. Avro is a good choice for message exchanges as it serializes data in a compact binary format and supports schema evolution.
  • Latency is the time taken by messages to traverse between the sender and receiver. Most applications want to achieve low latency as a critical requirement. Even in an asynchronous mode of communication, high latency is not desirable as significant delay in receiving messages could cause significant loss to any organization.
  • Reliability ensures that temporary unavailability of applications does not affect dependent applications that need to exchange information. In general, the when source application sends a message to the remote application, sometimes the remote application may be running slow or it may not be running due to some failure. Reliable, asynchronous message communication ensures that the source application continues its work and feels confident that the remote application will resume its task later.