As explained in Chapter 8, Integration of Storm and Kafka, Kafka is a distributed messaging queue that integrates very well with Storm. In this section, we will show you how to use Logstash to read an Apache log file and publish it to a Kafka cluster. We assume you already have a Kafka cluster running; the installation steps for the Kafka cluster are outlined in Chapter 8, Integration of Storm and Kafka.
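The pipeline described above, reading an Apache log file and publishing it to Kafka, can be sketched as a minimal Logstash configuration. The log file path, Kafka broker address, and topic name below are assumptions for illustration, and option names can vary between Logstash versions:

```
# Minimal sketch: tail an Apache access log and publish each line to Kafka.
# Path, broker address, and topic name are placeholders -- adjust for your setup.
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}

output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "apache_logs"
  }
}
```

Running `logstash -f <config-file>` with a configuration like this starts the pipeline; each new line appended to the log file is published as a message to the configured topic.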
Before moving on to the installation of Logstash, let's answer two questions: What is Logstash, and why are we using it?
Logstash is a tool used to collect, filter/parse, and emit data for future use. These three stages map to the three sections of a Logstash configuration, called input, filter, and output:
- The input section is used to read data from external sources. Common input sources are files, TCP ports, Kafka, and so on.
- The filter section is used to parse the data read by the input section.