The Spark Streaming application acts as the listener that receives data from its producers. Since Kafka is used as the message broker, the Spark Streaming application is its consumer, listening on topics for the messages sent by the producers. Since the master dataset in the batch layer comprises the following datasets, it is ideal to have an individual Kafka topic for each dataset:
User dataset: Kafka topic User
Follower dataset: Kafka topic Follower
Message dataset: Kafka topic Message
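As a rough sketch of how such a consumer can subscribe to these three topics, the following Scala snippet uses the direct stream API from the spark-streaming-kafka-0-10 connector. The application name, broker address, group ID, and batch interval are illustrative assumptions, not values prescribed by the text:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object LambdaStreamConsumer {
  def main(args: Array[String]): Unit = {
    // local[2] and a 10-second batch interval are assumptions for local testing
    val conf = new SparkConf().setAppName("LambdaStreamConsumer").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Broker address and group ID are placeholders; adjust to your setup
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "lambda-consumer",
      "auto.offset.reset" -> "latest"
    )

    // One Kafka topic per master dataset, as listed above
    val topics = Array("User", "Follower", "Message")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Each record carries its topic name, so messages can be routed per dataset
    stream.map(record => (record.topic, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Subscribing to all three topics with a single direct stream keeps one consumer group per application; the topic name on each record is then enough to dispatch User, Follower, and Message records to their respective processing paths.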
Figure 5 provides an overall picture of the Kafka-based Spark Streaming application structure:
Since the Kafka setup has already been covered in Chapter 6, Spark Stream Processing, only the application code is covered here.
The following scripts are run from a terminal window. Make sure that the $KAFKA_HOME environment variable points to the directory where Kafka is installed. It is also very important to start Zookeeper before starting the Kafka server, since Kafka relies on Zookeeper for coordination.
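A minimal sequence of terminal commands for that setup might look as follows; the single-broker settings (one partition, replication factor 1) and the local Zookeeper address are assumptions for a development machine, and each server command should run in its own terminal window:

```shell
# Assumes $KAFKA_HOME points to the Kafka installation directory

# 1. Start Zookeeper first -- Kafka depends on it for coordination
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties

# 2. In a second terminal, start the Kafka broker
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties

# 3. In a third terminal, create one topic per master dataset
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic User
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic Follower
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic Message
```

With the broker running and the three topics created, the producer applications can publish to them and the Spark Streaming application can start consuming.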