Processing small amounts of data in real time is not a challenge when we use the Java Message Service (JMS), but, as the LinkedIn experience shows, such processing systems hit serious performance limitations when dealing with large data volumes. Moreover, they are a nightmare to scale horizontally, because in practice they simply don't scale.
For this demo, we need a Kafka cluster up and running. Also, we need Spark installed on our machine and ready to be deployed.
Apache Spark provides a utility class, KafkaUtils, to create a data stream that reads from Kafka. As with any Spark project, we first need to create the SparkConf and the Spark StreamingContext:
SparkConf sparkConf = new SparkConf().setAppName("SparkKafkaTest");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(10));
The JavaStreamingContext is a Java-friendly version of StreamingContext, which is the main entry point for Spark Streaming functionality.
We create the HashSet...
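As a minimal sketch of what that HashSet is used for, the following assumes the topic name ("test-topic"), the broker address ("localhost:9092"), and the older direct-stream API from the spark-streaming-kafka artifact; adjust these to your own setup:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Topics to subscribe to; "test-topic" is a placeholder name.
Set<String> topics = new HashSet<>(Arrays.asList("test-topic"));

// Minimal Kafka connection settings; the broker address is an assumption.
Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", "localhost:9092");

// Create the direct stream of (key, value) pairs read from Kafka,
// using the jssc created earlier.
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
    jssc,
    String.class, String.class,               // key and value types
    StringDecoder.class, StringDecoder.class, // key and value decoders
    kafkaParams,
    topics);

From here, the usual DStream operations (map, foreachRDD, and so on) can be applied to messages before starting the context with jssc.start().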