We start developing the Spark Streaming application by creating a SparkConf, followed by a StreamingContext:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// driverPort and driverHost are assumed to be defined earlier
val conf = new SparkConf(false)
  .setMaster("local[*]")
  .setAppName("Spark Streaming with Akka")
  .set("spark.logConf", "true")
  .set("spark.driver.port", s"$driverPort")
  .set("spark.driver.host", s"$driverHost")
  .set("spark.akka.logLifecycleEvents", "true")
val ssc = new StreamingContext(conf, Seconds(1))
With the context in place, we can create an input stream backed by an Akka actor; it is of type ReceiverInputDStream:
import akka.actor.Props
import org.apache.spark.streaming.dstream.ReceiverInputDStream

val actorName = "salutator"
val actorStream: ReceiverInputDStream[String] =
  ssc.actorStream[String](Props[Salutator], actorName)
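The Salutator actor itself is not shown here; a minimal sketch, assuming Spark 1.x's ActorHelper trait from spark-streaming, might look like this:

import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

// A hypothetical minimal receiver actor: every String message it
// receives is pushed into the DStream via ActorHelper.store.
class Salutator extends Actor with ActorHelper {
  def receive: Receive = {
    case s: String => store(s)
  }
}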
Now that we have a DStream, let's define a high-level processing pipeline in Spark Streaming:
actorStream.print()
In the preceding case, the print() method prints the first 10 elements of each RDD generated in this DStream. Nothing happens until start() is executed.
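A minimal sketch of that final step might look like the following (the 10-second timeout is an arbitrary choice for illustration):

ssc.start()                          // kick off the receiver and the pipeline
ssc.awaitTerminationOrTimeout(10000) // run for ~10 seconds, then fall through
ssc.stop(stopSparkContext = true, stopGracefully = true)

Note that for print() to show anything, something still has to send String messages to the salutator actor while the streaming context is running.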