Note that the Spark shell is available only in the Scala language. However, we have kept the examples easy for Java developers to understand.
Execute the following command in spark-shell to check the Spark version:
scala> sc.version
res0: String = 2.1.1
It is shown in the following screenshot:
Let's start by creating an RDD of strings:
scala> val stringRdd = sc.parallelize(Array("Java", "Scala", "Python", "Ruby", "JavaScript", "Java"))
stringRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:24
Now, we will filter this RDD to keep only those strings that start with the letter J:
scala> val filteredRdd = stringRdd.filter(s => s.startsWith("J"))
filteredRdd: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:26
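Because filter is lazy, no data has actually been processed yet. To see the filtered elements, we can trigger evaluation with the collect action. The following is a sketch of how the spark-shell session would continue; the exact res identifier may differ in your session:

```scala
// collect() brings the filtered elements back to the driver as a local array.
scala> filteredRdd.collect()
res1: Array[String] = Array(Java, JavaScript, Java)
```

Note that collect should only be used on small RDDs, since it materializes the entire result on the driver.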
In the first chapter, we learned that an operation on an RDD is a transformation if it returns another RDD; otherwise, it is an action.
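The return types printed by the shell make this distinction visible. In the following sketch (continuing the same session; the res numbers and RDD IDs shown are illustrative), map returns an RDD, so it is a transformation, while count returns a Long, so it is an action:

```scala
// map returns an RDD[Int], so it is a transformation (lazy, nothing computed yet).
scala> stringRdd.map(s => s.length)
res2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at map at <console>:26

// count returns a Long, so it is an action (it triggers the computation).
scala> stringRdd.count()
res3: Long = 6
```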
The...