In this section, we will reuse the same RDD for different actions. First, we will minimize execution time by reusing the RDD. We will then look at caching and run a performance test on our code.
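As a rough preview of the idea, the sketch below times a few actions on one RDD before and after marking it with cache(). It is a minimal illustration under stated assumptions, not the code used in this section: the object name RddReuseSketch, the helper timeMs, the local[*] master, and the generated input data are all hypothetical, and the timings it prints will vary from run to run.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddReuseSketch {
  // Hypothetical helper: runs a block and returns the elapsed wall-clock time in milliseconds.
  def timeMs(block: => Unit): Long = {
    val start = System.currentTimeMillis()
    block
    System.currentTimeMillis() - start
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, used here only for illustration.
    val conf = new SparkConf().setAppName("rdd-reuse-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: generated data stands in for a real source such as HDFS.
      // Without caching, the map step is recomputed for every action.
      val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

      val uncached = timeMs {
        rdd.count()
        rdd.max()
        rdd.min()
      }

      rdd.cache()  // only marks the RDD for in-memory storage
      rdd.count()  // the first action after cache() materializes it

      val cached = timeMs {
        rdd.count()
        rdd.max()
        rdd.min()
      }

      println(s"without cache: $uncached ms, with cache: $cached ms")
    } finally {
      sc.stop()
    }
  }
}
```

Note that cache() is lazy: nothing is stored until an action runs, which is why the sketch triggers one count() before starting the second measurement.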
The following example is the test from the preceding section, slightly modified: here we record the start time with System.currentTimeMillis() and then the result, so that we measure the total time taken by all of the actions that are executed:
//every call to an action means going back up the RDD chain
//if we are loading data from an external file system (e.g., HDFS), every action means
//that we need to load it from the file system again.
val start = System.currentTimeMillis()
println(rdd.collect().toList)
println(rdd.count())
println(rdd.first())
rdd.foreach(println(_))
rdd.foreachPartition(t => t.foreach(println(_)))
println(rdd.max())
println(rdd.min(...