For the sake of simplicity, this chapter will look at using ScalaTest and JUnit as the testing libraries. ScalaTest can be used to test both Scala and Java code and is the testing library currently used in Spark. JUnit is a popular testing framework for Java.
If you have code that can be isolated from the RDD or SparkContext interaction, you can test it using standard methodologies. While it can be quite convenient to use anonymous functions when writing Spark code, giving them names lets you test them more easily, without the expensive overhead of setting up a SparkContext. For example, in your Scala CSV parser, you might have had this hard-to-test code:
val splitLines = inFile.map(line => {
  val reader = new CSVReader(new StringReader(line))
  reader.readNext().map(_.toDouble)
})
Or, in Java, you had:
JavaRDD<Integer[]> splitLines = inFile.flatMap(new FlatMapFunction<String...
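As a minimal sketch of the refactoring described above, the Scala parsing logic can be pulled out into a named function and checked with plain assertions, with no SparkContext involved. The object name `CsvParsing` and the method `parseLine` are hypothetical. To keep the sketch dependency-free, it swaps opencsv's `CSVReader` for `String.split`. This assumes fields never contain quoted commas, so a real parser would keep `CSVReader`.

```scala
// Hypothetical helper object; CsvParsing and parseLine are illustrative names.
object CsvParsing {
  // Parses one CSV line into doubles.
  // Assumption: no quoted fields or embedded commas, so a plain split
  // stands in for opencsv's CSVReader to keep this sketch self-contained.
  def parseLine(line: String): Array[Double] =
    line.split(",").map(_.trim.toDouble)
}

// In the Spark job, the named function replaces the anonymous one:
//   val splitLines = inFile.map(CsvParsing.parseLine _)
```

Because `parseLine` is an ordinary method, checks such as `CsvParsing.parseLine("1.0,2.5,3")` can run inside a ScalaTest or JUnit test without starting Spark.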