In this section, we will use the Dataset API in an immutable way. We will cover the following topics:
- Dataset immutability
- Creating two leaves from one root dataset
- Adding a new column by issuing a transformation
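The three topics above can be illustrated with a minimal sketch. It is not the test case from this section: the `UserRecord` case class, its field names, the `processed` column, and the `ImmutableDatasetSketch` object are all hypothetical names chosen for illustration. The sketch derives two leaves from one root dataset and returns enough information to check that the root itself was not changed.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.lit

// Hypothetical record type used only in this sketch; the field names are assumptions.
case class UserRecord(userId: String, data: String)

object ImmutableDatasetSketch {

  // Returns (columns of the root, row count of the first leaf, columns of the second leaf).
  def run(spark: SparkSession): (Array[String], Long, Array[String]) = {
    import spark.implicits._

    // Root dataset: toDS() gives a typed Dataset[UserRecord].
    val root: Dataset[UserRecord] =
      Seq(UserRecord("a", "1"), UserRecord("b", "2")).toDS()

    // Leaf 1: a typed filter. It returns a new Dataset; root is untouched.
    val onlyA: Dataset[UserRecord] = root.filter(_.userId == "a")

    // Leaf 2: adding a column is a transformation that returns a new DataFrame.
    // The hypothetical "processed" column does not appear in root.
    val withFlag: DataFrame = root.withColumn("processed", lit(true))

    (root.columns, onlyA.count(), withFlag.columns)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("immutable-dataset-sketch")
      .getOrCreate()
    val (rootColumns, leafCount, leafColumns) = run(spark)
    println(rootColumns.mkString(", "))
    println(leafCount)
    println(leafColumns.mkString(", "))
    spark.stop()
  }
}
```

Both leaves are built from the same `root` value, and neither call modifies it. This is what makes it safe to branch several independent computations from a single dataset.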
The test case for the dataset is quite similar, but we need to call toDS() on our data to make it type safe. The element type of the dataset is UserData, as shown in the following example:
import com.tomekl007.UserData
import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

class ImmutableDataSet extends FunSuite {
  val spark: SparkSession = SparkSession
    .builder().master("local[2]").getOrCreate()

  test("Should use immutable DF API") {
    import spark.sqlContext.implicits._
    //given
    val userData =
      spark.sparkContext.makeRDD(List(
        UserData("a", "1"),
        UserData("b", "2"),
        UserData...