In this section, we will again work with text data, but this time in a tabular format: CSV. The following topics will be covered:
- Saving data in CSV format
- Loading CSV data
- Testing
Saving CSV files is more involved than saving JSON or plain text because we need to specify whether the column headers of our data should be written to the CSV file.
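To make the effect of the header setting concrete, here is a minimal sketch that writes the same two rows with and without a header row and reads the result back as plain text. It assumes a local SparkSession; the field names `userId` and `amount`, the `CsvHeaderSketch` object, and the temporary output directory are illustrative assumptions, not part of the original test.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Field names are an assumption; the text only shows UserTransaction("a", 100).
case class UserTransaction(userId: String, amount: Int)

object CsvHeaderSketch {
  // Writes two rows as CSV into a fresh temp directory and returns the raw text lines.
  def rawLines(spark: SparkSession, header: Boolean): Array[String] = {
    import spark.implicits._
    val dir = Files.createTempDirectory("csv-header-sketch").resolve("out").toString
    Seq(UserTransaction("a", 100), UserTransaction("b", 200))
      .toDF()
      .coalesce(1)
      .write
      .format("csv")
      .option("header", header.toString)
      .save(dir)
    // Read the files back as plain text so that a header row, if any, is visible.
    spark.read.textFile(dir).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("csv-header-sketch")
      .getOrCreate()
    println(rawLines(spark, header = true).mkString("\n"))  // includes a column-name line
    println(rawLines(spark, header = false).mkString("\n")) // data rows only
    spark.stop()
  }
}
```

With the header enabled, the output contains one extra line holding the column names; with it disabled, only the data rows are written, so a reader has to supply or infer the column names itself.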
First, we will create a DataFrame:
test("should save and load CSV with header") {
  //given
  import spark.sqlContext.implicits._
  // Despite its name, rdd holds a DataFrame once toDF() has been called
  val rdd = spark.sparkContext
    .makeRDD(List(UserTransaction("a", 100), UserTransaction("b", 200)))
    .toDF()
Then, we will write the DataFrame using the CSV format. We also set the header option to false, which means that no header row is written to the file:
  //when
  rdd.coalesce(1)
    .write
    .format("csv")
    .option("header", "false")
    .save(FileName)
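The `coalesce(1)` call in the write above deserves a brief note: `save` produces a directory rather than a single file, with one part file per non-empty partition, so collapsing to one partition yields one CSV part file. The following is a rough sketch of that behavior, assuming a local SparkSession; the `CoalesceSketch` object, the column names, and the temporary directory are hypothetical and only serve the illustration.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CoalesceSketch {
  // Writes a small DataFrame as CSV after coalesce(1) and counts the part files.
  def partFileCount(spark: SparkSession): Int = {
    import spark.implicits._
    val out = new File(Files.createTempDirectory("coalesce-sketch").toFile, "out")
    Seq(("a", 100), ("b", 200))
      .toDF("userId", "amount") // column names are assumptions
      .coalesce(1)              // a single partition, hence a single part file
      .write
      .format("csv")
      .option("header", "false")
      .save(out.getPath)
    // Ignore _SUCCESS and the hidden .crc checksum files next to the data.
    out.listFiles().count(_.getName.startsWith("part-"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("coalesce-sketch")
      .getOrCreate()
    println(s"part files written: ${partFileCount(spark)}")
    spark.stop()
  }
}
```

Collapsing to a single partition is convenient for a small test like this one, but it funnels the whole write through one task, so it is usually avoided for large datasets.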
We will then perform...