Fast Data Processing with Spark 2 - Third Edition
While distributed computational jobs are a lot of fun, they are much more useful when the results are stored somewhere they can be used later. While the methods for loading an RDD are largely found in the SparkContext class, the methods for saving an RDD are defined on the RDD classes themselves. In Scala, implicit conversions exist so that an RDD that can be saved as a sequence file is converted to the appropriate type automatically; in Java, explicit conversions must be used.
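As a minimal sketch of the Scala implicit conversion described above, the following saves a pair RDD as a sequence file and reads it back. The object name `SaveSequenceFileSketch`, the helper `saveAndReload`, the sample data, the temporary output directory, and the `local[*]` master are all assumptions made for illustration; it also assumes Spark 2.x is on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names here are illustrative only.
object SaveSequenceFileSketch {

  // Saves a small pair RDD as a sequence file under outDir, then reads it back.
  def saveAndReload(sc: SparkContext, outDir: String): Seq[(String, Int)] = {
    val keyValueRdd = sc.parallelize(Seq(("spark", 2), ("hadoop", 1)))

    // saveAsSequenceFile is not a member of RDD itself. The compiler finds it
    // through an implicit conversion that applies to RDDs of key-value pairs
    // whose key and value types can be turned into Hadoop Writables.
    keyValueRdd.saveAsSequenceFile(outDir)

    // Read the pairs back, converting the Writables to String and Int.
    sc.sequenceFile[String, Int](outDir).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setMaster("local[*]").setAppName("save-sequence-file-sketch")
    val sc = new SparkContext(conf)
    try {
      // Spark will not overwrite an existing output directory, so use a fresh path.
      val outDir = Files.createTempDirectory("spark-save").resolve("sequenceOut").toString
      saveAndReload(sc, outDir).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The save call needs no explicit wrapping because `String` and `Int` both have implicit conversions to Hadoop `Writable` types in scope; a pair RDD whose types lack such conversions would not compile with `saveAsSequenceFile`.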
Here are the different ways to save an RDD.
Here's the code for Scala:
rddOfStrings.saveAsTextFile("out.txt")
keyValueRdd.saveAsObjectFile("sequenceOut")
Here's the code for Java:
rddOfStrings.saveAsTextFile("out.txt");
keyValueRdd.saveAsObjectFile("sequenceOut");
Here's the code for Python:
rddOfStrings.saveAsTextFile("out.txt")
In addition, you can save an RDD as a compressed text file by passing a compression codec class to the following overload (Scala signature shown):
saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec])
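A short sketch of how this overload might be called, using Hadoop's `GzipCodec` as the codec. The object name `SaveCompressedTextSketch`, the helper `saveAndReload`, the sample strings, the temporary output directory, and the `local[*]` master are assumptions for illustration only; any other class implementing `CompressionCodec` could be passed the same way.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the names here are illustrative only.
object SaveCompressedTextSketch {

  // Writes an RDD of strings as gzip-compressed text, then reads it back.
  def saveAndReload(sc: SparkContext, outDir: String): Seq[String] = {
    val rddOfStrings = sc.parallelize(Seq("first line", "second line", "third line"))

    // The second argument selects the codec; each partition becomes one
    // compressed part file inside outDir.
    rddOfStrings.saveAsTextFile(outDir, classOf[GzipCodec])

    // textFile picks the codec from the file extension and decompresses on read.
    sc.textFile(outDir).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setMaster("local[*]").setAppName("save-compressed-text-sketch")
    val sc = new SparkContext(conf)
    try {
      // Spark will not overwrite an existing output directory, so use a fresh path.
      val outDir = Files.createTempDirectory("spark-save").resolve("out-gzip").toString
      saveAndReload(sc, outDir).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the path names a directory rather than a single file: Spark writes one part file per partition into it.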