Transformations shape your dataset. These include mapping, filtering, joining, and transcoding the values in your dataset. In this section, we will showcase some of the transformations available on RDDs.
Note
Due to space constraints we include only the most often used transformations and actions here. For a full set of methods available we suggest you check PySpark's documentation on RDDs http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.
Since RDDs are schema-less, in this section we assume you know the schema of the produced dataset. If you cannot remember the positions of information in the parsed dataset we suggest you refer to the definition of the extractInformation(...)
method on GitHub, code for Chapter 03
.