One of the distinguishing features of Spark is the ability to persist RDDs in memory. You can mark an RDD for persistence with the cache or persist method, as shown in the following:
>>> myRDD.cache()
>>> myRDD.persist()
Both of the preceding statements do the same thing: they cache the data at the MEMORY_ONLY storage level. The difference is that cache always uses the MEMORY_ONLY storage level, whereas persist lets you choose a different storage level as needed, as shown in the following table. Persistence is lazy: the first time the RDD is computed by an action, its partitions are kept in memory on the nodes. The easiest way to see what fraction of an RDD is cached, and how much space it occupies, is to check the Storage tab in the Spark UI, as shown in Figure 3.11:
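To make the difference concrete, here is a brief sketch of choosing an explicit storage level with persist. It assumes a running PySpark shell with a SparkContext named sc, and data.txt is a placeholder input path:

```python
from pyspark import StorageLevel

# cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
lines = sc.textFile("data.txt")  # "data.txt" is a placeholder path
lines.cache()

# persist() accepts an explicit storage level; MEMORY_AND_DISK
# spills partitions that do not fit in memory to local disk
words = lines.flatMap(lambda line: line.split())
words.persist(StorageLevel.MEMORY_AND_DISK)

# Persistence is lazy: the first action materializes and caches the data
words.count()

# getStorageLevel() reports the chosen level; unpersist() frees the blocks
print(words.getStorageLevel())
words.unpersist()
```

Note that calling persist or cache does not trigger any computation by itself; only the first action (count above) populates the cache, after which the RDD appears in the Storage tab.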