In the first part of this chapter, we talked about how to load data into Spark from various data sources. We saw code examples for connecting to some popular data sources, such as HDFS and S3. In the later parts, we discussed how to process data in some widely used structured formats, again with code examples.
In the next chapter, we will discuss Spark clusters in detail, covering the cluster setup process and some popular cluster managers available for Spark. We will also look at how to debug Spark applications running in cluster mode.