In this chapter, we have covered the basics of ETL, and Spark's ability to interact with a variety of sources, including standard text, CSV, TSV, and JSON files. We moved on to look at accessing filesystems, including local filesystems, HDFS, and S3. Finally, we spent some time helping you understand access to a variety of NoSQL databases and the connectors available for them. As you can see, we have covered a few of the popular systems, but the massive open-source ecosystem around Spark means that new connectors appear almost monthly. It is highly recommended that you watch the relevant projects' GitHub pages for the latest developments.
We'll now move on to the next chapter, where we are going to focus on Spark SQL, DataFrames, and Datasets. The next chapter is important, as it builds on what we have covered already and helps us understand how Spark 2.0 abstracts developers away from the relatively complex concept of RDDs by expanding on the already introduced concept of DataFrames.