In this chapter, we covered the programming languages Spark supports, their respective advantages, and when to choose one over another. We examined the design of the Spark engine, its core components, and their execution mechanism, and saw how Spark distributes data for computation across the nodes of a cluster. We then introduced RDDs: we learned how to create them and how to perform transformations and actions on them in both Scala and Python, and we also covered some advanced RDD operations.
In the next chapter, we will learn about DataFrames in detail and see why they are well suited to a wide range of data science requirements.