The Spark programming paradigm offers several abstractions for developing data processing applications. The fundamentals of Spark programming start with RDDs, which can handle unstructured, semi-structured, and structured data with ease. The Spark SQL library, however, delivers highly optimized performance when processing structured data, which makes basic RDDs look inferior by comparison. To close this gap, Spark 1.6 introduced a new abstraction, the Dataset, which complements the RDD-based Spark programming model. A Dataset behaves much like an RDD with respect to Spark transformations and actions, while at the same time being optimized through the Spark SQL engine. The Dataset API provides strong compile-time type safety, and for that reason it is available only in Scala and Java.
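The contrast above can be made concrete with a short sketch. This is a minimal, illustrative example, assuming Spark 2.x or later running locally; the `Trans` case class, the application name, and the sample records are hypothetical, not from the original text.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; the case class gives the Dataset its schema
// and its compile-time type checking.
case class Trans(accNo: String, tranAmount: Double)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("DatasetSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Create a strongly typed Dataset[Trans] from a local collection.
    val acTransDS = Seq(
      Trans("SB10001", 1000.0),
      Trans("SB10002", -200.0)
    ).toDS()

    // The transformation reads like an RDD operation, but it is planned
    // and optimized by the Spark SQL engine; referring to a field that
    // does not exist on Trans would fail at compile time, not at runtime.
    val goodTransDS = acTransDS.filter(_.tranAmount > 0)
    goodTransDS.show()

    spark.stop()
  }
}
```

The same filter written against an untyped DataFrame (for example, `filter("tranAmount > 0")`) would only surface a misspelled column name at runtime, which is the type-safety difference the Dataset API is designed to remove.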
The transaction banking use case discussed in the chapter covering the Spark programming model...