Spark programming revolves around RDDs. In any Spark application, the input data to be processed is turned into an appropriate RDD. To begin with, consider the most basic way of creating an RDD: from a list. The input data used for this hello-world style of application is a small collection of retail banking transactions. To explain the core concepts, only a few very elementary data items have been picked. The transaction records contain account numbers and transaction amounts.
Tip
In this use case and all the upcoming use cases in the book, whenever the term record is used, it is in the business or use case context.
The use cases selected for elucidating the Spark transformations and Spark actions here are as follows:
The transaction records come as comma-separated values.
Filter out only the good transaction records from the list. The account number should start with SB and the transaction amount should be greater than zero...
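The good-record rule just described can be sketched in plain Python before any Spark API enters the picture. The sample records below are invented for illustration; in an actual Spark application the same predicate function would be passed to the RDD's filter transformation rather than a list comprehension.

```python
# Hypothetical sample records in the "account-number,amount"
# comma-separated form described in the text.
transactions = [
    "SB10001,1000",
    "SB10002,1200",
    "CR10003,8000",   # account number does not start with SB -> bad record
    "SB10004,-2000",  # transaction amount is not greater than zero -> bad record
]

def is_good_record(record):
    """A good record has an account number starting with SB
    and a transaction amount greater than zero."""
    account, amount = record.split(",")
    return account.startswith("SB") and float(amount) > 0

# Plain-Python stand-in for the Spark filter transformation.
good_transactions = [t for t in transactions if is_good_record(t)]
print(good_transactions)
```

In Spark, the list would first become an RDD (for example via SparkContext's parallelize method), and `is_good_record` would then be applied lazily with the filter transformation, triggering computation only when an action is called.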