Before moving any further let's first understand the common terminologies associated with Spark:
- Driver: This is the main program that oversees the end-to-end execution of a Spark job or program. It negotiates resources with the cluster's resource manager in order to delegate and orchestrate the program as the smallest possible data-local parallel units of work.
- Executors: In any Spark job, there can be one or more executors, that is, processes that execute the smaller tasks delegated by the driver. The executors process the data, preferably data local to their node, and store the results in memory, on disk, or both.
- Master: Apache Spark is implemented in a master-slave architecture, and hence master refers to the cluster node executing the driver program.
- Slave: In a distributed cluster mode, slave refers to the nodes on which executors run, and hence there can be (and usually is) more than one slave in the cluster.
- Job: This is a collection of operations performed on any set of...