A Spark application is made up of driver and executor processes: each application has one driver process and one or more executor processes. The driver is the central coordinator of the application. It divides the work among the executors and communicates with them. In distributed mode, the driver and each executor run in separate JVMs.
Figure: Logical representation of a Spark application in distributed mode
SparkContext is initialized in the driver JVM. The driver can be considered the master of a Spark application. The following are the responsibilities of the Spark driver program:
- It creates the physical plan of execution of tasks based on the DAG of operations.
- It schedules the tasks on the executors. It passes each task bundle to an executor chosen according to the data locality principle, so that tasks run close to the data they read where possible.
- The driver tracks the mapping of RDD partitions to executors for executing future tasks...
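
To make the scheduling and tracking responsibilities above more concrete, here is a toy sketch in plain Python. It is not Spark's scheduler and uses no Spark API. The function name `schedule_tasks`, the executor ids, and the fallback rule (least-loaded executor) are all assumptions made for illustration. It only shows the idea of a driver keeping a partition-to-executor mapping and using it to prefer the executor that already holds a partition's data.

```python
def schedule_tasks(partition_locations, executors):
    """Toy model of locality-aware scheduling (hypothetical, not Spark code).

    partition_locations: dict of partition id -> executor id that holds the
        partition's data, or None if no executor holds it yet.
    executors: list of available executor ids.
    Returns a dict of executor id -> list of partition ids (its "task bundle").
    """
    bundles = {executor: [] for executor in executors}
    for partition_id in sorted(partition_locations):
        preferred = partition_locations[partition_id]
        if preferred in bundles:
            # Data locality: run the task where the partition already lives.
            target = preferred
        else:
            # No known location: fall back to the least-loaded executor
            # (an assumption of this sketch, chosen for simplicity).
            target = min(executors, key=lambda e: len(bundles[e]))
        bundles[target].append(partition_id)
    return bundles


# The driver-side mapping of partitions to executors (made-up values).
partition_locations = {0: "exec-1", 1: "exec-2", 2: "exec-1", 3: None}
bundles = schedule_tasks(partition_locations, ["exec-1", "exec-2"])
print(bundles)
```

In this sketch, partitions 0 and 2 go to `exec-1` and partition 1 goes to `exec-2` because their data is already there, while partition 3 has no known location and is sent to whichever executor currently has the fewest tasks.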