Hadoop CapacityScheduler is a pluggable MapReduce job scheduler. Its goal is to maximize cluster utilization by sharing the cluster among multiple users. CapacityScheduler uses queues to guarantee each user a minimum share of the cluster. It is secure, elastic, and operable, and it supports job priorities. In this recipe, we will outline the steps to configure CapacityScheduler for a Hadoop cluster.
We assume that our Hadoop cluster has been properly configured and all the daemons are running without any issues.
Log in to the master node from the cluster administrator machine using the following command:
ssh hduser@master
Configure CapacityScheduler with the following steps:
Configure Hadoop to use CapacityScheduler by adding the following lines to the file $HADOOP_HOME/conf/mapred-site.xml:
<property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.CapacityTaskScheduler...
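After editing the file, it can be useful to confirm which scheduler class the configuration actually names. The following is a minimal sketch, not part of the recipe itself: it assumes Python is available on the master node, and the helper name get_property is hypothetical. It reads a Hadoop-style XML configuration file and returns the value of a named property.

```python
import xml.etree.ElementTree as ET


def get_property(conf_path, name):
    """Return the value of property `name` from a Hadoop XML config file.

    Returns None if the property is not present. `get_property` is a
    hypothetical helper written for illustration, not a Hadoop API.
    """
    root = ET.parse(conf_path).getroot()
    # Hadoop config files hold a list of <property> elements, each with
    # a <name> child and a <value> child.
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None
```

For example, calling get_property with the path to $HADOOP_HOME/conf/mapred-site.xml and the name mapred.jobtracker.taskScheduler should return the scheduler class set in the step above. The file must be well-formed XML for the parse to succeed, so the check also catches an unclosed tag left behind by a hand edit.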