In this chapter, we will cover:
Optimizing Hadoop YARN and MapReduce configurations for cluster deployments
Shared-user Hadoop clusters – using Fair and Capacity schedulers
Setting classpath precedence to user-provided JARs
Speculative execution of straggling tasks
Unit testing Hadoop MapReduce applications using MRUnit
Integration testing Hadoop MapReduce applications using MiniYarnCluster
Adding a new DataNode
Decommissioning DataNodes
Using multiple disks/volumes and limiting HDFS disk usage
Setting the HDFS block size
Setting the file replication factor
Using the HDFS Java API