Configuring a Hadoop cluster is a systematic undertaking, especially because of its large scale and distributed nature. Effort is needed to choose suitable storage and computing hardware, design the interconnection network, install and configure the operating system, and so on.
In a Hadoop cluster, different types of nodes may require different hardware configurations. For example, the JobTracker on a master node schedules jobs and assigns tasks to appropriate slave nodes for execution, while the NameNode on the master node manages the metadata of files and data blocks. Moreover, in the default cluster configuration, which provisions only one master node, the master node is a single point of failure. A critical requirement for the master node is therefore to be responsive and reliable. A slave node, on the other hand, hosts data blocks and runs tasks on those blocks. Because of the cluster's built-in resilience to node failures, the reliability requirement...
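As a minimal sketch of how the master node's dual role is designated in a classic (Hadoop 1.x, JobTracker/NameNode-era) deployment, the following configuration fragments point every node at the single master; the hostname `master.example.com` and the port numbers are illustrative assumptions, not values prescribed by the text:

```xml
<!-- core-site.xml (on all nodes): locate the NameNode on the master host -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master.example.com:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml (on all nodes): locate the JobTracker on the same master host -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>
</configuration>
```

Because both services resolve to one host, losing that host stops both job scheduling and metadata service at once, which is exactly why the single-master default makes the master a critical failure point.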