A Hadoop cluster contains two types of nodes: master nodes and slave nodes. By default, the NameNode, SecondaryNameNode, and JobTracker daemons run on a master node, and the DataNode and TaskTracker daemons run on slave nodes. Selecting appropriate hardware for these computing and storage nodes can maximize the efficiency of a Hadoop cluster. In this recipe, we will list suggestions on hardware selection for a computing node.
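To keep the default daemon placement in mind while sizing hardware, it can help to write it down as a small lookup table. The following is only an illustrative sketch: the names `DAEMONS_BY_ROLE`, `daemons_for`, and `role_of` are hypothetical helpers invented for this example and are not part of Hadoop.

```python
# Default daemon placement as described in the text above.
# This table is an illustration only; it is not read by Hadoop itself.
DAEMONS_BY_ROLE = {
    "master": ["NameNode", "SecondaryNameNode", "JobTracker"],
    "slave": ["DataNode", "TaskTracker"],
}


def daemons_for(role):
    """Return the daemons that run by default on a node of the given role."""
    if role not in DAEMONS_BY_ROLE:
        raise ValueError(f"unknown node role: {role!r}")
    return DAEMONS_BY_ROLE[role]


def role_of(daemon):
    """Return the node role ("master" or "slave") that hosts a daemon by default."""
    for role, daemons in DAEMONS_BY_ROLE.items():
        if daemon in daemons:
            return role
    raise ValueError(f"unknown daemon: {daemon!r}")


if __name__ == "__main__":
    for role in DAEMONS_BY_ROLE:
        print(role, "->", ", ".join(daemons_for(role)))
```

A table like this makes it easier to reason about which machines carry the storage and computing daemons when comparing hardware options for each node type.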
Although master nodes and slave nodes each have their own requirements, there is no gold standard for choosing optimal hardware for either type of node. The hardware configuration is closely related to the properties of the Big Data to be processed. In addition, choosing hardware is an empirical and adaptive process that follows the changing requirements on a Hadoop cluster. For example, if the throughput requirements for a Hadoop cluster are high, we might need to choose high-end CPUs and...