Hadoop performance tuning is a challenging task, mainly due to the distributed feature of the system. The sheer number of configuration properties can tell, from another perspective, how complicated it is to configure a Hadoop cluster. Many of the configuration parameters have an effect on the performance of the cluster. Sometimes, different settings of the properties can lead to dramatic performance differences. And some properties are more relevant and sensitive than others with regard to the performance of a Hadoop cluster.
A Hadoop cluster is composed of many components. A systematic way of performance tuning is to tune the components based on their contribution on the cluster performance. Most of the Big Data applications are I/O bound, so is Hadoop. So, configurations that are closely related to I/O requests should be the first priority for performance tuning. For example, suboptimal configuration on data replication properties can cause a large number of data block copies...