Benchmarking a Hadoop cluster is the first step in tuning its performance. Benchmarks can also expose configuration problems and serve as a reference point for later tuning. For example, by comparing our own benchmark results with those of clusters with similar configurations, we can get a general sense of how well the cluster performs.
Typically, we benchmark a Hadoop cluster right after it is configured and before it is put into service to accept jobs. Once clients can submit jobs, their workloads compete with the benchmarks for resources, so the results no longer reflect the real performance of the cluster. The benchmark jobs can also inconvenience the clients.
In this section, we show how to benchmark and stress test a Hadoop cluster using the tests and examples packages included in the Hadoop distribution. More specifically, we will test the read/write performance of the HDFS cluster. In addition...