-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating
Hadoop MapReduce v2 Cookbook - Second Edition: RAW - Second Edition
Hadoop contains several benchmarks that you can use to verify whether your HDFS cluster is set up properly and performs as expected. DFSIO is a benchmark test that comes with Hadoop, which can be used to analyze the I/O performance of an HDFS cluster. This recipe shows how to use DFSIO to benchmark the read/write performance of an HDFS cluster.
You must set up and deploy HDFS and Hadoop v2 YARN MapReduce prior to running these benchmarks. Locate the hadoop-mapreduce-client-jobclient-*-tests.jar file in your Hadoop installation.
The following steps will show you how to run the write and read DFSIO performance benchmarks:
–nrFiles parameter specifies the number of files to be written by the benchmark. Use a number high enough to saturate the task slots in your cluster. The -fileSize parameter specifies the file size of each file in MB. Change the location of the hadoop-mapreduce-client-jobclient-*-tests.jar file in the following commands according to your Hadoop installation.$ hadoop jar \ $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \ TestDFSIO -write -nrFiles 32 –fileSize 1000
TestDFSIO_results.log. You can provide your own result filename using the –resFile parameter.$hadoop jar \ $HADOOP_HOME/share/Hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \ TestDFSIO -read \ -nrFiles 32 –fileSize 1000
$hadoop jar \ $HADOOP_HOME/share/Hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \ TestDFSIO -clean
DFSIO executes a MapReduce job where the Map tasks write and read the files in parallel, while the Reduce tasks are used to collect and summarize the performance numbers. You can compare the throughput and IO rate results of this benchmark with the total number of disks and their raw speeds to verify whether you are getting the expected performance from your cluster. Please note the replication factor when verifying the write performance results. High standard deviation in these tests may hint at one or more underperforming nodes due to some reason.
Running these tests together with monitoring systems can help you identify the bottlenecks of your Hadoop cluster much easily.
Change the font size
Change margin width
Change background colour