GridMix is a tool for benchmarking Hadoop clusters. It generates a number of synthetic MapReduce jobs and builds a model from the performance of these jobs. Resource profiles of the cluster are then derived from the job execution metrics, and these profiles can help us find performance bottlenecks in the cluster. In this section, we will outline the steps for benchmarking Hadoop with GridMix.
We assume that our Hadoop cluster has been properly configured and all the daemons are running without any issues.
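One way to check this assumption is to compare the output of jps (the JVM process listing tool shipped with the JDK) against the daemons we expect on a node. The following is only a minimal sketch: the helper name missing_daemons is hypothetical, and the daemon names in the usage note assume a Hadoop 1.x style deployment, so adjust them to match your cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads jps-style output ("<pid> <ClassName>" per line)
# from stdin and prints every expected daemon name that is NOT listed.
missing_daemons() {
    listing=$(cat)
    for name in "$@"; do
        # Compare the second column exactly, so that "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$name"
        fi
    done
}
```

On the master node, it could be used as jps | missing_daemons NameNode SecondaryNameNode JobTracker (daemon names assumed). Empty output means every listed daemon was found.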
Note
Currently, GridMix has three versions. To differentiate them, we will use GridMix to represent GridMix version 1, GridMix2 to represent GridMix version 2, and GridMix3 to represent GridMix version 3.
Log in to the Hadoop cluster's master node from the administrator machine using the following command:
ssh hduser@master