In this chapter, we will cover:
Benchmarking and profiling a Hadoop cluster
Analyzing job history with Rumen
Benchmarking a Hadoop cluster with GridMix
Using Hadoop Vaidya to identify performance problems
Balancing data blocks for a Hadoop cluster
Choosing a proper block size
Using compression for input and output
Configuring speculative execution
Setting the proper number of map and reduce slots for the TaskTracker
Tuning the JobTracker configuration
Tuning the TaskTracker configuration
Tuning shuffle, merge, and sort parameters
Configuring memory for a Hadoop cluster
Setting the proper number of parallel copies
Tuning JVM parameters
Configuring JVM reuse
Configuring the reducer initialization time