Hadoop clusters differ in the machines and hardware they are built from, so each Hadoop installation should be optimized for its own cluster setup. To ensure that your Hadoop cluster runs jobs efficiently, you need to examine it, identify potential bottlenecks, and eliminate them.
This chapter presents scenarios and techniques for identifying cluster weaknesses. It then introduces formulas that help determine an optimal configuration for NameNodes and DataNodes. After that, you will learn how to configure your cluster correctly and how to determine the number of mappers and reducers it should run.
In this chapter, you will learn the following:
How to check for cluster weaknesses using sample scenarios
How to identify CPU contention and an inappropriate number of mappers and reducers
How to identify excessive I/O and network traffic
How to size your cluster
How to configure your cluster correctly