Optimizing Hadoop for MapReduce

By: Khaled Tannir

Reusing types smartly


Hadoop problems are often caused by some form of memory mismanagement; nodes rarely fail suddenly, but instead slow down gradually, for example as I/O devices go bad. Hadoop offers many options for controlling memory allocation and usage at several levels of granularity, but it does not validate these options against one another. It is therefore possible for the combined heap size of all the daemons running on a machine to exceed the amount of physical memory.

Each Java process has its own configured maximum heap size. Depending on whether the JVM heap size, the OS limit, or physical memory is exhausted first, the result is an out-of-memory error, a JVM abort, or severe swapping, respectively.
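As a quick sanity check, add up the -Xmx values of every daemon and task slot configured on a node and compare the total against its physical RAM. The following mapred-site.xml fragment is a minimal sketch (the values are illustrative, assuming classic Hadoop 1.x property names); with 8 map slots and 4 reduce slots at 512 MB each, tasks alone can claim 6 GB of heap on top of the DataNode and TaskTracker daemons:

<!-- mapred-site.xml: illustrative values, not a recommendation -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- maximum heap for each spawned map/reduce task JVM -->
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- number of map task JVMs that may run concurrently -->
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>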

You should therefore pay close attention to memory management: release any unnecessarily allocated memory to maximize the space available to your MapReduce jobs.

Reusing types is a technique for minimizing resource usage, such as CPU time and memory space. When you deal with millions of data records, it is always cheaper to reuse an existing instance than to create a new one.
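As a minimal sketch of this pattern (class and field names here are illustrative, not taken from the book's example code), the mapper below declares its output key and value as fields that are allocated once per task and overwritten for each record, instead of constructing new Text and IntWritable objects millions of times:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenCounterMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  // Allocated once per task and reused for every record.
  private final Text word = new Text();
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      // set() overwrites the existing instance; no new Text is created.
      word.set(tokens.nextToken());
      // Safe to reuse: the framework serializes key and value on write().
      context.write(word, ONE);
    }
  }
}

Note that Hadoop applies the same trick on the input side: the framework reuses the key and value objects it passes to map(), which is why you must copy them if you need to keep a reference across calls.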