
Optimizing Hadoop for MapReduce

By: Khaled Tannir
Using compression


Compression reduces the number of bytes read from or written to the underlying storage system (HDFS), which makes better use of both network bandwidth and disk space. This matters most in Hadoop when datasets are very large and workloads are intensive. In such a context, I/O operations and network data transfers take a considerable amount of time to complete, and the Shuffle and Merge process is also under heavy I/O pressure.

Because disk I/O and network bandwidth are precious resources in Hadoop, data compression helps conserve them by minimizing disk I/O and network transfer. These gains are not free, however: compressing and decompressing data consumes CPU time, although this CPU cost is generally low.
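
The trade-off between bytes saved and CPU time spent can be observed with a small, self-contained sketch. It does not use Hadoop's codec API. It assumes gzip from the standard `java.util.zip` package as a stand-in codec, and the class name `CompressionSketch`, the helper methods, and the log-like sample records are all hypothetical, chosen only for illustration. Real ratios and timings depend heavily on the data and on the codec used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressionSketch {

    // Compress a byte array with gzip, entirely in memory.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the buffer holds the full output.
        return buffer.toByteArray();
    }

    // Decompress a gzip byte array back to the original bytes.
    static byte[] gunzip(byte[] packed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical sample data: repetitive, log-like text records.
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("2014-01-01 INFO user=").append(i % 50)
              .append(" action=click status=200\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords(100_000);

        long start = System.nanoTime();
        byte[] packed = gzip(raw);
        long compressMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        byte[] restored = gunzip(packed);
        long decompressMillis = (System.nanoTime() - start) / 1_000_000;

        // Bytes saved is the I/O gain; the elapsed times are the CPU price.
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("compress ms:      " + compressMillis);
        System.out.println("decompress ms:    " + decompressMillis);
        System.out.println("round trip ok:    " + Arrays.equals(raw, restored));
    }
}
```

Running `main` prints the raw and compressed sizes next to the time spent in each direction, which makes the cost side of the trade-off visible on your own data. Swapping `sampleRecords` for already-compressed or random input should show the opposite case, where the CPU time is spent for little or no reduction in bytes.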

Whenever disk I/O or network traffic limits your MapReduce job performance, you can improve the end-to-end processing time and reduce I/O and network traffic by enabling compression...