Book Image

Optimizing Hadoop for MapReduce

By : Khaled Tannir
Book Image

Optimizing Hadoop for MapReduce

By: Khaled Tannir

Overview of this book

Table of Contents (15 chapters)

Using appropriate Writable types


Hadoop uses custom datatype serialization/RPC mechanism and defines its own box type classes. These classes are used to manipulate strings (Text), integers (IntWritable), and so on, and they implement the Writable class, which defines a deserialization protocol.

Therefore, all values in Hadoop are Writable type objects and all keys are instances of WritableComparable, which defines a sort order, because they need to be compared.

Writable objects are mutable and considerably more compact as no meta info needs to be stored (class name, fields, super classes, and so on), and straightforward random access gives higher performance. As binary Writable types will take up less space, this will reduce the size of intermediate data written by the Map or Combiner function. Reducing intermediate data can provide a substantial performance gain by reducing network transfer and I/O disk required storage space.

Using the appropriate Writable type in your code will contribute...