Hadoop uses a custom datatype serialization/RPC mechanism and defines its own box type classes. These classes are used to manipulate strings (Text), integers (IntWritable), and so on, and they implement the Writable interface, which defines a serialization and deserialization protocol.
Therefore, all values in Hadoop are Writable type objects, and all keys are instances of WritableComparable, which additionally defines a sort order because keys need to be compared when they are sorted.
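To make the key contract concrete, here is a minimal sketch of a composite key that can write itself, read itself back, and be ordered. It assumes nothing about your job: UserEventKey and its fields are hypothetical, and WritableSketch is a local stand-in that mirrors the two methods of Hadoop's Writable interface so that the example compiles without Hadoop on the classpath. In a real job, the class would implement org.apache.hadoop.io.WritableComparable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in (assumption) mirroring the two methods of
// org.apache.hadoop.io.Writable; not the Hadoop interface itself.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical composite key: ordered by userId, then by timestamp.
class UserEventKey implements WritableSketch, Comparable<UserEventKey> {
    private int userId;
    private long timestamp;

    // A no-argument constructor is needed so that an empty instance can be
    // created first and then filled in by readFields().
    UserEventKey() {}

    UserEventKey(int userId, long timestamp) {
        this.userId = userId;
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(userId);      // 4 bytes
        out.writeLong(timestamp);  // 8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in exactly the order they were written.
        userId = in.readInt();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(UserEventKey other) {
        int byUser = Integer.compare(userId, other.userId);
        return byUser != 0 ? byUser : Long.compare(timestamp, other.timestamp);
    }

    int getUserId() { return userId; }
    long getTimestamp() { return timestamp; }

    // Helper: serialize a key to its raw bytes.
    static byte[] toBytes(UserEventKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                key.write(out);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a key from raw bytes by populating an empty instance.
    static UserEventKey fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            UserEventKey key = new UserEventKey();
            key.readFields(in);
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because readFields() overwrites the fields of an existing object, the same instance can be refilled for each record. A key used in a real job should also override equals() and hashCode(), since the default HashPartitioner relies on hashCode() to assign keys to reducers.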
Writable objects are mutable, so a single instance can be reused across records, and they are considerably more compact than objects written with standard Java serialization, because no meta information needs to be stored (class name, fields, super classes, and so on). Their simple binary layout also permits straightforward random access, which gives higher performance. As binary Writable types take up less space, they reduce the size of the intermediate data written by the Map or Combiner function. Reducing intermediate data can provide a substantial performance gain by cutting network transfer, disk I/O, and required storage space.
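The size difference can be checked with the JDK alone. The following sketch compares the bytes produced for a single integer by a Writable-style encoding and by standard Java serialization. It relies on the fact that IntWritable writes its value with a single DataOutput.writeInt() call; the class and method names are hypothetical, and Hadoop itself is not used.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class for comparing serialized sizes.
class SerializedSizeSketch {

    // Writable-style encoding: the raw 4-byte int, with no class metadata.
    static int writableStyleSize(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeInt(value);
            }
            return buffer.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Standard Java serialization: a stream header plus a class descriptor
    // (class name, field names, superclass) precede the value itself.
    static int javaSerializationSize(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(Integer.valueOf(value));
            }
            return buffer.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Writable-style:     " + writableStyleSize(42) + " bytes");
        System.out.println("Java serialization: " + javaSerializationSize(42) + " bytes");
    }
}
```

The Writable-style encoding is always 4 bytes for an int, while the Java-serialized form is many times larger because of the metadata it carries. Multiplied across every intermediate record a job emits, this overhead is what the compact binary format avoids.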
Using the appropriate Writable type in your code will contribute...