Mastering Hadoop

By: Sandeep Karanth


Big data processing involves representing data either in storage or in transit over the network. Desirable properties of a data representation include compactness, fast transformation, extensibility, and backward compatibility. Some key takeaways from this chapter related to data representation are as follows:

  • Hadoop provides inbuilt serialization/deserialization mechanisms through the Writable interface. Writable objects serialize more compactly than objects serialized with Java's built-in serialization.

  • Avro is a flexible and extensible data serialization framework. It serializes data in a binary format and is supported by Hadoop, MapReduce, Pig, and Hive.

  • Avro provides dynamic typing, eliminating the need for code generation. The schema can be stored with the data and read by any subsystem.

  • Compression techniques trade off speed against storage savings. Hadoop supports many compression codecs along this tradeoff spectrum. Compression is a very important optimization parameter for big data processing.

  • Hadoop supports...
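The Writable contract mentioned above is just a pair of `write`/`readFields` methods. The following is a minimal sketch of that pattern using only plain `java.io` streams, so it runs without the Hadoop dependency; `PointWritable` is a hypothetical class that mirrors the shape of `org.apache.hadoop.io.Writable`, not Hadoop's actual implementation:

```java
import java.io.*;

// Sketch of the Writable pattern: only raw field values are written, with no
// class descriptors or metadata, which is why Writable records are more
// compact than java.io.Serializable output.
public class PointWritable {
    private int x;
    private int y;

    public PointWritable() {}                  // no-arg constructor, as Writable requires
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    // Mirrors Writable.write(DataOutput): serialize field values only.
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Mirrors Writable.readFields(DataInput): read fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    public int getX() { return x; }
    public int getY() { return y; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new PointWritable(3, 7).write(new DataOutputStream(buf));
        // Two ints serialize to exactly 8 bytes; Java serialization of the
        // same object would also embed a class descriptor and take far more.
        System.out.println("bytes=" + buf.size());

        PointWritable copy = new PointWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println("x=" + copy.getX() + " y=" + copy.getY());
    }
}
```

Note that the field order in `readFields` must match `write` exactly; nothing in the byte stream identifies which field is which.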
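Avro's dynamic typing works because the schema is itself plain JSON that can be stored alongside the data and interpreted at read time, with no generated code. A hypothetical record schema (the `LogEvent` name and its fields are illustrative, not from the chapter) might look like:

```json
{
  "type": "record",
  "name": "LogEvent",
  "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "level",     "type": "string"},
    {"name": "message",   "type": ["null", "string"], "default": null}
  ]
}
```

The union type `["null", "string"]` with a default value makes `message` optional, which is one of the mechanisms Avro uses to keep old readers compatible with evolved schemas.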
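The speed-versus-storage tradeoff can be demonstrated with the JDK's own `java.util.zip.Deflater`, which implements DEFLATE, the same algorithm behind Hadoop's gzip codec. This is a standalone sketch, not Hadoop codec API usage; Hadoop codecs such as Snappy, LZO, gzip, and bzip2 sit at different points on the same spectrum:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Compress the same input at two Deflater levels: BEST_SPEED favors CPU,
// BEST_COMPRESSION favors output size.
public class CompressionTradeoff {
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length * 2 + 64]; // ample output buffer
        int n = deflater.deflate(out);                // bytes actually produced
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        // Repetitive input, standing in for typical compressible log data.
        byte[] data = "hadoop ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        int fast = compressedSize(data, Deflater.BEST_SPEED);
        int best = compressedSize(data, Deflater.BEST_COMPRESSION);
        System.out.println("raw=" + data.length + " fast=" + fast + " best=" + best);
    }
}
```

On repetitive input like this, both levels shrink the data dramatically; the interesting difference on real workloads is the CPU time spent to earn the extra savings at higher levels.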