Mastering Hadoop

By: Sandeep Karanth

Summary


Big data processing involves representing data, either at rest in storage or in transit over the network. Compact representation, fast serialization and deserialization, extensibility, and backward compatibility are desirable properties of a data representation. Some key takeaways from this chapter related to data representation are as follows:

  • Hadoop provides inbuilt serialization/deserialization mechanisms through the Writable interface. Writable classes serialize more compactly than Java's built-in serialization because they write only field values, with no class metadata.

  • Avro is a flexible and extensible data serialization framework. It serializes data in binary and is supported by Hadoop, MapReduce, Pig, and Hive.

  • Avro provides dynamic typing, eliminating the need for code generation. The schema can be stored with the data and read by any subsystem.

  • Compression techniques trade off speed against storage savings. Hadoop supports many compression codecs along this tradeoff spectrum. Compression is an important optimization parameter for big data processing.

  • Hadoop supports...
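The Writable contract boils down to two methods, write(DataOutput) and readFields(DataInput). The sketch below illustrates the pattern using only java.io types; the class and field names (PointWritable, x, y) are illustrative, and in a real Hadoop job the class would additionally declare implements org.apache.hadoop.io.Writable:

```java
import java.io.*;

// Sketch of Hadoop's Writable pattern using only java.io types.
// A real Hadoop Writable implements org.apache.hadoop.io.Writable;
// the method signatures below match that interface.
public class PointWritable {
    private int x;
    private int y;

    public PointWritable() {}  // Writables need a no-arg constructor
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    // Serialize fields compactly: two 4-byte ints, no class metadata
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Deserialize into an existing, reusable instance
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    public int getX() { return x; }
    public int getY() { return y; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new PointWritable(3, 7).write(new DataOutputStream(bytes));
        // 8 bytes total; Java serialization of an equivalent object
        // would also write stream headers and class descriptors
        PointWritable copy = new PointWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(bytes.size() + " " + copy.getX() + " " + copy.getY());
    }
}
```

Note the reusable readFields target: MapReduce reuses a single Writable instance across records to avoid object allocation per record.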
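An Avro schema is plain JSON, so it can be stored alongside the data and read by any subsystem without generated code. The record below is a hypothetical example (the name User and its fields are illustrative, not from the book); the nullable email field with a default shows how a schema can evolve backward-compatibly:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id",    "type": "long"},
    {"name": "name",  "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

Because email has a default, readers using this schema can still consume data written before the field existed.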
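Hadoop's codecs span the speed/ratio spectrum (Snappy and LZ4 at the fast end, bzip2 at the compact end, gzip in between). The standalone sketch below uses java.util.zip.Deflater compression levels to make the same tradeoff concrete; it is an illustration of the principle, not Hadoop codec API code:

```java
import java.util.zip.Deflater;

// Illustrates the compression speed/ratio tradeoff using DEFLATE levels:
// higher levels spend more CPU searching for matches to shrink the output.
public class CompressionTradeoff {
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[data.length * 2];
        int size = 0;
        while (!deflater.finished()) {
            size += deflater.deflate(buf);  // count output bytes
        }
        deflater.end();
        return size;
    }

    public static void main(String[] args) {
        byte[] data = "hadoop ".repeat(10_000).getBytes();
        int fast  = compressedSize(data, Deflater.BEST_SPEED);       // level 1
        int small = compressedSize(data, Deflater.BEST_COMPRESSION); // level 9
        // Both shrink the input; the slower level is at least as compact
        System.out.println((fast < data.length) + " " + (small <= fast));
    }
}
```

Choosing a codec in Hadoop follows the same logic: pay CPU for ratio when data is cold and storage-bound, favor speed for intermediate map output.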