Mastering Hadoop

By: Sandeep Karanth

Avro serialization


Avro is a popular data serialization framework and an Apache Software Foundation project. Its key features are as follows:

  • It supports rich data structures for serialization: primitive types as well as complex types such as records, enums, arrays, maps, unions, and fixed-length byte sequences.

  • It is programming-language neutral and provides fast, compact binary serialization.

  • Code generation is optional in Avro. Data can be read, written, or used in RPCs without having to generate classes or code.

Avro uses schemas when reading and writing data. Schemas make a compact representation of the serialized object possible. Because the schema already describes the data, object-type metadata does not have to travel along with the serialized byte stream, as it does in Java serialization. Schemas are written in JavaScript Object Notation (JSON), which has become a popular object description notation on the Web. Schema changes can be handled by having both the old and new schemas available when processing data.
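
To make the point about type metadata concrete, the sketch below compares two encodings of the same record using only the JDK. It does not use the Avro library. The `SchemaEncodingSketch` class, the `User` type, and the two-field schema in the comment are hypothetical names chosen for illustration. The hand-written encoder follows the rules in the Avro specification for the `int`, `long`, and `string` types (zig-zag variable-length integers, length-prefixed UTF-8), and writes record fields back to back in schema order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it does not depend on the Avro library.
// Assumed schema (never written to the output):
// {"type": "record", "name": "User",
//  "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}
class SchemaEncodingSketch {

    // Hypothetical record type, used only for the Java serialization comparison.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Zig-zag coding maps values of small magnitude to small unsigned numbers,
    // which are then written 7 bits at a time, low-order group first.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    // A string is its UTF-8 byte count followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields are written in schema order; no field names or type tags are emitted.
    static byte[] encodeWithSchema(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        return out.toByteArray();
    }

    // Standard Java serialization, which embeds a class descriptor in the stream.
    static byte[] encodeWithJavaSerialization(String name, int age) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new User(name, age));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("schema-based: "
                + encodeWithSchema("Ann", 30).length + " bytes");
        System.out.println("Java serialization: "
                + encodeWithJavaSerialization("Ann", 30).length + " bytes");
    }
}
```

For the name "Ann" and the age 30, the schema-driven encoding is 5 bytes: one length byte, three bytes of UTF-8 text, and one byte for the integer. The Java serialization stream is considerably larger because it also carries the class name, field names, and field types. The trade-off is that the compact bytes can only be decoded by a reader that has the schema.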

The following...