Learning Hadoop 2

Managing and serializing data


Having a filesystem is all well and good, but we also need mechanisms to represent data and store it on that filesystem. We will explore some of these mechanisms now.

The Writable interface

As developers, we find it useful to manipulate higher-level data types and let Hadoop handle the work of serializing them into bytes when they are written to a filesystem and reconstructing them from a stream of bytes when they are read back.

The org.apache.hadoop.io package contains the Writable interface, which provides this mechanism and is specified as follows:

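   public interface Writable
   {
      // Serialize the fields of this object to the output
      void write(DataOutput out) throws IOException;
      // Populate the fields of this object from the input
      void readFields(DataInput in) throws IOException;
   }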

The main purpose of this interface is to provide mechanisms for serializing and deserializing data as it is passed across the network or written to and read from disk.
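To make the contract concrete, the following is a minimal sketch of how a custom type might implement the two methods and survive a round trip through a byte array. It is an illustration under stated assumptions rather than code from Hadoop itself: the nested Writable interface is a local stand-in that mirrors org.apache.hadoop.io.Writable so the example compiles without Hadoop on the classpath, and PointWritable, WritableDemo, serialize, and deserialize are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableDemo {

    // Local stand-in with the same two methods as org.apache.hadoop.io.Writable
    // (an assumption made so the sketch runs without Hadoop).
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type holding two ints.
    static class PointWritable implements Writable {
        int x;
        int y;

        // A no-argument constructor lets an empty instance be created
        // first and then populated by readFields.
        PointWritable() { }

        PointWritable(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Serialize any Writable into an in-memory byte array.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Populate an existing Writable from previously serialized bytes.
    static void deserialize(Writable w, byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        PointWritable original = new PointWritable(3, -7);
        byte[] bytes = serialize(original);

        PointWritable copy = new PointWritable();
        deserialize(copy, bytes);

        // Two 4-byte ints, so this prints: 8 bytes, x=3, y=-7
        System.out.println(bytes.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

Note that the serialized form carries no field names or type information; the reader simply has to consume the fields in the order the writer produced them, which keeps the byte representation compact.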

When we explore processing frameworks on Hadoop in later chapters, we will often...