Having a filesystem is all well and good, but we also need mechanisms to represent data and store it on that filesystem. We will explore some of these mechanisms now.
As developers, it is useful if we can manipulate higher-level data types and have Hadoop look after the work required to serialize them into bytes when writing to a filesystem and to reconstruct them from a stream of bytes when reading from it.
The org.apache.hadoop.io package contains the Writable interface, which provides this mechanism and is specified as follows:
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
The main purpose of this interface is to provide mechanisms for the serialization and deserialization of data as it is passed across the network or read from and written to disk.
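To make this concrete, here is a minimal sketch of a type implementing the Writable contract. To keep it self-contained and runnable without the Hadoop jars, the two-method interface is declared locally (in a real project you would implement org.apache.hadoop.io.Writable itself), and the PointWritable class and its fields are hypothetical examples, not part of the Hadoop API:

```java
import java.io.*;

// Local stand-in for org.apache.hadoop.io.Writable, so the sketch
// compiles without the Hadoop libraries on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical example type: a record with an id and a coordinate.
class PointWritable implements Writable {
    int id;
    double x;

    public void write(DataOutput out) throws IOException {
        out.writeInt(id);     // serialize fields in a fixed order
        out.writeDouble(x);
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readInt();    // deserialize in exactly the same order
        x = in.readDouble();
    }
}

public class WritableDemo {
    public static void main(String[] args) throws IOException {
        PointWritable original = new PointWritable();
        original.id = 7;
        original.x = 3.5;

        // Serialize to a byte array, as Hadoop would when writing the
        // record to disk or sending it across the network.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Reconstruct a new instance from the raw bytes.
        PointWritable copy = new PointWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.id + " " + copy.x);
    }
}
```

Note that readFields must consume the fields in exactly the order and format that write produced them; the byte stream carries no field names or type tags, which is part of what makes this serialization compact.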
When we explore processing frameworks on Hadoop in later chapters, we will often...