Mastering Hadoop

By: Sandeep Karanth

Implementing a filesystem in Hadoop

Depending on the situation, it may be necessary to replace HDFS with a filesystem of your choice. Hadoop provides out-of-the-box support for a few other filesystems, such as S3. HDFS can be replaced either by a drop-in replacement or, as in the case of S3, by integrating seamlessly with the external file store for input and output.
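To make the two modes concrete, here is a simplified sketch of how Hadoop decides which FileSystem implementation serves a given path. It runs without Hadoop on the classpath. The FileSystemResolver class is hypothetical, but the property keys it reads mirror Hadoop 2's fs.defaultFS and fs.<scheme>.impl settings. A path without a scheme falls back to the default filesystem, which is what a drop-in replacement changes. A fully qualified URI, such as one with the s3n scheme, is routed by its scheme.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Hadoop-free model of FileSystem.get(URI, Configuration):
// the keys mirror Hadoop 2's fs.defaultFS and fs.<scheme>.impl properties.
class FileSystemResolver {
    private final Map<String, String> conf = new HashMap<>();

    // Stores a configuration property, similar to Configuration.set(name, value).
    void set(String key, String value) {
        conf.put(key, value);
    }

    // Returns the name of the FileSystem class that would serve the given path.
    String resolve(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            // A path without a scheme belongs to the default filesystem, so a
            // drop-in replacement only has to change fs.defaultFS.
            scheme = URI.create(conf.getOrDefault("fs.defaultFS", "file:///")).getScheme();
        }
        // A fully qualified URI (for example, s3n://bucket/key) is routed by its scheme.
        String impl = conf.get("fs." + scheme + ".impl");
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

For example, with fs.defaultFS set to an hdfs:// URI, a path such as /user/data/input.txt resolves to org.apache.hadoop.hdfs.DistributedFileSystem. Pointing fs.defaultFS at an s3n:// bucket instead sends the same path to org.apache.hadoop.fs.s3native.NativeS3FileSystem. The real FileSystem.get also consults implementations registered through java.util.ServiceLoader and caches filesystem instances, which this sketch omits.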

In this section, we will re-implement the S3 native filesystem and use it to extend Hadoop. The code in this section illustrates the steps involved in replacing HDFS. Error handling and other S3-specific features have been omitted for brevity.

The major steps in implementing a filesystem for Hadoop are as follows:

  1. The org.apache.hadoop.fs.FileSystem abstract class needs to be extended, and all of its abstract methods must be overridden. Hadoop ships with several out-of-the-box implementations, including FilterFileSystem, NativeS3FileSystem, S3FileSystem, RawLocalFileSystem, FTPFileSystem, and ViewFileSystem.

  2. The open method returns an FSDataInputStream object...
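The stream handed back by open must support random access: the FSDataInputStream constructor only accepts streams that implement Hadoop's Seekable and PositionedReadable interfaces, and filesystem implementations usually satisfy this by extending org.apache.hadoop.fs.FSInputStream. The following minimal sketch, assuming an in-memory byte array in place of an S3 object, shows the methods such a stream provides. The class name InMemorySeekableInputStream is hypothetical, and it extends java.io.InputStream only so that it compiles without Hadoop; in a real filesystem it would extend FSInputStream, and open would return new FSDataInputStream(stream).

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical in-memory stream exposing the methods that
// org.apache.hadoop.fs.FSInputStream declares abstract. In a real filesystem this
// class would extend FSInputStream and be wrapped with new FSDataInputStream(...).
class InMemorySeekableInputStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    InMemorySeekableInputStream(byte[] data) {
        this.data = data;
    }

    // Moves the read position to an absolute byte offset; seeking to the end is allowed.
    public void seek(long target) throws IOException {
        if (target < 0 || target > data.length) {
            throw new IOException("Seek out of range: " + target);
        }
        pos = (int) target;
    }

    // Returns the current absolute byte offset.
    public long getPos() {
        return pos;
    }

    // A single in-memory copy has no alternate replica, so there is no new source.
    public boolean seekToNewSource(long targetPos) {
        return false;
    }

    // Returns the next byte as 0-255, or -1 at end of data.
    @Override
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }
}
```

Random access matters because MapReduce tasks read input splits that start at arbitrary byte offsets, so the record reader seeks to the split start rather than reading from the beginning of the file. The seekToNewSource hook exists for replicated stores such as HDFS, which can use it to switch to a different DataNode; an S3-backed stream would typically return false, as this sketch does.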