Mastering Hadoop

By: Sandeep Karanth

Summary


HDFS is a great filesystem for MapReduce workloads, but its sequential access pattern and lack of POSIX compliance make it tedious to work with in certain situations. Hadoop lets its users extend HDFS or substitute drop-in replacements. The key takeaways from this chapter are as follows:

  • A number of implementations extend HDFS or serve as drop-in replacements for it. Examples include CephFS, MapRFS, GPFS from IBM, and the Cassandra File System (CFS) from DataStax.

  • An interface to the Amazon S3 storage service is available out of the box in Hadoop, in two forms: a native S3 filesystem interface and a block-based filesystem interface.

  • Other filesystems are incorporated into Hadoop by extending the FileSystem abstract base class. FSDataInputStream and FSDataOutputStream objects wrap the input and output streams of the underlying filesystem, respectively.

  • The security and access control mechanisms of the underlying filesystem can...
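
The extension pattern named in the takeaways above (subclass an abstract filesystem class, then wrap the underlying store's streams in a common stream type) can be sketched without a Hadoop dependency. The following is a minimal, standard-library-only illustration. `MiniFileSystem` and `InMemoryFileSystem` are hypothetical stand-ins invented for this sketch and are not Hadoop classes. In Hadoop itself the base class is `org.apache.hadoop.fs.FileSystem` and the wrappers are `FSDataInputStream` and `FSDataOutputStream`. Here `java.io.DataInputStream` and `java.io.DataOutputStream` play the wrapping role.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's FileSystem abstract base class.
abstract class MiniFileSystem {
    // Subclasses return streams wrapped in a common type, so callers
    // never depend on the underlying store's own stream classes.
    abstract DataInputStream open(String path) throws IOException;

    abstract DataOutputStream create(String path) throws IOException;
}

// Hypothetical backing store: each file is a byte array held in a map.
class InMemoryFileSystem extends MiniFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    DataInputStream open(String path) throws IOException {
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        // Wrap the store's native input stream in the common type.
        return new DataInputStream(new ByteArrayInputStream(data));
    }

    @Override
    DataOutputStream create(final String path) throws IOException {
        // Native output stream that publishes its bytes to the map on close.
        ByteArrayOutputStream raw = new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                files.put(path, toByteArray());
            }
        };
        // Wrap the store's native output stream in the common type.
        return new DataOutputStream(raw);
    }
}
```

A caller would hold only a `MiniFileSystem` reference, write through the stream returned by `create`, close it, and read the same bytes back through `open`. Swapping in a different backing store then requires no change to the calling code, which is the property the FileSystem abstraction is meant to provide.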