Mastering Hadoop

By: Sandeep Karanth

Chapter 9. HDFS Replacements

The parallelism and scalability of the MapReduce computing paradigm are greatly influenced by the underlying filesystem. HDFS is the default filesystem that comes with most Hadoop distributions. It automatically splits files into blocks and stores replicated copies of those blocks across the cluster. Information about where each block resides is supplied to the MapReduce engine, which can then place tasks close to their data so that the movement of data over the network is minimized.
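The chunking, replication, and locality-aware task placement described above can be illustrated with a small Python sketch. This is a toy simulation and not HDFS code: the 128 MB block size, the replication factor of 3, the node names, the round-robin replica placement, and the function names `split_into_blocks`, `place_replicas`, and `assign_tasks` are all assumptions made for illustration. Real HDFS placement is rack-aware, and the real scheduler is considerably more involved.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB); configurable on real clusters
REPLICATION = 3                 # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def place_replicas(blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block lands on `replication` distinct nodes.

    This only illustrates the idea; it is not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        i: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i in range(len(blocks))
    }


def assign_tasks(placement, free_slots):
    """Greedy locality-aware scheduling: prefer a node that already holds the block.

    free_slots maps node -> number of free task slots.
    Returns a dict of block index -> (node, is_local).
    """
    slots = dict(free_slots)
    assignment = {}
    for block, holders in placement.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder is free: fall back to any free node,
            # which means the block must be read over the network.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free slots left")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


# Usage: a 300 MB file spans 3 blocks under the assumed 128 MB block size.
blocks = split_into_blocks(300 * 1024 * 1024)
placement = place_replicas(blocks, ["n1", "n2", "n3", "n4"])
tasks = assign_tasks(placement, {"n1": 1, "n2": 1, "n3": 1, "n4": 1})
```

Because every block has several replica holders, the scheduler in this sketch usually finds a free node that already stores the block. A task is marked non-local, and data crosses the network, only when all holders of a block are busy.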

However, there are many use cases where HDFS may not be ideal. In this chapter, we will look at the following topics:

  • The strengths and drawbacks of HDFS when compared to POSIX filesystems.

  • Hadoop's support for other filesystems. One of them is Amazon's cloud storage service, Simple Storage Service (S3). Hadoop can read files from and write files to S3.

  • The extensibility features of HDFS. The framework can be extended in two ways: by providing...