Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you want to use Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries running more slowly than you expect? Are you looking to understand the benefits of upgrading Hadoop? If the answer to any of these is yes, this book is for you. It assumes novice-level familiarity with Hadoop.
Table of Contents (15 chapters)

Summary


HDFS is well suited to MapReduce workloads, but its sequential access pattern and lack of POSIX compliance make it tedious to work with in certain situations. Hadoop allows its users to extend HDFS or substitute a drop-in replacement. The key takeaways from this chapter are as follows:

  • Several implementations extend HDFS or serve as drop-in replacements for it. CephFS, MapRFS, GPFS from IBM, and the Cassandra File System (CFS) from DataStax are some examples.

  • An interface to the Amazon S3 storage service is available out of the box in Hadoop. It comes in two forms: a native-storage S3 filesystem interface and a block-storage S3 filesystem interface.

  • Other filesystems are incorporated into Hadoop by extending the FileSystem abstract base class. FSDataInputStream and FSDataOutputStream objects wrap the input and output streams of the underlying filesystem, respectively.

  • The security and access control mechanisms of the underlying filesystem can...
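
The extension pattern described in the takeaways above (subclass an abstract filesystem class, then wrap the underlying store's raw streams) can be sketched in plain Java. This is a minimal illustration and not Hadoop code: SketchFileSystem, SketchDataInputStream, SketchDataOutputStream, and InMemoryFileSystem are hypothetical stand-ins invented here so that the example runs without Hadoop on the classpath. The real org.apache.hadoop.fs.FileSystem class declares more abstract methods (for example listStatus, rename, delete, mkdirs, and getFileStatus), and the real FSDataInputStream expects the wrapped stream to implement Seekable and PositionedReadable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FileSystemExtensionSketch {

    // Hypothetical stand-in for FSDataInputStream: wraps the store's raw input stream.
    static final class SketchDataInputStream extends FilterInputStream {
        SketchDataInputStream(InputStream raw) {
            super(raw);
        }
    }

    // Hypothetical stand-in for FSDataOutputStream: wraps the raw output
    // stream and tracks how many bytes have been written so far.
    static final class SketchDataOutputStream extends FilterOutputStream {
        private long position;

        SketchDataOutputStream(OutputStream raw) {
            super(raw);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            position++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            position += len;
        }

        long getPos() {
            return position;
        }
    }

    // Hypothetical stand-in for the FileSystem abstract base class,
    // reduced to the two stream-producing operations.
    abstract static class SketchFileSystem {
        abstract SketchDataInputStream open(String path) throws IOException;

        abstract SketchDataOutputStream create(String path) throws IOException;
    }

    // A toy "underlying filesystem": file contents live in a map.
    static final class InMemoryFileSystem extends SketchFileSystem {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override
        SketchDataInputStream open(String path) throws IOException {
            byte[] data = files.get(path);
            if (data == null) {
                throw new FileNotFoundException(path);
            }
            return new SketchDataInputStream(new ByteArrayInputStream(data));
        }

        @Override
        SketchDataOutputStream create(String path) {
            // The file becomes visible to readers once the stream is closed.
            ByteArrayOutputStream raw = new ByteArrayOutputStream() {
                @Override
                public void close() {
                    files.put(path, toByteArray());
                }
            };
            return new SketchDataOutputStream(raw);
        }
    }

    // Writes text through the wrapped output stream and reads it back.
    public static String roundTrip(String path, String text) throws IOException {
        SketchFileSystem fs = new InMemoryFileSystem();
        try (SketchDataOutputStream out = fs.create(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        try (SketchDataInputStream in = fs.open(path)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("/demo/part-00000", "hello"));
    }
}
```

Callers in this sketch program only against the abstract class and the wrapper streams, so the backing store can be swapped without touching client code. That is the property the FileSystem abstraction is meant to provide in Hadoop itself.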