Learning Hadoop 2

Hadoop filesystems


Until now, we have referred to HDFS as the Hadoop filesystem. In reality, Hadoop has a rather abstract notion of a filesystem: HDFS is only one of several implementations of the org.apache.hadoop.fs.FileSystem Java abstract class. A list of available filesystems can be found at https://hadoop.apache.org/docs/r2.5.0/api/org/apache/hadoop/fs/FileSystem.html. The following table summarizes some of these filesystems, along with the corresponding URI scheme and Java implementation class.

Filesystem        URI scheme  Java implementation
Local             file        org.apache.hadoop.fs.LocalFileSystem
HDFS              hdfs        org.apache.hadoop.hdfs.DistributedFileSystem
S3 (native)       s3n         org.apache.hadoop.fs.s3native.NativeS3FileSystem
S3 (block-based)  s3          org.apache.hadoop.fs.s3.S3FileSystem
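
To make the link between URI scheme and implementation class concrete, the following is a minimal plain-Java sketch that mirrors the table as a lookup. It is not Hadoop's own resolution code and does not depend on Hadoop; in a real program, the filesystem instance would be obtained through FileSystem.get(URI, Configuration). The class name FilesystemSchemes, the method implementationFor, and the host, bucket, and path names are all hypothetical, chosen only for illustration.

```java
import java.net.URI;
import java.util.Map;

// Illustrative only: a plain-Java lookup that mirrors the table above.
// It is not Hadoop's own resolution code and needs no Hadoop dependency.
class FilesystemSchemes {

    // URI scheme -> implementation class name, as listed in the table.
    static final Map<String, String> IMPLEMENTATIONS = Map.of(
        "file", "org.apache.hadoop.fs.LocalFileSystem",
        "hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem",
        "s3n", "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "s3", "org.apache.hadoop.fs.s3.S3FileSystem");

    // Returns the implementation class name for the scheme of the given URI,
    // or null when the URI has no scheme or the scheme is not in the table.
    static String implementationFor(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null ? null : IMPLEMENTATIONS.get(scheme);
    }

    public static void main(String[] args) {
        // Hypothetical host, bucket, and path names.
        System.out.println(implementationFor("hdfs://namenode:8020/user/data"));
        System.out.println(implementationFor("s3n://example-bucket/logs/part-00000"));
        System.out.println(implementationFor("file:///tmp/input.txt"));
    }
}
```

The point of the sketch is that only the scheme of the URI decides which implementation handles a path, so the same client code can target local disk, HDFS, or S3 by changing the URI alone.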

There are two implementations of the S3 filesystem. The native implementation, s3n, is used to read and write regular files. Data stored using s3n can be accessed by any tool and conversely can be used to read data generated...