

Implementing an S3 native filesystem in Hadoop


Let's first create the InputStream and OutputStream classes for the filesystem. In our example, we have to connect to AWS to read files from and write files to S3.

Hadoop provides the FSInputStream class to support custom filesystems. We extend this class and override a few of its methods in the example implementation. Several private variables are declared, along with the constructor and helper methods that initialize the client, as illustrated in the following code snippet. The private variables hold the objects used to configure the filesystem and retrieve data from it. In this example, we use objects such as AmazonS3Client to call the REST web APIs on AWS, S3Object as a representation of the remote object on S3, and S3ObjectInputStream, which represents the object stream used to perform read operations. All the AWS-related classes live in the com.amazonaws.services.s3 and com.amazonaws.services.s3.model packages. There are a few other private variables such...
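The following is a minimal sketch of what such a stream could look like, assuming the AWS SDK for Java v1 is on the classpath; the class name S3NativeInputStream, the reopen() helper, and the field names are illustrative and are not the book's exact listing:

import java.io.IOException;

import org.apache.hadoop.fs.FSInputStream;

import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

// Illustrative subclass of FSInputStream backed by an S3 object.
public class S3NativeInputStream extends FSInputStream {

    // Private state used to configure the client and read the remote object.
    private final AmazonS3Client s3Client;     // issues REST calls to AWS
    private final String bucket;               // S3 bucket holding the file
    private final String key;                  // object key within the bucket
    private final long contentLength;          // total size of the object
    private S3Object object;                   // handle to the remote S3 object
    private S3ObjectInputStream wrappedStream; // stream used for reads
    private long pos;                          // current read position
    private boolean closed;

    public S3NativeInputStream(AmazonS3Client s3Client, String bucket, String key) {
        this.s3Client = s3Client;
        this.bucket = bucket;
        this.key = key;
        this.contentLength =
            s3Client.getObjectMetadata(bucket, key).getContentLength();
        reopen(0);                             // helper that initializes the stream
    }

    // Helper method: (re)open the object starting at the given offset.
    private void reopen(long newPos) {
        GetObjectRequest request = new GetObjectRequest(bucket, key);
        if (newPos > 0) {
            request.setRange(newPos, contentLength - 1);  // ranged GET for seeks
        }
        object = s3Client.getObject(request);
        wrappedStream = object.getObjectContent();
        pos = newPos;
    }

    @Override
    public synchronized int read() throws IOException {
        int b = wrappedStream.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public synchronized void seek(long newPos) throws IOException {
        if (newPos == pos) {
            return;
        }
        wrappedStream.close();                 // discard the old ranged stream
        reopen(newPos);
    }

    @Override
    public synchronized long getPos() throws IOException {
        return pos;
    }

    @Override
    public boolean seekToNewSource(long targetPos) throws IOException {
        return false;                          // S3 has no alternate replicas
    }

    @Override
    public synchronized void close() throws IOException {
        if (!closed) {
            wrappedStream.close();
            closed = true;
        }
    }
}

In this sketch, a seek is emulated by closing the current stream and reissuing a ranged GET request starting at the new offset, since an open S3 object stream cannot itself be repositioned.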