Learning Hadoop 2

Summary


This chapter has given a whistle-stop tour through storage on a Hadoop cluster. In particular, we covered:

  • The high-level architecture of HDFS, the main filesystem used in Hadoop

  • How HDFS works under the covers and, in particular, its approach to reliability

  • How Hadoop 2 has added significantly to HDFS, particularly in the form of NameNode HA and filesystem snapshots

  • What ZooKeeper is and how it is used by Hadoop to enable features such as NameNode automatic failover

  • An overview of the command-line tools used to access HDFS

  • The API for filesystems in Hadoop and how, at the code level, HDFS is just one implementation of a more flexible filesystem abstraction

  • How data can be serialized onto a Hadoop filesystem and some of the support provided in the core classes

  • The file formats in which data is most frequently stored in Hadoop and some of their particular use cases
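To recap the serialization support mentioned above, here is a minimal sketch of the contract behind Hadoop's Writable interface: a type writes its fields to a java.io.DataOutput and reads them back, in the same order, from a java.io.DataInput. So that it runs without Hadoop on the classpath, the sketch declares a local SimpleWritable interface with the same two method signatures as org.apache.hadoop.io.Writable. SimpleWritable, the PageHit record, and its fields are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Hypothetical local stand-in mirroring the two methods of
    // org.apache.hadoop.io.Writable, so this compiles without Hadoop jars.
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type: a URL and its hit count.
    public static class PageHit implements SimpleWritable {
        public String url;
        public long hits;

        // No-arg constructor: a framework creates an empty instance, then calls readFields().
        public PageHit() {}

        public PageHit(String url, long hits) {
            this.url = url;
            this.hits = hits;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);   // 2-byte length prefix followed by modified UTF-8 bytes
            out.writeLong(hits); // 8 bytes, high byte first
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read back in exactly the order they were written.
            url = in.readUTF();
            hits = in.readLong();
        }
    }

    // Serializes any SimpleWritable into a byte array.
    public static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    // Populates an existing (typically empty) instance from serialized bytes.
    public static <T extends SimpleWritable> T deserialize(byte[] data, T target) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        PageHit original = new PageHit("/index.html", 42L);
        byte[] data = serialize(original);
        PageHit copy = deserialize(data, new PageHit());
        // Prints: /index.html 42 (21 bytes)
        System.out.println(copy.url + " " + copy.hits + " (" + data.length + " bytes)");
    }
}
```

In a real job you would implement org.apache.hadoop.io.Writable directly (or WritableComparable for MapReduce keys). The byte layout shown here is simply what DataOutputStream produces and is not meant to describe any particular Hadoop file format.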

In the next chapter, we will look in detail at how Hadoop provides processing frameworks that can be used to process...