After the overview of Hadoop in the previous chapter, we now look at its component parts in more detail. This chapter starts at the conceptual bottom of the stack: the means and mechanisms for storing data within Hadoop. In particular, we will:
Describe the architecture of the Hadoop Distributed File System (HDFS)
Show what enhancements to HDFS have been made in Hadoop 2
Explore how to access HDFS using command-line tools and the Java API
Give a brief description of ZooKeeper—another (sort of) filesystem within Hadoop
Survey considerations for storing data in Hadoop and the available file formats
In Chapter 3, Processing – MapReduce and Beyond, we will describe how Hadoop provides the framework for processing that data.