In Chapter 1, Introduction, we gave a very high-level overview of HDFS; we will now explore it in a little more detail. As mentioned in that chapter, HDFS can be viewed as a filesystem, though one with very specific performance characteristics and semantics. It's implemented with two main types of server process, the NameNode and the DataNodes, configured in a master/slave setup. A good starting point is to view the NameNode as holding all the filesystem metadata and the DataNodes as holding the actual filesystem data (blocks). Every file placed onto HDFS is split into one or more blocks that might reside on numerous DataNodes, and it's the NameNode that knows how these blocks combine to reconstruct the file.
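To make the split between metadata and data concrete, here is a minimal Python sketch of the idea. It is a toy model, not the HDFS API: the class names `ToyNameNode` and `ToyDataNode`, the 4-byte block size, and the round-robin placement are all assumptions made for illustration. It also ignores replication and routes reads and writes through the NameNode object for brevity, whereas real HDFS clients exchange block data with the DataNodes directly.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Holds raw block bytes keyed by block ID (hypothetical class)."""
    name: str
    blocks: dict = field(default_factory=dict)


class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes, block_size=4):
        self.datanodes = datanodes
        self.block_size = block_size  # tiny on purpose; real HDFS blocks are far larger
        self.files = {}               # path -> ordered list of (block_id, datanode name)
        self._next_block_id = 0

    def put(self, path, data):
        """Split data into blocks, store each on a DataNode, record the layout."""
        placements = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement is an assumption made for this sketch,
            # not HDFS's actual block placement policy.
            node = self.datanodes[block_id % len(self.datanodes)]
            node.blocks[block_id] = chunk
            placements.append((block_id, node.name))
        self.files[path] = placements

    def get(self, path):
        """Reassemble a file by fetching its blocks in the recorded order."""
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[name].blocks[block_id]
                        for block_id, name in self.files[path])


nodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
namenode = ToyNameNode(nodes, block_size=4)
namenode.put("/logs/a.txt", b"hello hdfs!")
print(namenode.files["/logs/a.txt"])  # block layout known only to the NameNode
print(namenode.get("/logs/a.txt"))    # original bytes, rebuilt from the blocks
```

Note that no single `ToyDataNode` can reconstruct the file on its own; only the mapping held by the `ToyNameNode` says which blocks belong together and in what order.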
Learning Hadoop 2