Mastering Hadoop

By: Sandeep Karanth

HDFS high availability


NameNodes are the heart of an HDFS namespace. The availability of any cluster using HDFS depends directly on the availability of its NameNode.

Secondary NameNode, Checkpoint Node, and Backup Node

Hadoop 1.X introduced the concept of a Secondary NameNode. The Secondary NameNode is a shield against disasters: if the NameNode fails, the Secondary NameNode can be used to recover it. The term Secondary NameNode is a misnomer, though. It is a cold standby and cannot service requests on its own. The NameNode can, however, read from the Secondary NameNode when it encounters failures.

The NameNode writes all HDFS updates to the edits log in the native filesystem. The log is written in an append-only fashion. The NameNode owns another file, called the fsimage file, that contains the image of HDFS. On startup, the NameNode reads the edits file and applies all the edits one by one to the fsimage file. During this time, no writes are allowed on HDFS. The NameNode...
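The startup replay described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the real fsimage and edits files use Hadoop's own on-disk formats, whereas here the image is a plain dictionary and each log entry is a JSON line. The names `apply_edit` and `replay`, and the `create`, `delete`, and `rename` operation names, are hypothetical and chosen for illustration.

```python
import json
from typing import Dict, List


def apply_edit(image: Dict[str, dict], edit: dict) -> None:
    """Apply one logged operation to the in-memory namespace image."""
    op = edit["op"]
    if op == "create":
        image[edit["path"]] = {}
    elif op == "delete":
        image.pop(edit["path"], None)
    elif op == "rename":
        image[edit["dst"]] = image.pop(edit["src"])
    else:
        raise ValueError(f"unknown op: {op}")


def replay(fsimage: Dict[str, dict], edits: List[str]) -> Dict[str, dict]:
    """Return a new image with every edit applied in log order.

    The log is append-only, so entries are replayed strictly
    from oldest to newest, one by one.
    """
    image = dict(fsimage)  # copy so the loaded image is not mutated
    for line in edits:
        apply_edit(image, json.loads(line))
    return image


if __name__ == "__main__":
    # Hypothetical starting image and edits log.
    base = {"/data/a.txt": {}}
    log = [
        json.dumps({"op": "create", "path": "/data/b.txt"}),
        json.dumps({"op": "rename", "src": "/data/a.txt", "dst": "/data/c.txt"}),
    ]
    print(sorted(replay(base, log)))
```

Because the result depends on the order of the entries, the replay has to finish before new updates are accepted. This is one way to see why writes are blocked while the NameNode starts up.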