Learning Hadoop 2

Operations in the Hadoop 2 world


As mentioned in Chapter 2, Storage, some of the most significant changes made to HDFS in Hadoop 2 involve its fault tolerance and better integration with external systems. These are not mere curiosities: the NameNode High Availability features, in particular, have made a massive difference to cluster management since Hadoop 1. In the bad old days of 2012 or so, a significant part of the operational preparedness of a Hadoop cluster was built around mitigations for, and restoration processes after, failure of the NameNode. If the NameNode died in Hadoop 1 and you didn't have a backup of the HDFS fsimage metadata file, you basically lost access to all your data. If the metadata was permanently lost, then so was the data.

Hadoop 2 has added built-in NameNode HA and the machinery to make it work. In addition, there are components such as the NFS gateway into HDFS, which make it a much more flexible system. But this additional capability does...