
Hadoop Backup and Recovery Solutions

By: Gaurav Barot, Chintan Mehta, Amij Patel

Overview of this book

Hadoop offers distributed processing of large datasets across clusters and is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. It enables computing solutions that are scalable, cost-effective, flexible, and fault tolerant, making it possible to back up very large datasets and protect them against hardware failures.

Starting off with the basics of Hadoop administration, this book moves on to the best strategies for backing up distributed storage databases.

You will gradually learn about backup and recovery principles, discover the common failure points in Hadoop, and learn about backing up Hive metadata. A deep dive into the world of Apache HBase will show you different ways of backing up data and compare them. Going forward, you'll learn methods of defining recovery strategies for various causes of failure: failover recoveries, corruption, working drives, and metadata. Also covered are the concepts of Hadoop matrix and MapReduce. Finally, you'll explore troubleshooting strategies and techniques to resolve failures.

Knowing the key considerations of recovery strategy


When Hadoop is considered a critical solution, disaster recovery and backup need careful, thorough, and supervised planning. However, setting up a recovery plan can be complicated and confusing. Let's walk through some common points of confusion and the key considerations to address. Just as one rotten apple spoils the others, one corrupt copy will eventually spread through the system and ultimately reduce the chances of a successful disaster recovery. This is a cardinal principle of disaster recovery.

Note

As long as an error does not spread to or corrupt good data, you can still repair whatever went amiss in your system.
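The rule that a corrupt copy must never overwrite good data can be illustrated with a simple checksum gate. The following Python is a minimal, hypothetical sketch, not the authors' procedure and not a Hadoop API: `sha256_of` and `promote_backup` are names invented here, and it assumes the expected SHA-256 digest was recorded from a known-good source. A candidate backup replaces the existing good copy only when its digest matches; otherwise the good copy is left untouched.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def promote_backup(candidate: Path, expected_digest: str, good_copy: Path) -> bool:
    """Hypothetical helper: replace good_copy with candidate only if it verifies.

    Returns True if the candidate was promoted, or False if it was rejected,
    in which case the existing good copy is not modified.
    """
    if sha256_of(candidate) != expected_digest:
        return False  # corrupt candidate: never let it overwrite good data
    # Copy to a temporary file first, then rename over the good copy, so a
    # failed copy cannot leave a half-written file in place of good data.
    tmp = good_copy.with_suffix(good_copy.suffix + ".tmp")
    shutil.copyfile(candidate, tmp)
    tmp.replace(good_copy)  # atomic on POSIX when both paths share a filesystem
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        base = Path(workdir)
        source = base / "source.dat"
        source.write_bytes(b"important records")
        good = base / "backup.dat"
        good.write_bytes(b"important records")
        expected = sha256_of(source)

        corrupt = base / "candidate.dat"
        corrupt.write_bytes(b"important rec0rds")  # one byte flipped
        print(promote_backup(corrupt, expected, good))  # False
        print(good.read_bytes())  # b'important records'
```

The design choice here is to verify before replacing rather than after: once a corrupt copy has overwritten the only good one, there is nothing left to recover from, which is exactly the spreading-corruption problem described above.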

  • Backup versus recovery: Their purposes may sound similar, but there is a wide difference between them. Backup is one strategy for protecting data assets; safeguarding the data is only one part of the disaster recovery process. In our context, we don't have the option of replacement. There is either redundant data...