Hadoop Backup and Recovery Solutions

By: Gaurav Barot, Chintan Mehta, Amij Patel
Overview of this book

Hadoop offers distributed processing of large datasets across clusters and is designed to scale up from a single server to thousands of machines with a very high degree of fault tolerance. It enables scalable, cost-effective, flexible, and fault-tolerant solutions for backing up very large datasets and protecting them from hardware failures.

Starting with the basics of Hadoop administration, this book builds toward the best strategies for backing up distributed storage databases.

You will gradually learn the principles of backup and recovery, discover the common failure points in Hadoop, and learn how to back up Hive metadata. A deep dive into Apache HBase shows different ways of backing up data and compares them. Going forward, you'll learn how to define recovery strategies for various causes of failure, including failover, corruption, working drives, and metadata. The concepts of Hadoop metrics and MapReduce are also covered. Finally, you'll explore troubleshooting strategies and techniques to resolve failures.

Importing a table or restoring a snapshot


The filesystem can become corrupted for multiple reasons, such as a software upgrade corrupting the filesystem, human error, or bugs in an application. With the help of snapshots in HDFS, we can reduce the probable damage to the data in the system in such scenarios.

The snapshot mechanism helps preserve the current state of the filesystem and enables administrators to roll back the namespace and storage state to a known working condition.
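The paragraphs here describe the single, cluster-wide snapshot taken for upgrades. From Hadoop 2 onwards, HDFS also exposes per-directory snapshots through the FileSystem API, which is the more common way to protect data against the errors mentioned above. The following is a minimal sketch, assuming a hypothetical directory /data that an administrator has already marked snapshottable (hdfs dfsadmin -allowSnapshot /data):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateSnapshotExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // /data is a hypothetical path; an administrator must first make it
        // snapshottable, for example with: hdfs dfsadmin -allowSnapshot /data
        Path dir = new Path("/data");

        // Creates a read-only, point-in-time image of the directory, visible
        // afterwards under /data/.snapshot/before-upgrade
        Path snapshot = fs.createSnapshot(dir, "before-upgrade");
        System.out.println("Snapshot created at: " + snapshot);

        fs.close();
    }
}

Creating the snapshot is nearly instantaneous because HDFS records only the directory's current metadata state; data blocks are not copied.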

HDFS can have only one such snapshot in existence at a time, and an optional configuration lets the administrator enable it during startup. When a snapshot is triggered, the NameNode reads the checkpoint and the journal file and merges them in memory. It then writes a new checkpoint and an empty journal to a new location, so the old checkpoint and journal remain unaffected.
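Because the old state is preserved rather than overwritten, recovery can be as simple as copying a file back out of a snapshot. With the per-directory snapshots shown above, snapshot contents stay readable under the special .snapshot path inside the snapshottable directory. A minimal sketch, reusing the hypothetical /data directory and before-upgrade snapshot, with a hypothetical file name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class RestoreFromSnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Snapshot contents are exposed read-only under <dir>/.snapshot/<name>;
        // both paths here are hypothetical
        Path snapshotCopy = new Path("/data/.snapshot/before-upgrade/important.db");
        Path liveCopy = new Path("/data/important.db");

        // Overwrite the damaged live file with the point-in-time copy
        // (deleteSource = false, overwrite = true)
        FileUtil.copy(fs, snapshotCopy, fs, liveCopy, false, true, conf);

        fs.close();
    }
}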

During the handshake, the NameNode instructs the DataNodes to check whether a local snapshot should be created. A local snapshot in the DataNode...