Index
A
- Apache HBase
- about / Apache HBase
- Apache HCatalog
- about / Apache HCatalog
- Apache Hive
- about / Apache Hive
- URL / Apache Hive
- Apache Lucene
- Apache Pig
- about / Apache Pig
- applications
- about / Applications, Configurations
- automatic failover configuration
- about / Automatic failover configuration
- working / How automatic failover configuration works
- performing / How to configure automatic failover
- troubleshooting / How to configure automatic failover
- transitionToActive command / The transitionToActive and transitionToStandBy commands
- transitionToStandBy command / The transitionToActive and transitionToStandBy commands
- failover, between two NameNodes / Failover
- getServiceState command / The getServiceState command
- checkHealth command / The checkHealth command
B
- backup
- about / Understanding the backup and recovery philosophies
- versus recovery / Knowing the key considerations of recovery strategy
- backup approaches, HBase
- about / Approaches to backing up HBase
- snapshots / Snapshots
- replication / HBase replication
- Export utility / Export
- copy table functionality / The copy table
- HTable API / HTable API
- offline backup / Offline backup
- backup options, comparing / Comparing backup options
- backup areas
- determining / Determining backup areas – what should I back up?
- datasets / Datasets, Configurations
- applications / Applications, Configurations
- configurations / Configurations
- backup cluster
- failover / Failover to backup cluster
- installation / Installation and configuration
- configuration / Installation and configuration
- Hadoop installation, testing / The test installation of Hadoop
- automatic failover, Hadoop configuration / Hadoop configuration for an automatic failover
- backup philosophy
- about / The backup philosophy
- changes / Changes since the last backup
- rate of new data arrival / The rate of new data arrival
- cluster size / The size of the cluster
- priority of datasets / Priority of the datasets
- datasets, selecting / Selecting the datasets or parts of datasets
- timeliness of data backups / The timelines of data backups
- window of possible data loss, reducing / Reducing the window of possible data loss
- backup consistency / Backup consistency
- invalid backups, avoiding / Avoiding invalid backups
- backup strategy
- defining / Learning a way to define the backup strategy
- need for / Why do I need a strategy?
- considerations / What should be considered in a strategy?
- Bigtable
- URL / HBase history
C
- child nodes, ZooKeeper
- Peers znode / HBase replication
- RS znode / HBase replication
- cluster monitoring
- about / Cluster monitoring
- common failure points
- about / Understanding common failure points
- human errors / Human errors
- configuration issues / Configuration issues
- hardware failures / Hardware failures
- resource allocation issues / Resource allocation issues
- common failure types
- about / Understanding the common failure types
- hardware failure / Hardware failure
- user application failure / User application failure
- components, Hadoop cluster
- about / Understanding the basics of Hadoop cluster
- components, HBase data model
- tables / Understanding the HBase data model
- rows / Understanding the HBase data model
- column family / Understanding the HBase data model
- column qualifier / Understanding the HBase data model
- cell / Understanding the HBase data model
- timestamp / Understanding the HBase data model
- components, metadata file
- configured capacity / The number of under-replicated blocks
- DFS used / The number of under-replicated blocks
- non DFS used / The number of under-replicated blocks
- DFS used percent / The number of under-replicated blocks
- DFS remaining percent / The number of under-replicated blocks
- live nodes / The number of under-replicated blocks
- dead nodes / The number of under-replicated blocks
- decommissioning nodes / The number of under-replicated blocks
- number of under-replicated blocks / The number of under-replicated blocks
- components, required for automatic failover configuration
- Apache ZooKeeper / Automatic failover configuration
- ZooKeeper Failover Controller / Automatic failover configuration
- CompositeContext
- about / CompositeContext
- configuration properties, mapred-site.xml
- mapred.map.max.attempts / Hadoop's handling of failing tasks
- mapred.reduce.max.attempts / Hadoop's handling of failing tasks
- mapred.max.tracker.failures / Hadoop's handling of failing tasks
- configurations
- about / Configurations
- considerations, backup strategy
- about / What should be considered in a strategy?
- filesystem check (fsck) / Filesystem check (fsck)
- filesystem balancer / Filesystem balancer
- Hadoop cluster, upgrading / Upgrading your Hadoop cluster
- network layout, designing / Designing network layout and rack awareness
- rack awareness / Designing network layout and rack awareness
- areas / Most important areas to consider while defining a backup strategy
- copying
- versus teeing / Teeing versus copying
- copy table functionality, HBase
- about / The copy table
- rs.class / The copy table
- rs.impl / The copy table
- startrow / The copy table
- stoprow / The copy table
- starttime / The copy table
- endtime / The copy table
- versions / The copy table
- new.name / The copy table
- peer.adr / The copy table
- tablename / The copy table
- corruptions
- locating / Locating and repairing corruptions
- repairing / Locating and repairing corruptions
D
- daily process/mechanisms
- data
- writing, into HDFS cluster / Scenario 1 – writing data to the HDFS cluster
- reading, from HDFS cluster / Scenario 2 – reading data from the HDFS cluster
- replicating, DistCp used / Replication of data using DistCp
- data backup, Hadoop
- about / Data backup in Hadoop
- considerations / Data backup in Hadoop
- approaches / Data backup in Hadoop
- distributed copy (DistCp) tool / Distributed copy
- architectural approach / Architectural approach to backup
- data copy
- restoring, due to user error or accidental deletion / Restoring a data copy due to user error or accidental deletion
- DataNode (DN) / What did we do just now?
- data operations, HBase
- Get / Accessing HBase data
- Put / Accessing HBase data
- Scan / Accessing HBase data
- Delete / Accessing HBase data
- datasets
- about / Datasets, Configurations
- block size / Block size – a large file divided into blocks
- replication factor / Replication factor
- blocks of files, list / A list of all the blocks of a file, The number of under-replicated blocks
- DataNodes for each block, list / A list of DataNodes for each block – sorted by distance
- ACK packages / The ACK package
- checksums / The checksums
- number of under-replicated blocks / The number of under-replicated blocks
- secondary NameNode / The secondary NameNode
- active and passive nodes, in second generation Hadoop / Active and passive nodes in second generation Hadoop
- hardware failures / Hardware failure
- software failures / Software failure
- disaster
- knowing / Knowing a disaster
- recovering from / Knowing a disaster
- disaster failure, handling by HBase at data center
- about / How HBase handles failures at data centers
- types of restoring / How HBase handles failures at data centers
- disaster failure, handling by HDFS at data center
- about / How HDFS handles failures at data centers
- automatic failover configuration / Automatic failover configuration
- disaster failures, handling at data center
- about / Disaster failure at data centers
- disaster recovery (DR) / Understanding the common failure types
- disaster recovery plan
- about / Knowing a disaster
- disaster recovery principle
- DistCp
- used, for replicating data / Replication of data using DistCp
- used, for updating files / Updating and overwriting using DistCp
- used, for overwriting files / Updating and overwriting using DistCp
- distcp command
- options / Distributed copy
- distributed copy (DistCp) tool / Data backup in Hadoop, Distributed copy
E
- Export utility, HBase / Export
F
- Failures occur everywhere / Knowing the areas to be protected
- features, HBase / HBase introduction
- FileContext
- about / FileContext
- files
- updating, DistCp used / Updating and overwriting using DistCp
- overwriting, DistCp used / Updating and overwriting using DistCp
- fsck filesystem, parameters
- -files / Managing the HDFS cluster
- -blocks / Managing the HDFS cluster
- -locations / Managing the HDFS cluster
- -racks / Managing the HDFS cluster
G
- Ganglia
- URL / GangliaContext
- GangliaContext
- about / GangliaContext
- Google File System (GFS)
H
- Hadoop
- need for / Understanding the need for Hadoop
- components / Understanding the basics of Hadoop cluster
- backing up, necessity / Knowing the necessity of backing up Hadoop
- backing up, advantages / Knowing the necessity of backing up Hadoop
- backup approaches / Knowing the areas to be protected
- slow-running tasks, handling in / How Hadoop handles slow-running tasks
- skip mode, handling in / Handling skip mode in Hadoop
- data backup / Data backup in Hadoop
- installation / Hadoop installation
- installation, testing / The test installation of Hadoop
- Hadoop, versus traditional distributed frameworks
- fundamental differences / Understanding the need for Hadoop
- Hadoop cluster
- components / Understanding the basics of Hadoop cluster
- Hadoop configuration, automatic failover
- about / Hadoop configuration for an automatic failover
- HA state, preparing in ZooKeeper / Preparing for the HA state in ZooKeeper
- NameNodes, formatting / Formatting and starting NameNodes
- NameNodes, starting / Formatting and starting NameNodes
- ZooKeeper Failover Controller (ZKFC) service, starting / Starting the ZKFC services
- DataNodes, starting / Starting DataNodes
- automatic failover, verifying / Verifying an automatic failover
- Hadoop ecosystem
- overview / Apache HCatalog
- hardware failure
- about / Hardware failure
- host failure / Host failure
- commodity hardware / Using commodity hardware
- consequences / Hardware failures may lead to loss of data
- hardware failures, datasets
- about / Hardware failure
- data corruption on disk / Data corruption on disk
- disk/node failure / Disk/node failure
- rack failure / Rack failure
- HBase
- about / HBase
- history / HBase history
- keywords / HBase introduction
- features / HBase introduction
- versus HDFS / HBase introduction
- backup approaches / Approaches to backing up HBase
- replication modes / Modes of replication
- HBase data
- accessing / Accessing HBase data
- HBase data model
- about / Understanding the HBase data model
- components / Understanding the HBase data model
- HBase replication
- about / HBase replication
- HBase root folder
- pointing, to backup location / Pointing the HBase root folder to the backup location
- HDFS
- about / Understanding HDFS design, Understanding the basics of Hadoop cluster
- similarities, with traditional file system / Understanding HDFS design
- versus traditional file system / Understanding HDFS design
- goals / Understanding HDFS design
- versus HBase / HBase introduction
- HDFS cluster
- data, writing into / Scenario 1 – writing data to the HDFS cluster
- data, reading from / Scenario 2 – reading data from the HDFS cluster
- monitoring / Managing the HDFS cluster
- HDFS daemons
- about / Getting familiar with HDFS daemons
- User / Getting familiar with HDFS daemons
- Client / Getting familiar with HDFS daemons
- NameNode / Getting familiar with HDFS daemons
- DataNode / Getting familiar with HDFS daemons
- JobTracker / Understanding the basics of Hadoop cluster
- TaskTracker / Understanding the basics of Hadoop cluster
- Hive
- about / What is Hive?
- Hive metadata
- need of backing up / Understanding the need for backing up Hive metadata
- Hive query language (HQL)
- about / Apache Hive
- Hive replication
- about / Hive replication
- host failure
- about / Host failure
- HTable API
- about / HTable API
- URL / HTable API
I
- improper archive
- versus proper archive / Knowing the key considerations of recovery strategy
- installation and configuration, backup cluster
- about / Installation and configuration
- user settings / The user and group settings
- group settings / The user and group settings
- Java installation / Java installation
- password-less SSH configuration / Password-less SSH configuration
- ZooKeeper installation / ZooKeeper installation
- Hadoop installation / Hadoop installation
- issue resolution techniques
- knowing / Knowing issue resolution techniques
- permanent fix / Knowing issue resolution techniques
- mitigation by configuration / Knowing issue resolution techniques
- mitigation by architecture / Knowing issue resolution techniques
- mitigation by process / Knowing issue resolution techniques
J
- Java
- installation / Java installation
- Java Management Extension (JMX)
- about / Java Management Extension
- JobTracker
K
- keywords, HBase
- sorted map / HBase introduction
- multidimensional / HBase introduction
- sparse data / HBase introduction
- distributed / HBase introduction
- consistent / HBase introduction
L
- logging
- about / Logging
- log output, written via log4j / Log output written via log4j
- log levels, setting / Setting the log levels
- stack traces, obtaining / Getting stack traces
- lost files / Lost files
M
- managed beans (Mbeans)
- about / Java Management Extension
- mapred-site.xml
- configuration properties / Hadoop's handling of failing tasks
- MapReduce
- about / Understanding the basics of Hadoop cluster
- map stage / Understanding the basics of Hadoop cluster
- reduce stage / Understanding the basics of Hadoop cluster
- MapReduce job / Speculative execution
- mean time between failures (MTBF)
- metadata file
- components / The number of under-replicated blocks
- metrics, of Hadoop
- about / Metrics of Hadoop
- FileContext / FileContext
- GangliaContext / GangliaContext
- NullContextWithUpdateThread / NullContextWithUpdateThread
- CompositeContext / CompositeContext
- Java Management Extension (JMX) / Java Management Extension
- monitoring
- overview / Monitoring overview
- monitoring tool
N
- NameNode (NN)
- recovering / The recovery of NameNode, What did we do just now?
- about / The recovery of NameNode
- no-recovery mode
- about / The recovery philosophy
- node health, monitoring
- about / Monitoring node health
- Hadoop host monitoring / Hadoop host monitoring
- Hadoop process monitoring / Hadoop process monitoring
- HDFS checks / The HDFS checks
- MapReduce checks / The MapReduce checks
- NullContextWithUpdateThread
- about / NullContextWithUpdateThread
O
- offline backup
- about / Offline backup
- offline snapshot / Operations involved in snapshots
- online snapshot / Operations involved in snapshots
- options, distcp command
- -p [rbugp] / Distributed copy
- -i / Distributed copy
- -log <logdir> / Distributed copy
- -m <num_maps> / Distributed copy
- -overwrite / Distributed copy
- -update / Distributed copy
- -f <urilist_uri> / Distributed copy
P
- Peers znode / HBase replication
- point-in-time copy
- restoring, for auditing / Restoring a point-in time copy for auditing
- prompt only once mode
- about / The recovery philosophy
- prompt when found error mode
- about / The recovery philosophy
- proper archive
- versus improper archive / Knowing the key considerations of recovery strategy
Q
- quality commodity equipment, Hadoop clusters
R
- rack management policy
- considerations / Rack failure
- advantages / Rack failure
- recovery
- about / Understanding the backup and recovery philosophies
- need for / The need for recovery
- versus backup / Knowing the key considerations of recovery strategy
- recovery areas
- about / Understanding recovery areas
- core components / Understanding recovery areas
- recovery modes, Hadoop
- continue / The recovery philosophy
- stop / The recovery philosophy
- quit / The recovery philosophy
- always / The recovery philosophy
- snapshots / The need for recovery
- replication / The need for recovery
- manual recovery / The need for recovery
- API / The need for recovery
- recovery philosophy
- about / The recovery philosophy
- Recovery Point Objective (RPO)
- about / Defining recovery strategy
- recovery process
- about / Knowing the key considerations of recovery strategy
- disaster failure at data center / Knowing the key considerations of recovery strategy
- point-in-time copy for auditing / Knowing the key considerations of recovery strategy
- accidental deletions / Knowing the key considerations of recovery strategy
- recovery strategy
- key considerations / Knowing the key considerations of recovery strategy
- defining / Defining recovery strategy
- centralized configuration / Centralized configuration
- monitoring / Monitoring
- alerting / Alerting
- teeing, versus copying / Teeing versus copying
- Recovery Time Objective (RTO)
- about / Defining recovery strategy
- replication modes, HBase
- about / Modes of replication
- master-slave / Modes of replication
- master-master / Modes of replication
- cyclic / Modes of replication
- root cause, for failure
- identifying / Identifying the root cause
- environment / Identifying the root cause
- pattern / Identifying the root cause
- logs / Identifying the root cause
- resources / Identifying the root cause
- RS znode / HBase replication
S
- safe-recovery mode
- about / The recovery philosophy
- secondary NameNode
- about / The secondary NameNode
- disk, fixing / Fixing the disk that has been corrupted or repairing it
- edit log, recovering / Recovering the edit log
- state, recovering from / Recovering the state from the secondary NameNode
- skip mode
- about / Hadoop's skip mode
- handling, in Hadoop / Handling skip mode in Hadoop
- slow running tasks
- failures / Failure of slow-running tasks
- handling, in Hadoop / How Hadoop handles slow-running tasks
- snapshot
- restoring / Importing a table or restoring a snapshot
- snapshots
- about / Snapshots
- operations / Operations involved in snapshots, Snapshot operation commands
- offline snapshot / Operations involved in snapshots
- online snapshot / Operations involved in snapshots
- software failures, datasets
- about / Software failure
- accidental, or malicious data deletions / Software failure
- permissions facility / Software failure
- state
- recovering, from secondary NameNode / Recovering the state from the secondary NameNode
T
- table
- importing / Importing a table or restoring a snapshot
- task failure, due to data
- about / Task failure due to data
- data loss, or corruption / Data loss or corruption
- no live node contains block errors / No live node contains block errors
- TaskTracker
- teeing
- versus copying / Teeing versus copying
- traditional distributed frameworks, versus Hadoop
- fundamental differences / Understanding the need for Hadoop
- traditional file system
- similarities, with HDFS / Understanding HDFS design
- versus HDFS / Understanding HDFS design
- troubleshooting approaches
U
- user application failure
- about / User application failure
- software causing task failure / Software causing task failure
- failure, of slow running tasks / Failure of slow-running tasks
- failing tasks, handling / Hadoop's handling of failing tasks
- task failure, due to data / Task failure due to data
- bad data handling / Bad data handling – through code
- skip mode / Hadoop's skip mode
W
- working state
- working drive, recovering / Recovering a drive from the working state
- Write Ahead Log (WAL) / HBase replication
Z
- ZKFC
- health monitoring / How automatic failover configuration works
- ZooKeeper session administration / How automatic failover configuration works
- ZooKeeper-based ballot / How automatic failover configuration works
- ZooKeeper
- URL, for download / ZooKeeper installation
- installation / ZooKeeper installation
- HA state, preparing / Preparing for the HA state in ZooKeeper
- ZooKeeper (ZK)
- about / Automatic failover configuration
- failure detector / Automatic failover configuration
- active node locator / Automatic failover configuration
- mutual exclusion of active state / Automatic failover configuration
- ZooKeeper Failover Controller (ZKFC) / Starting the ZKFC services