Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Adding a new DataNode


This recipe shows you how to add new nodes to an existing HDFS cluster without restarting the whole cluster, and how to force HDFS to rebalance after the addition of new nodes. Commercial Hadoop distributions typically provide a GUI-based approach to add and remove DataNodes.

Getting ready

  1. Install Hadoop on the new node and replicate the configuration files of your existing Hadoop cluster. You can use rsync to copy the Hadoop configuration from another node; for example:

    $ rsync -a <master_node_ip>:$HADOOP_HOME/etc/hadoop/ $HADOOP_HOME/etc/hadoop
    
  2. Ensure that the master node of your Hadoop/HDFS cluster can perform password-less SSH to the new node. Password-less SSH is optional if you do not plan to use the sbin/*.sh scripts on the master node to start and stop the cluster.
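Step 2 can be sketched as follows. This assumes OpenSSH on the master node and an RSA key at ~/.ssh/id_rsa. The function names setup_passwordless_ssh and check_passwordless_ssh, and the host name new-datanode.example.com, are hypothetical. The check passes BatchMode=yes, which makes ssh fail instead of prompting for a password, so it succeeds only when key-based login works.

```shell
#!/usr/bin/env bash
# Sketch: set up and verify password-less SSH from the master node to a new
# DataNode. Run on the master node as the user that starts the HDFS daemons.

# Hypothetical helper: create an RSA key pair with an empty passphrase (only if
# none exists yet), then install the public key on the new node.
# ssh-copy-id asks for the remote user's password once.
setup_passwordless_ssh() {
  local host="$1"
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" || return 1
  fi
  ssh-copy-id "$host"
}

# Hypothetical helper: succeeds only if the host accepts key-based login.
# BatchMode=yes disables password prompts, so ssh fails instead of waiting.
check_passwordless_ssh() {
  local host="$1"
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1
}

# Example (hypothetical host name):
#   setup_passwordless_ssh new-datanode.example.com
#   check_passwordless_ssh new-datanode.example.com && echo "password-less SSH OK"
```

Running the check before using the sbin/*.sh scripts surfaces SSH problems early. Otherwise they appear later as a script that hangs at a password prompt.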

How to do it...

The following steps show you how to add a new DataNode to an existing HDFS cluster:

  1. Add the IP address or DNS name of the new node to the $HADOOP_HOME/etc/hadoop/slaves file in...
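The slaves-file step can be sketched as follows. This assumes the slaves file lists one host name or IP address per line. The add_slave function is hypothetical: it appends the new node only if that node is not already listed, so running it twice does no harm. Updating the file does not start the DataNode by itself. The file tells the cluster start/stop scripts on the master node which hosts to manage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a DataNode host to the slaves file only if it is
# not already listed. Assumes one host (name or IP) per line.
add_slave() {
  local host="$1"
  local slaves_file="${2:-$HADOOP_HOME/etc/hadoop/slaves}"

  if [ -z "$host" ]; then
    echo "usage: add_slave <host> [slaves_file]" >&2
    return 1
  fi

  # Create the file if it does not exist yet.
  touch "$slaves_file" || return 1

  # If the last line lacks a trailing newline, add one so the new host
  # does not get glued onto the previous entry.
  if [ -s "$slaves_file" ] && [ -n "$(tail -c1 "$slaves_file")" ]; then
    echo >> "$slaves_file"
  fi

  # -x matches whole lines; -F treats the host as a fixed string (dots literal).
  if grep -qxF -- "$host" "$slaves_file"; then
    echo "$host is already listed in $slaves_file"
  else
    echo "$host" >> "$slaves_file"
    echo "added $host to $slaves_file"
  fi
}

# Example (hypothetical host name), run on the master node:
#   add_slave new-datanode.example.com
```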