This recipe shows you how to add new nodes to an existing HDFS cluster without restarting the whole cluster, and how to force HDFS to rebalance after the addition of new nodes. Commercial Hadoop distributions typically provide a GUI-based approach to add and remove DataNodes.
Install Hadoop on the new node and replicate the configuration files of your existing Hadoop cluster. You can use rsync to copy the Hadoop configuration from another node; for example:
$ rsync -a <master_node_ip>:$HADOOP_HOME/etc/hadoop/ $HADOOP_HOME/etc/hadoop
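Before overwriting the configuration on the new node, it can help to preview what rsync would change. The following is a minimal sketch, not part of the original recipe: the function name sync_hadoop_conf and the host and path in the usage comment are hypothetical, and it assumes Hadoop is installed under the same path on both nodes, as the command above does.

```shell
#!/bin/sh
# Sketch: pull the Hadoop configuration directory from the master node.
# sync_hadoop_conf is a hypothetical helper name, not a Hadoop or rsync command.
# Assumes the same Hadoop installation path on the master and on this node.

sync_hadoop_conf() {
    master="$1"        # address of the node to copy from
    hadoop_home="$2"   # Hadoop installation directory (same on both nodes)
    shift 2            # any remaining arguments are passed through to rsync

    # -a preserves permissions and timestamps; the trailing slash on the
    # source copies the directory's contents rather than the directory itself.
    rsync -a "$@" "${master}:${hadoop_home}/etc/hadoop/" "${hadoop_home}/etc/hadoop/"
}

# Preview only (nothing is written), listing the files that would change:
#   sync_hadoop_conf 192.168.1.10 /opt/hadoop --dry-run --itemize-changes
# Actual copy:
#   sync_hadoop_conf 192.168.1.10 /opt/hadoop
```

Running the preview first is a cheap way to catch a wrong source path before any local file is replaced.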
Ensure that the master node of your Hadoop/HDFS cluster can perform password-less SSH to the new node. Password-less SSH setup is optional if you do not plan to use the bin/*.sh scripts from the master node to start/stop the cluster.
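One common way to set up and verify password-less SSH with the standard OpenSSH tools is sketched below. The function names and the user and host in the usage comments are hypothetical; the sketch assumes an RSA key at the default location and that ssh-copy-id is available on the master node.

```shell
#!/bin/sh
# Sketch: set up and verify password-less SSH from the master to a new node.
# Function names and the example target are placeholders for illustration.

# One-time setup, run on the master node as the user that starts the cluster.
setup_passwordless_ssh() {
    target="$1"   # e.g. hadoop@new-node.example.com (hypothetical)

    # Create a key pair with an empty passphrase only if none exists yet.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

    # Append the public key to the target's authorized_keys file.
    # This step asks for the target account's password once.
    ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$target"
}

# Returns 0 only if a login succeeds without any interactive prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage:
#   setup_passwordless_ssh hadoop@new-node.example.com
#   check_passwordless_ssh hadoop@new-node.example.com && echo "password-less SSH works"
```

If the check fails right after setup, the usual causes are an unknown host key (connect once interactively to accept it) or overly permissive permissions on the ~/.ssh directory of the new node.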