A Hadoop cluster needs to be upgraded when new releases ship bug fixes or new features. In this recipe, we outline the steps to upgrade a Hadoop cluster to a newer version.
Download the desired Hadoop release from an Apache mirror site: http://www.apache.org/dyn/closer.cgi/hadoop/common/. In this book, we assume we are upgrading Hadoop from version 1.1.2 to version 1.2.0, which was still in beta at the time of writing.
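As a sketch, the download and unpacking can be scripted as follows. The version numbers match this recipe's assumptions, and the URL follows the Apache archive's layout for old releases; adjust both for the release you actually want.

```shell
# Target version for this recipe (an assumption; change as needed).
NEW_VERSION=1.2.0

# Download the release tarball from the Apache archive and unpack it.
wget "http://archive.apache.org/dist/hadoop/common/hadoop-${NEW_VERSION}/hadoop-${NEW_VERSION}.tar.gz"
tar -xzf "hadoop-${NEW_VERSION}.tar.gz"

# The release unpacks into a versioned directory, e.g. hadoop-1.2.0/
ls "hadoop-${NEW_VERSION}"
```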
We assume that there are no running or pending MapReduce jobs in the cluster.
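This assumption can be verified from the command line. A minimal sketch, using the Hadoop 1.x `hadoop job -list` command, which prints the currently running jobs (job IDs start with the `job_` prefix):

```shell
# Count running MapReduce jobs; the cluster is quiescent when the count is 0.
RUNNING=$(hadoop job -list 2>/dev/null | grep -c '^job_')
if [ "$RUNNING" -ne 0 ]; then
  echo "There are $RUNNING running jobs; wait for them to finish before upgrading." >&2
fi
```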
Tip
When upgrading a Hadoop cluster, our priority is to avoid damaging the data stored on HDFS; data loss or corruption is the cause of most upgrade problems. Damage can be caused either by human error or by software and hardware failures, so a backup of the data might be necessary. However, the sheer size of the data on HDFS can make a full backup impractical.
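Even when a full backup is impractical, it is common practice to record the filesystem's state before the upgrade so it can be compared afterwards. A sketch using standard Hadoop 1.x commands; the backup paths are assumptions, and should point to storage outside HDFS:

```shell
# Local directory for pre-upgrade snapshots (an assumption; any non-HDFS path works).
BACKUP_DIR=/backup/pre-upgrade
mkdir -p "$BACKUP_DIR"

# Snapshot the block-level health report of the whole filesystem.
hadoop fsck / -files -blocks -locations > "${BACKUP_DIR}/fsck.log"

# Snapshot the full directory listing for post-upgrade comparison.
hadoop dfs -lsr / > "${BACKUP_DIR}/lsr.log"

# Snapshot the DataNode and capacity report.
hadoop dfsadmin -report > "${BACKUP_DIR}/dfsadmin-report.log"
```

After the upgrade, re-running the same commands and diffing the logs gives a quick check that no files or blocks were lost.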
A more practical way is to only back up the HDFS filesystem...