The health of the filesystem is critical for data retrieval and optimal performance. In a distributed system, keeping the HDFS filesystem healthy matters even more, because it ensures that blocks are properly replicated and that data blocks can be streamed in near-parallel.
In this recipe, we will see how to check the health of the filesystem and repair it, if needed.
Make sure you have a running cluster that has already been up for a few days with data. We can run the commands on a new cluster as well, but for the sake of this lab, it will give you more insights if it is run on a cluster with a large dataset.
SSH to the master1.cyrus.com Namenode and change the user to hadoop.
To check the HDFS root filesystem, execute the hdfs fsck / command, as shown in the following screenshot:
We can also check the status of just one file instead of the entire filesystem, as shown in the following screenshot:
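The two checks above can be sketched as a small shell script. This is a minimal sketch, not a definitive procedure: the file path /user/hadoop/sample.txt and the helper name fsck_status are hypothetical and used only for illustration, and the helper assumes that the fsck report ends with a line stating that the filesystem "is HEALTHY" or "is CORRUPT". The -files, -blocks, and -locations options ask fsck to list the file, its blocks, and the DataNodes holding each block.

```shell
#!/bin/sh
# Sketch only: fsck_status and the sample file path are hypothetical.

# Classify an fsck report read from stdin.
# Assumes the report contains "... is HEALTHY" when no problems are found.
fsck_status() {
    # Redirect to /dev/null (rather than grep -q) so the whole report is consumed.
    if grep "is HEALTHY" >/dev/null; then
        echo "HEALTHY"
    else
        echo "NOT HEALTHY"
    fi
}

# Run the real checks only where the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
    # Check the entire filesystem from the root and summarize the result.
    hdfs fsck / | fsck_status

    # Check a single file (hypothetical path) and show its blocks and locations.
    hdfs fsck /user/hadoop/sample.txt -files -blocks -locations
fi
```

Run this as the hadoop user on the Namenode. Checking a single file is much faster than scanning from the root on a large cluster, which is useful when only one dataset is in doubt.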