As a Ceph storage administrator, you will manage clusters that contain many physical disks. As the disk count of your Ceph cluster grows, so does the frequency of disk failures, and replacing a failed disk drive can become a routine task. There is generally no need to worry if one or more disks fail, because Ceph protects the data through its replication and high-availability features. Replacing a failed OSD essentially relies on Ceph's data replication together with removing all entries for the failed OSD from the cluster's CRUSH map. We will now walk through the failed disk replacement process on ceph-node1 for osd.0.
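To give a rough idea of what that removal involves, the following is a sketch of the typical upstream Ceph CLI sequence for retiring a failed OSD such as osd.0: mark the OSD out so Ceph rebalances its data, remove it from the CRUSH map, delete its authentication key, and finally remove it from the OSD map. Treat this as an outline only; the exact steps we will use appear later in this exercise:
# ceph osd out osd.0
# ceph osd crush remove osd.0
# ceph auth del osd.0
# ceph osd rm osd.0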
First, check the status of your Ceph cluster. Since this cluster does not yet have any failed disk, its health status will be HEALTH_OK:
# ceph status
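The shorter form, ceph -s, reports the same cluster summary and can be used interchangeably:
# ceph -s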
Since we are demonstrating this exercise on virtual machines, we will simulate a disk failure by forcefully bringing ceph-node1 down...