The majority of outages and cases of data loss will be directly caused by the loss of a number of OSDs that exceed the replication level in a short period of time. If these OSDs do not come back online, be it due to a software or hardware failure and Ceph was not able to recover objects in-between OSD failures, then these objects are now lost.
If an OSD has failed due to a failed disk, then it is unlikely that recovery will be possible unless costly disk recovery services are utilized, and there is no guarantee that any recovered data will be in a consistent state. This chapter will not cover recovering from physical disk failures and will simply suggest that the default replication level of 3 should be used to protect you against multiple disk failures.
If an OSD has failed due to a software bug, the outcome is possibly a lot more positive, but the process is complex and time-consuming. Usually, an OSD, which, although the physical disk is in a good...