During the initial VSAN configuration, we disabled vSphere HA on the cluster and then re-enabled it after VSAN was turned on. This step was necessary because vSphere HA works differently in VSAN-enabled clusters.
In non-VSAN infrastructures, vSphere HA uses the host management network to detect network isolation. Hosts exchange heartbeats with each other over the management network. When a host stops receiving those heartbeats, it checks whether it can still reach the isolation address, which by default is the management network's default gateway. If this communication also fails, vSphere HA determines that the ESXi host is isolated and takes corrective action.
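The isolation check described above can be reduced to a simple decision. The sketch below is a minimal, simplified model, not the actual vSphere HA agent logic: it ignores timers, master/slave roles, and datastore heartbeating, and the names `HostNetworkState` and `is_isolated` are hypothetical. It only shows that a host is treated as isolated when peer heartbeats stop *and* the isolation address cannot be reached.

```python
from dataclasses import dataclass


@dataclass
class HostNetworkState:
    # Hypothetical, simplified view of what an HA agent observes on the
    # network it uses for host-to-host communication.
    heartbeats_from_peers: bool        # still receiving heartbeats from other hosts?
    isolation_address_reachable: bool  # can it reach the isolation address (default gateway)?


def is_isolated(state: HostNetworkState) -> bool:
    """Simplified isolation test: the host counts as isolated only when
    peer heartbeats have stopped AND the isolation address is unreachable."""
    if state.heartbeats_from_peers:
        # Peers are still visible, so there is no reason to suspect isolation.
        return False
    # Heartbeats stopped: fall back to the isolation address check.
    return not state.isolation_address_reachable


# Lost peer heartbeats and cannot reach the gateway: treated as isolated.
print(is_isolated(HostNetworkState(heartbeats_from_peers=False,
                                   isolation_address_reachable=False)))
```

Requiring both signals to fail is what keeps a single missed heartbeat from triggering corrective action on its own.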
vSphere HA datastore heartbeating adds another layer of communication through the shared datastores, but it does not change these fundamental network-related assumptions.
Within a VSAN cluster, however, some of these HA assumptions must change. In a typical cluster, host manageability is the paramount networking consideration; in a VSAN cluster, communication over the VSAN network takes precedence. For this reason, vSphere HA was modified to cooperate with VSAN: when VSAN is in use, vSphere HA uses the VSAN network for its host-to-host communication.
The reason for this is fairly simple. When VSAN is in use, the availability of the management network has no bearing on whether a VM is accessible and can be recovered after a host failure or network partition. If vSphere HA relied on the management network, a problem on that network while the VSAN network remained healthy would cause needless failovers. Conversely, if a problem on the VSAN network isolates hosts or splits them into separate groups, vSphere HA must be aware of those changes. If vSphere HA continued to use the management network, it could attempt to power on VMs on hosts where a VSAN quorum cannot be established, and those failovers would fail.
Moving vSphere HA onto the VSAN network makes it VSAN-aware. If a network partition leaves a VM's objects available on one side of the VSAN partition but not the other, HA knows to bring up the VM in the partition where its objects are available. By design, the objects can never be available on both sides: VSAN lays out object components so that the same VM cannot have a quorum in multiple partitions (see Appendix B, Additional VSAN Information).
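The quorum rule can be illustrated with a small model. This is a minimal sketch under simplifying assumptions, not VSAN's implementation: each component of an object carries exactly one vote, partitions are disjoint sets of hosts, and an object is usable only in a partition holding a strict majority (more than 50%) of its components. The function name `partition_with_quorum` and the host and component names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def partition_with_quorum(
    component_hosts: Dict[str, str],
    partitions: List[Set[str]],
) -> Optional[int]:
    """Return the index of the partition holding a strict majority of the
    object's components, or None if no partition has quorum.

    Assumes one vote per component and disjoint partitions."""
    total = len(component_hosts)
    for index, hosts in enumerate(partitions):
        # Count the components that live on hosts inside this partition.
        present = sum(1 for host in component_hosts.values() if host in hosts)
        # Strict majority: two disjoint partitions cannot both exceed half,
        # so at most one partition can ever satisfy this test.
        if 2 * present > total:
            return index
    return None


# Two mirrored replicas plus a witness, each on a different (hypothetical) host.
vm_object = {"replica-1": "esx01", "replica-2": "esx02", "witness": "esx03"}

# esx03 is cut off: the {esx01, esx02} side holds 2 of 3 components.
print(partition_with_quorum(vm_object, [{"esx01", "esx02"}, {"esx03"}]))
```

In this model, an HA agent would restart the VM only in the partition whose index is returned, and would leave it down if the result is `None` (for example, when all three hosts end up in separate partitions).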
This is a fundamental change to how vSphere HA functions. It requires a complete reconfiguration of the vSphere HA cluster and VSAN-specific modifications to the vSphere HA logic, which is why VSAN cannot be enabled on a cluster that already has vSphere HA enabled. Once VSAN is enabled, vSphere HA can be re-enabled and rebuilt under its new set of operating assumptions.