Before reading this post, it is recommended to read this article, which gives an overview of what HA is.
Types of host failures:
Three types of host failure are detected:
- Host failing to function
- Host becoming isolated from network
- Host losing network connectivity with the master host
The liveness of the slave hosts is determined through communication between each slave host and the master host. This communication consists of network heartbeats (network heartbeating) exchanged every second, and it is the primary means by which the master host verifies that the slave hosts are available. If this communication fails, the master host checks whether the slave host is still exchanging heartbeats with any of its datastores. The master host also validates the availability of the slave host by sending ICMP pings to its management IP address.
If a host is found to have failed, the master host restarts all of its VMs on alternate hosts. A host is considered to have failed only if all three of the following conditions are met:
- The master host is unable to communicate with the slave host
- The slave host does not respond to ICMP pings
- The slave host does not exchange heartbeats with its datastores
If the host is still exchanging heartbeats with its datastores, the master host assumes that the slave host is network isolated or partitioned, and it continues to monitor the host and its virtual machines.
Datastore heartbeating is a technique used to validate the state of a host when the master host cannot contact the slave host over the management network. If a slave host has stopped datastore heartbeating, it is considered to have failed and the master host restarts its VMs on other hosts. If a slave host is still datastore heartbeating but cannot communicate over the management network, it is considered to be network partitioned or network isolated.
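The decision described above can be sketched in a few lines of Python. This is only an illustration of the logic in this post, not how the HA agent is actually implemented; the names `HostState` and `classify_host` are made up for the example, and the `UNKNOWN` case is an assumption added because the post does not say what happens when a host answers pings but exchanges no heartbeats at all.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    ISOLATED_OR_PARTITIONED = "isolated_or_partitioned"
    FAILED = "failed"
    UNKNOWN = "unknown"  # assumption: case not covered by the post


def classify_host(network_heartbeat: bool,
                  responds_to_ping: bool,
                  datastore_heartbeat: bool) -> HostState:
    """Classify a slave host from the master host's point of view."""
    # Network heartbeats are the primary check.
    if network_heartbeat:
        return HostState.HEALTHY
    # No network heartbeat, but the host still writes datastore heartbeats:
    # it is alive, just unreachable over the management network.
    if datastore_heartbeat:
        return HostState.ISOLATED_OR_PARTITIONED
    # All three checks failed: the host is declared failed and its VMs
    # are restarted on other hosts.
    if not responds_to_ping:
        return HostState.FAILED
    # Answers pings but no heartbeats of either kind (hypothetical fallback).
    return HostState.UNKNOWN


print(classify_host(False, False, True).value)
print(classify_host(False, False, False).value)
```

The order of the checks mirrors the text: network heartbeats first, then datastore heartbeats, with the ICMP ping as the final confirmation before a host is declared failed.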
vCenter Server selects the datastores used for heartbeating by default, although administrators can override this selection. The selection is based on the number of hosts that can access each datastore. Only datastores mounted by at least two hosts are available for heartbeating.
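As a rough illustration of the selection rule, the sketch below filters out datastores mounted by fewer than two hosts and then prefers the ones that the most hosts can access. The function name `candidate_heartbeat_datastores`, the `mounts` mapping, and the `count` parameter (defaulting to 2 here) are all assumptions for the example; the real vCenter Server selection may weigh other factors that this post does not cover.

```python
from typing import Dict, List, Set


def candidate_heartbeat_datastores(mounts: Dict[str, Set[str]],
                                   count: int = 2) -> List[str]:
    """Pick heartbeat datastore candidates.

    mounts maps a datastore name to the set of hosts that mount it
    (hypothetical input format). count is how many datastores to return.
    """
    # Only datastores mounted by at least two hosts are eligible.
    eligible = {ds: hosts for ds, hosts in mounts.items() if len(hosts) >= 2}
    # Assumption: prefer datastores reachable by more hosts;
    # ties are broken by name so the result is deterministic.
    ranked = sorted(eligible, key=lambda ds: (-len(eligible[ds]), ds))
    return ranked[:count]


mounts = {
    "ds1": {"esx01", "esx02", "esx03"},
    "ds2": {"esx01"},            # mounted by one host only, so not eligible
    "ds3": {"esx01", "esx02"},
}
print(candidate_heartbeat_datastores(mounts))  # ['ds1', 'ds3']
```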