Host isolation is a state that a host in an HA cluster enters when it detects that it has lost its network connection. There are a few things worth knowing about this feature. A host in an HA cluster sends heartbeats to the other hosts in the same cluster. This lets the cluster detect when a host has failed and, if that is the setting you have chosen, power on the VMs that the failed host was running.

Firstly, the Automated Availability Manager (AAM) controls the heartbeat process through the Service Console and its configured address.

On its own, however, this presents a problem. If a host has only one Service Console, AAM may conclude that the host has failed when in fact the failure is somewhere on the path from the Service Console through its physical NIC (uplink) and outwards. On a host with multiple uplinks (a full-blown production ESX server will need at least three), the Virtual Machine port group on another physical NIC could still be happily working away.

After 15 seconds without a heartbeat response, each node pings the default gateway of its Service Console (this address is called the isolation address). This is basically the host's way of asking, “Is this problem my fault?” If the host receives a response from its default gateway, it carries on as normal, because the fault is not with the host. If it does not receive a response, it goes into isolation mode.
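
To make the check above concrete, here is a minimal Python sketch of the decision as this post describes it. It is only a model for illustration, not the real AAM agent: the names `HostState`, `evaluate_host` and `HEARTBEAT_TIMEOUT_SECONDS` are made up, and the gateway ping result is passed in as a plain boolean rather than actually sent.

```python
from enum import Enum

# The 15-second figure is the one quoted in this post.
HEARTBEAT_TIMEOUT_SECONDS = 15


class HostState(Enum):
    NORMAL = "normal"
    ISOLATED = "isolated"


def evaluate_host(seconds_since_heartbeat: float, gateway_responds: bool) -> HostState:
    """Hypothetical model of the isolation check; not VMware code."""
    # Heartbeats are still arriving in time, so nothing needs checking.
    if seconds_since_heartbeat < HEARTBEAT_TIMEOUT_SECONDS:
        return HostState.NORMAL
    # Heartbeats are missing, but the default gateway answers:
    # the fault lies elsewhere, so the host carries on as normal.
    if gateway_responds:
        return HostState.NORMAL
    # No heartbeats and no reply from the gateway: the host is isolated.
    return HostState.ISOLATED
```

For example, `evaluate_host(20, gateway_responds=False)` returns `HostState.ISOLATED`, while the same call with `gateway_responds=True` returns `HostState.NORMAL`.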

A host that enters isolation mode then reads its isolation configuration, which tells it either to power off its VMs or to leave them running.

The correct configuration for this setting depends largely on your network. If, for example, you have a vSwitch configured with both a Virtual Machine port group and the Service Console port group, then losing your Service Console means you have lost your VM network too, so you will want to power the VMs off and let HA bring them all back up on other hosts. However, if your network is redundant enough, losing a Service Console isn't really a big deal as long as your Virtual Machine and VMotion networks are still working.
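
The rule of thumb above can be written down as a small Python sketch. This is only an illustration of the reasoning, under the simplifying assumption that the one thing that matters is whether the VM port group shares a vSwitch with the Service Console. The function name and the returned strings are hypothetical and are not VMware settings.

```python
def recommend_isolation_response(service_console_vswitch: str, vm_vswitch: str) -> str:
    """Hypothetical helper that mirrors the rule of thumb in this post."""
    # Same vSwitch: if the Service Console is unreachable, the VM network
    # is most likely gone too, so power off and let HA restart the VMs.
    if service_console_vswitch == vm_vswitch:
        return "power off"
    # Separate vSwitches: the VMs may still be reachable, so leave them alone.
    return "leave powered on"
```

With these assumptions, `recommend_isolation_response("vSwitch0", "vSwitch0")` suggests powering off, and `recommend_isolation_response("vSwitch0", "vSwitch1")` suggests leaving the VMs running. A real decision would also need to account for how the uplinks and physical switches are shared.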

It is considered best practice to have two Service Consoles configured per host, on different vSwitches, vmnics and physical networks. This sounds a bit complicated, but when you consider that VMotion should really have its own dedicated physical switch and subnet, the choice of where to put the Service Console suddenly becomes very clear.
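
As a rough way to sanity-check a host against that best practice, the Python sketch below models each Service Console as a small record and looks for two that share nothing. The `ServiceConsole` class, its fields and the example names are all assumptions made for illustration. Nothing here queries a real ESX host.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass(frozen=True)
class ServiceConsole:
    # Hypothetical description of one Service Console port.
    vswitch: str
    vmnic: str
    physical_network: str


def has_redundant_service_consoles(consoles: Sequence[ServiceConsole]) -> bool:
    """Return True if any two consoles differ in vSwitch, vmnic and physical network."""
    # combinations() yields nothing for fewer than two consoles,
    # so a single-console host is never reported as redundant.
    return any(
        a.vswitch != b.vswitch
        and a.vmnic != b.vmnic
        and a.physical_network != b.physical_network
        for a, b in combinations(consoles, 2)
    )
```

For instance, a console on `vSwitch0`/`vmnic0` on the management network plus a second one on `vSwitch1`/`vmnic2` on the VMotion network would pass this check, while two consoles on the same vSwitch would not.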

In the next post I will talk about the actual steps you need to take to configure host isolation, and a few extra switches that you may need.