High Availability

HA configuration errors on virtualised ESX/ESXi


If you are running a copy of VMware Workstation for your vSphere lab and you encounter an error when trying to enable HA/DRS on a cluster, then this is probably the issue.

When you create an ESX VM, VMware Workstation automatically creates it with 2GB of RAM, which is the lowest amount you can get away with to boot the hypervisor. However, for HA to work you need a minimum of 2300MB of RAM (2.3GB), so shut down your ESX/ESXi VM, increase the RAM by 300MB or more, and you will find that HA/DRS works correctly!
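
As a quick sanity check on those numbers, here is a minimal Python sketch of the arithmetic. It assumes the 2GB default means 2048MB, and the function name is made up for illustration; it is not part of any VMware tool.

```python
# Figures taken from the text; treating 2GB as 2048MB is an assumption.
WORKSTATION_DEFAULT_RAM_MB = 2048
HA_MINIMUM_RAM_MB = 2300


def extra_ram_needed_mb(current_mb, minimum_mb=HA_MINIMUM_RAM_MB):
    """Return how many MB must be added to reach the HA minimum (0 if already enough)."""
    return max(0, minimum_mb - current_mb)


print(extra_ram_needed_mb(WORKSTATION_DEFAULT_RAM_MB))        # shortfall on the default VM
print(extra_ram_needed_mb(WORKSTATION_DEFAULT_RAM_MB + 300))  # after adding 300MB
```

Under that assumption the shortfall is a little under 300MB, which is why rounding up to 300MB or more is a comfortable margin.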

VMware High Availability


High availability, or HA as I will call it from now on, is a feature of VirtualCenter that automatically restarts VMs in the event of a host failure.

For example, say you had 4 ESX servers running 40 VMs (10 on each). If one host goes pop, HA would detect the failure and restart its VMs on the 3 remaining hosts. There are not a great many options to fiddle with (most of them follow the same pattern), but you do have one important decision to make: if a host fails, do you want your VMs restarted, or would you rather they stay down? Basically, is it more important that all the VMs are up and running but possibly slower than normal, or would you rather some or all of the VMs stay down until you have dragged yourself out of bed and into the office to fix the issue?

VMware HA Tab

HA can be enabled once you have created a cluster by right-clicking on the cluster and selecting “Edit Settings”. The first screen you will see consists of 2 check boxes, one for enabling/disabling HA and one for enabling/disabling DRS. The choice here is self-explanatory, but you might want to spend a minute reading the couple of paragraphs on that page.

The next tab worth looking at is the VMware HA tab. There are 3-4 options here that you will need to consider.

The first option is Admission Control. Within that setting is the option to set the number of host failures the cluster can tolerate, which can be any number from 1 to 4. This is set to 2 by default, and of course if you suddenly find yourself losing 4 hosts in your cluster then you have a rather large problem on your hands. The next option is to prevent or allow the powering on of VMs if they violate availability constraints. Basically, do you want to allow a VM to be powered on even if the total configured memory exceeds the actual resources that the cluster provides?

Maths bit: You can work out your availability constraints by taking the amount of RAM in your smallest ESX host (i.e. the one with the least physical memory) and dividing it by the RAM of the VM with the most configured memory. That gives you the number of guest VMs each host can hold. Multiply it by the number of hosts left after the tolerated failures to get the figure for the whole cluster. Any more VMs than that and your availability constraints have been violated!

Example:

6 ESX hosts: the smallest has 24GB of RAM, the largest guest is configured with 2GB of RAM, and host failures is set to 1.

(24 / 2) × (6 − 1) = 12 × 5 = 60

So if one host fails, the total number of virtual machines that can be powered on without violating availability constraints is 60. If you need any more than that, you will need to allow the VMs to power on even though they violate the constraints.
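
The maths bit above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb described in this post, not how HA itself computes admission control internally; the function and parameter names are made up for the example.

```python
def max_vms_within_constraints(host_count, smallest_host_ram_gb,
                               largest_vm_ram_gb, host_failures_tolerated):
    """Estimate how many VMs can run without violating availability constraints.

    Follows the rule of thumb from the text: size every VM like the largest
    one, size every host like the smallest one, and only count the hosts
    that remain after the tolerated number of failures.
    """
    if host_failures_tolerated >= host_count:
        raise ValueError("cannot tolerate the failure of every host in the cluster")
    vms_per_host = smallest_host_ram_gb // largest_vm_ram_gb
    surviving_hosts = host_count - host_failures_tolerated
    return vms_per_host * surviving_hosts


# The example from the text: 6 hosts, smallest has 24GB, largest VM has 2GB, 1 failure tolerated.
print(max_vms_within_constraints(6, 24, 2, 1))  # 60
```

Raising the tolerated host failures to 2 in the same cluster would drop the figure to 48, which shows how quickly the setting eats into usable capacity.
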

The next setting is the default cluster setting. Within it is the VM restart priority, which is set to medium by default. This setting is cluster-wide but can be overridden at individual VM level. The configurable options for the restart priority are high, medium, low and disabled. VMs set to high are restarted first, medium next and low last, and disabled... well, you get the idea.

The next option down is the host isolation response setting. I’ve blogged about this before so I won’t go into too much detail, but it basically decides what happens to the VMs when a host loses its network connection to the other hosts in the cluster. Network connectivity is checked regularly by a heartbeat (ping). It is possible in VMware to have a situation where the host has lost its network connection to the other hosts in the cluster but the VMs on that server are perfectly happy and working away normally. In previous updates of ESX the default behaviour was to power off the VMs, which would leave HA to restart them on another server. However, from U2 onwards this changed to leave the VMs powered on, because 9 times out of 10 you will find that just the service console is having issues and the VMs are still working away quite happily.

The next option is “virtual machine monitoring”. This is an experimental feature which allows VirtualCenter to monitor the tools installed on the VMs (using heartbeats in a very similar way to HA). You can enable this option and use the slider to adjust the sensitivity. Virtual machine monitoring does know when a VM has been purposefully shut down or powered off, so it will not be constantly restarting machines you are trying to shut down.

The only button on this page is Advanced Options, which you should only change once you know what you are doing.

The sub tab of the HA options is “Virtual Machine Options”. This allows you to specify different restart priority and host isolation response behaviours for individual VMs, so you can effectively prioritize, delay or disable the restarting of individual VMs should you deem it appropriate.
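
To make the restart priority rules concrete, here is a rough Python sketch of how a cluster default plus per-VM overrides could be turned into a restart order. It is a toy model of the behaviour described above, not VMware's actual logic, and the VM names and the function are invented for the example.

```python
# Lower number means restarted earlier; "disabled" VMs are never restarted.
RESTART_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_sequence(vm_overrides, cluster_default="medium"):
    """Return VM names in the order they would be restarted.

    vm_overrides maps a VM name to its own restart priority, or to None
    when the VM simply inherits the cluster default.
    """
    effective = {name: (priority or cluster_default)
                 for name, priority in vm_overrides.items()}
    eligible = [(name, priority) for name, priority in effective.items()
                if priority != "disabled"]
    # sorted() is stable, so VMs with equal priority keep their listed order.
    ordered = sorted(eligible, key=lambda item: RESTART_RANK[item[1]])
    return [name for name, _ in ordered]


vms = {
    "db01": "high",
    "web01": None,           # inherits the cluster default (medium)
    "test01": "low",
    "scratch01": "disabled",
    "dc01": "high",
}
print(restart_sequence(vms))  # ['db01', 'dc01', 'web01', 'test01']
```

Here the two high priority VMs come back first, the VM inheriting the cluster default follows, the low priority test box is last, and the disabled VM stays down.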