Tagged: vmware

Vmxnet3 driver crash in ESXi5

Hi all,

I’ve had a couple of occurrences of this, where a Server 2003 R2 VM will seemingly just lose network connectivity. Attempting to ping either in or out does not work. Whilst powering the VM off and on will sometimes resolve the issue, it won’t stop it randomly happening again (naturally, during production hours).

I’ve noted that there is a VMware KB article on the issue (which I found after I’d resolved the issue); however, it states that the problem occurs when jumbo frames are enabled within the guest VM. In my experience this is not the case.

To resolve this issue, just add another vNIC to the VM using either the E1000 or VMXNET2 adapter type, remove the VMXNET3 vNIC, then re-enter the VM’s IP information on the newly added card. Another option (as a #lonvmug colleague of mine discovered) is to reinstall VMware Tools and reboot the VM. Reinstalling the tools is the best bet, but I was unable to do that because the VM in question was a terminal services box, so I hot-added another vNIC to avoid rebooting and losing the disconnected sessions.

There will be some cases where you will need to reboot the VM afterwards anyway, such as when a network-sensitive application cannot handle even a short network disruption.

How to virtualize a Domain Controller

First off a domain controller is always a scary thing to p2v but it can actually be a fairly straightforward process to complete.

In our network our DCs also have other roles and services installed on them, which made the easiest and proper approach (creating a new DC within the virtual environment, then demoting and removing the old physical one) rather difficult. So P2V was really our only option.

If you follow these instructions, you won’t go far wrong when P2Ving your domain controller.

1/ Investigate your domain controller’s services and see whether you also have any transactional databases on there, or anything else that may be sensitive to consistency (such as SQL/Oracle/backup software or AV).

2/ Write down the services associated with any DBs or picky software you may have, along with their current startup type.

3/ Run the VMware vCenter Converter Standalone installer on the domain controller, but choose the advanced (client-server) install and select only the agent to be installed.

4/ Once the install has completed, reboot your DC into DSRM (Directory Services Restore Mode) by furiously hitting F8 at the appropriate point of the boot process.

5/ Input your DSRM Administrator password (remember that at this point there are no domain accounts available on the DC).

6/ Open the services.msc tool and stop and disable the services you listed previously as sensitive to change (DBs etc.).

7/ Run the vmware vCenter Converter Standalone client on your own laptop or whatever you use to do p2v’s with.

8/ P2V the DC in the normal way by running through the wizard (if you get the old "multiple connections are not allowed" message, try inputting the IP of the DC instead of the DNS name, or the other way round, depending on what you used first).

ii/ One thing you will need to think about: if your DC points to itself for primary DNS resolution, the conversion will fail, and in the exported logs you will see something similar to “Found dangling SSL error”. Change the server you are converting to point to an alternative DNS server that can resolve your ESXi server and vCenter addresses.

9/ When the P2V process has completed, make sure in the VI Client that the VM’s vNICs are disconnected from the network (so that when the VM is powered on it won’t be able to talk to the production network).

10/ Uninstall all the vendor-installed helper drivers and apps (HP/Dell/IBM NIC drivers, diagnostic utilities, etc.) and configure the networking. Also re-enable the services you disabled and set them back to Automatic, or to whatever their previous state was (SQL DBs, AV/backup software, etc.).

11/ Shut down the physical DC and also the virtual DC.

###WARNING AT THIS POINT THE PHYSICAL DC MUST NEVER EVER BE CONNECTED BACK TO THE PRODUCTION NETWORK EVER AGAIN###

12/ Reconnect your vm to the production network and power it on.

13/ When it’s booted, give it a few minutes to calm down, then log in and check the following:

Event logs (it’s handy to check the old ones from before the P2V as well, just to make sure you’re not panicking about an error or message that existed previously).

Check replication by creating an object in AD (a user, for example) on the other domain controller and checking that it is replicated to the newly virtualized DC.
Delete the newly created object and check that it is also deleted on the other DC.

Run DCDIAG and NETDIAG and pay attention to any errors or informational messages you may receive.

Check your backup software interface. I know for sure that Backup Exec disables the job, and you have to run through the edit settings menu and reselect the drives/folders you want to back up.
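
Steps 2, 6 and 10 boil down to remembering each sensitive service’s original startup type so you can put it back exactly as it was. Here is a minimal Python sketch of that bookkeeping, assuming you jot the names and startup types down by hand from services.msc; the `ServicePlan` class and the service names in the usage note are hypothetical, and nothing here actually talks to Windows:

```python
from dataclasses import dataclass, field
from typing import Dict

# Startup types as shown in services.msc on Windows Server 2003.
VALID_MODES = {"Automatic", "Manual", "Disabled"}


@dataclass
class ServicePlan:
    """Hypothetical notebook of sensitive services and their original startup type."""

    original: Dict[str, str] = field(default_factory=dict)

    def record(self, name: str, start_mode: str) -> None:
        # Step 2: write down the service and how it was configured.
        if start_mode not in VALID_MODES:
            raise ValueError(f"unknown startup type: {start_mode!r}")
        self.original[name] = start_mode

    def during_conversion(self) -> Dict[str, str]:
        # Step 6: every recorded service is stopped and disabled.
        return {name: "Disabled" for name in self.original}

    def after_conversion(self) -> Dict[str, str]:
        # Step 10: each service goes back to whatever it was before.
        return dict(self.original)
```

For example, after `plan.record("SQL Server (MSSQLSERVER)", "Automatic")` and `plan.record("Backup Exec Remote Agent", "Manual")`, `plan.after_conversion()` gives you the list to work through in step 10, so a service that was Manual doesn’t end up Automatic by mistake.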

Then all you need to do is monitor the situation and just periodically check the event logs etc for oddities.
The last and most important job of all is to go into your server room and decable the old physical server (for the sake of a couple of minutes this could save you hours of heartache if a well meaning tech powers the DC back on again by accident).
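
Going back to the note under step 8: before starting the conversion it is worth confirming that the DC can actually resolve your ESXi hosts and vCenter server. A minimal sketch, assuming Python is available on the machine doing the check; `check_resolution` and the hostnames are hypothetical. It only asks the OS resolver (hosts file plus the configured DNS server) via `socket.gethostbyname`, so it tells you what the server’s current DNS settings answer:

```python
import socket
from typing import Callable, Dict, List, Optional


def check_resolution(
    hostnames: List[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to the IPv4 address the resolver returns, or None."""
    results: Dict[str, Optional[str]] = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (name not found) is a subclass of OSError.
            results[name] = None
    return results


def unresolved(results: Dict[str, Optional[str]]) -> List[str]:
    """Names that failed to resolve; an empty list means every name resolved."""
    return [name for name, addr in results.items() if addr is None]
```

Usage would be something like `unresolved(check_resolution(["esx01.example.local", "vcenter.example.local"]))`; anything it returns needs sorting out before you start the P2V. Running `nslookup` against each name from a command prompt does the same job by hand.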

How to virtualize Exchange 2007

Well, I’m going through a P2V exercise with my current employer, and whilst I have P2V’d many servers, I had not converted an Exchange server before. Our Exchange server is a single Windows 2003 R2 x64 box with all the Exchange roles installed on it. It’s not a complicated setup, but for us it doesn’t need to be. Googling around, I found quite a few horror stories of failed P2V attempts, so it was with a degree of uncertainty that I planned the Exchange conversion.

I started off by stopping and disabling these services on my Exchange server (because of this, you obviously need to carry out the work within a maintenance window, as Exchange will be down for a couple of hours at least):

MS Exchange Active Directory Topology
MS Exchange File Distribution
MS Exchange Information Store
MS Exchange Mail Submission
MS Exchange Mailbox Assistants
MS Exchange Replication Service
MS Exchange Search Indexer
MS Exchange Service Host
MS Exchange System Attendant
MS Exchange Transport
MS Exchange Transport Log Search
Microsoft Search (Exchange)
SQL Server (BlackBerry)
SQL Server Browser
SQL Server VSS Writer
Backup Exec
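
The selection rule behind that list (anything Exchange or SQL based, plus the backup agent) can be sketched as a case-insensitive keyword match over service display names. This is only an illustration: the keyword tuple and `services_to_disable` are assumptions, and a keyword match will happily catch unrelated services too, so eyeball the result before disabling anything:

```python
from typing import Iterable, List, Sequence

# Assumed keywords: "anything Exchange or SQL based", plus the backup agent.
DEFAULT_KEYWORDS = ("exchange", "sql", "backup")


def services_to_disable(
    display_names: Iterable[str], keywords: Sequence[str] = DEFAULT_KEYWORDS
) -> List[str]:
    """Return the display names that contain any keyword, ignoring case."""
    lowered_keywords = [k.lower() for k in keywords]
    return [
        name
        for name in display_names
        if any(k in name.lower() for k in lowered_keywords)
    ]
```

Fed a list that includes "MS Exchange Information Store", "Microsoft Search (Exchange)", "SQL Server Browser", "Backup Exec" and "DHCP Client", it keeps the first four and leaves the DHCP Client alone.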

A couple of those services are specific to my setup, but you get the idea: anything Exchange or SQL based I stopped and disabled. I then carried out the conversion as per any other P2V using the VMware vCenter Converter Standalone program. I also wanted to shrink the server’s D: drive, as it was way too big for the mailstore, so I configured that at this point as well.
When the conversion had completed, I powered down the old physical server and powered on the VM. Then I removed the brand-specific drivers and applications (such as HP and Dell drivers and array helpers). Then I installed the VM guest tools and configured the networking.

I then reset all the disabled services to Automatic and restarted the VM. A few nervous minutes later I was able to view emails in OWA, use ActiveSync and send/receive emails through Outlook, so all in all it seems to have gone well. At least, much better than my googling was suggesting it might.

So in my experience of converting Exchange Server 2007, all you need to do is stop the Exchange and SQL services (plus any backup agents you may have) and disable them.
Run the conversion using the current VMware vCenter Converter Standalone
Power off the physical
Power on the VM
Remove all of the hardware specific drivers and applications
Install the vm guest tools
Configure the networking
Re-enable the disabled services and reboot the server.
Then check the event logs etc just to make sure windows isn’t complaining about anything.
Then you just need to let your users know that it is safe to use outlook etc again.

The trouble with P2Ving Exchange, I guess, is that once you have brought the services back online and it starts servicing mail requests, it’s pretty much impossible to turn the old physical server back on, as you would lose any recent transactions that the Exchange VM processed. So if something is not quite right with it, you really have to push on through and find a fix rather than reconverting or powering on the old physical server.

Updating ESXi 4 to 4.1 without Update Manager

Hi all,

So you have one or more ESXi boxes at home doing various tasks, and they are currently running ESXi 4. You’re tempted to update to 4.1 but do not have Update Manager installed and do not want the hassle of configuring ESXi again.

No problem: good old command line to the rescue again. To upgrade 4 to 4.1 you need the ESXi 4.1 upgrade package as a zip file and the vCLI (vSphere Command Line Interface); both are available from the VMware site (I won’t post links as I don’t know how quickly they will age). Once you have downloaded and installed the vCLI, you will have a new program item in your Start\Programs\VMware folder called “vmware Vcli\command prompt”. Click on that and it will drop you into the old familiar black-and-white screen. Ensure any VMs on the server are either powered off or migrated, and put the host into maintenance mode.

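Navigate your way to the “bin” directory (currently “c:\program files\vmware\vsphere cli\bin” on my computer) and run the following, replacing 0.0.0.0 with your host’s IP address and c:\zipfilelocation with the path to the upgrade zip file: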

vihostupdate.pl --server 0.0.0.0 --install --bundle c:\zipfilelocation

Press Return and enter the ESXi server’s admin credentials (probably the root account in a home environment). In a few minutes the command will complete and tell you that the server needs to reboot before the process finishes; do this, and when it comes back up, exit maintenance mode.
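
If you are upgrading more than one box, it can help to build the command line in one place. A minimal Python sketch, assuming you only need the three options used above (`--server`, `--install`, `--bundle`); `vihostupdate_args` is a hypothetical helper, and the credentials are still entered at the prompt as described:

```python
from pathlib import PureWindowsPath
from typing import List


def vihostupdate_args(host: str, bundle_zip: str) -> List[str]:
    """Build the vihostupdate.pl argument list shown above for one host."""
    # The bundle should be the upgrade zip itself, not the folder it sits in.
    if PureWindowsPath(bundle_zip).suffix.lower() != ".zip":
        raise ValueError(f"expected a .zip upgrade bundle, got {bundle_zip!r}")
    return [
        "vihostupdate.pl",
        "--server", host,
        "--install",
        "--bundle", bundle_zip,
    ]
```

For example, `vihostupdate_args("192.168.1.10", r"c:\downloads\upgrade-4.1.zip")` (hypothetical IP and path) gives the argument list to paste into the vCLI command prompt. Keeping it as a list rather than one string means a path with spaces stays a single argument if you later hand it to `subprocess.run`, assuming `.pl` files are associated with a Perl interpreter on that machine.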

WARNING: Before updating, be sure to consult the VMware HCL (Hardware Compatibility List) to ensure that your server is compatible with the version you are trying to install. If your server is not specifically listed, you can check the compatibility of the individual components either through the HCL or the community-driven HCL. The HCL is available on the VMware website.

Cloned Template Hardware Bug in Vmware ESX 3.5 U2

Today I’ve been wrestling with this http://xtravirt.com/xd10070 bug in ESX 3.5 u2.

The link provides a good insight into what causes it (basically when cloning a template and editing the hardware before the clone begins the source vmdk is actually used instead of the newly cloned vmdk).

This of course becomes a problem if you decide you don’t need the template any more and delete it: the flat file doesn’t get deleted but everything else does, and the next time you reboot the problem VM you get a “file not found” error and it will not let you boot the VM back up again.

I managed to get round this problem by creating a blank vm with the same specifications (most importantly disk size and OS version).

Then copy the remaining flat file of the corrupted vm into the folder containing the newly created vm using the datastore browser.

Rename the newly created VM’s own flat file (to get it out of the way), either through SSH on a host under /vmfs/volumes/{your bit here} or through the datastore browser.

Rename the corrupted flat file to the newly created vm name (for example the corrupted flat file might be called vm1-flat.vmdk and the newly created vm might be called vm2, so rename vm1-flat.vmdk to vm2-flat.vmdk).

Then power on the VM and confirm that the OS is still intact and working as it should.

I thought it was best to copy the corrupted flat file rather than move it, just in case something went wrong while I was performing these actions, so I would still have the actual VM OS data to go back to.
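
The copy-and-rename dance can be sketched in Python against an ordinary file system. This is only an illustration: on a real datastore you would do it through the datastore browser or SSH as described above, and `swap_flat_file` is hypothetical. The names follow the vm1/vm2 example, the blank VM’s own flat file is renamed with a `.blank` suffix rather than deleted, and the corrupted flat file is copied, never moved, in keeping with the caution above:

```python
import shutil
from pathlib import Path


def swap_flat_file(broken_flat: Path, new_vm_dir: Path, new_vm_name: str) -> Path:
    """Give the orphaned flat file the name the new VM's descriptor expects.

    broken_flat  e.g. .../vm1/vm1-flat.vmdk (left untouched)
    new_vm_dir   folder of the blank VM, e.g. .../vm2
    new_vm_name  e.g. "vm2", so the result is .../vm2/vm2-flat.vmdk
    """
    target = new_vm_dir / f"{new_vm_name}-flat.vmdk"
    if target.exists():
        # Move the blank VM's own flat file out of the way instead of deleting it.
        target.rename(target.with_name(target.name + ".blank"))
    # Copy, not move, so the original OS data is still there if anything goes wrong.
    shutil.copy2(broken_flat, target)
    return target
```

Because the blank VM’s descriptor already points at vm2-flat.vmdk, nothing else needs editing once the copied file carries that name.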

TTFN.

Increasing the amount of concurrent Vmotions

Here’s a nice little tip which has helped my ESX production deployment no end. By default, ESX 3.5 will only VMotion 2 guests at a time, which, if you have a few on the host, can add up to quite a bit of time. It can also cause Update Manager to fail if the VMotion of the guests takes too long.

Simply edit the vpxd.cfg file on your vCenter server (normally in c:\docs & settings\all users\app data\vmware\vmware virtualcenter), set the value you want and restart the VirtualCenter service.

The change goes between the <vpxd></vpxd> tags, and you will need to insert the following:

<ResourceManager>
  <MaxCostPerHost>16</MaxCostPerHost>
</ResourceManager>
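
If you would rather script the edit than hand-edit the file, here is a minimal Python sketch using the standard library’s ElementTree. It assumes the settings live under a <vpxd> element inside the file’s root element, following the placement described above; `set_max_cost_per_host` is a hypothetical helper, not anything VMware ships. ElementTree drops XML comments and reformats the file, so run it against a copy and compare the result before swapping it in:

```python
import xml.etree.ElementTree as ET


def set_max_cost_per_host(cfg_xml: str, max_cost: int, parent_tag: str = "vpxd") -> str:
    """Return cfg_xml with <ResourceManager><MaxCostPerHost> set under parent_tag."""
    root = ET.fromstring(cfg_xml)
    parent = root if root.tag == parent_tag else root.find(f".//{parent_tag}")
    if parent is None:
        raise ValueError(f"no <{parent_tag}> element found")
    # Reuse existing elements so running it twice doesn't duplicate them.
    resource_manager = parent.find("ResourceManager")
    if resource_manager is None:
        resource_manager = ET.SubElement(parent, "ResourceManager")
    cost = resource_manager.find("MaxCostPerHost")
    if cost is None:
        cost = ET.SubElement(resource_manager, "MaxCostPerHost")
    cost.text = str(max_cost)
    return ET.tostring(root, encoding="unicode")
```

Making the helper idempotent (it updates an existing MaxCostPerHost rather than adding a second one) fits the advice to make small changes: you can rerun it with a new value each time.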

Now the trick with this is to decide what you want the max cost to be and as usual there is a little light maths involved:

A Hot Migration = 4

A Cold Migration = 1

So if you wanted 4 hot migrations to run concurrently, you would need to set the max cost to 16 (4 × 4). As with all fiddling with production servers, make a backup of the vpxd.cfg file before making any changes, then make small changes to the max cost, ensuring nothing is honking during the migrations.
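
The light maths fits in a couple of functions. A sketch assuming the costs quoted above (4 per hot migration, 1 per cold migration); the function names are illustrative only:

```python
HOT_MIGRATION_COST = 4   # a VMotion, per the figures above
COLD_MIGRATION_COST = 1  # a cold migration


def max_cost_for(hot: int, cold: int = 0) -> int:
    """MaxCostPerHost needed to run `hot` VMotions and `cold` migrations at once."""
    return hot * HOT_MIGRATION_COST + cold * COLD_MIGRATION_COST


def concurrent_hot_migrations(max_cost: int) -> int:
    """How many VMotions fit under a given MaxCostPerHost (partial ones don't count)."""
    return max_cost // HOT_MIGRATION_COST
```

`max_cost_for(4)` gives 16, the value in the snippet above, and `concurrent_hot_migrations(16)` gives 4 back.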

VirtualCenter 2.5 Min Hardware Requirements

VC 2.5 requires at least:

2 GHz CPU

2 GB RAM

560 MB hard drive space

Network card (preferably 1 Gb)

The OS needs to be Windows 2000 Server with SP4, Windows Server 2003 with SP1 or Windows Server 2003 R2.

Supported databases are:

Oracle 9i

Oracle 10g

SQL Express 2005 (meant for non-production or small farms)

SQL 2005 with SP1

SQL 2000 with SP4
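
If you are sizing several candidate boxes, the list above translates into a simple checklist. A minimal sketch, assuming you fill in each server’s figures by hand; `CandidateServer`, `shortfalls` and the database labels are hypothetical, and the OS and network card checks are left out:

```python
from dataclasses import dataclass
from typing import List

# Minimums and database list taken from the VC 2.5 requirements above.
MIN_CPU_GHZ = 2.0
MIN_RAM_GB = 2.0
MIN_DISK_MB = 560
SUPPORTED_DATABASES = {
    "Oracle 9i",
    "Oracle 10g",
    "SQL Express 2005",
    "SQL 2005 SP1",
    "SQL 2000 SP4",
}


@dataclass
class CandidateServer:
    cpu_ghz: float
    ram_gb: float
    free_disk_mb: int
    database: str


def shortfalls(server: CandidateServer) -> List[str]:
    """List every way the server misses the minimums; an empty list means it passes."""
    problems = []
    if server.cpu_ghz < MIN_CPU_GHZ:
        problems.append(f"CPU {server.cpu_ghz} GHz < {MIN_CPU_GHZ} GHz")
    if server.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM {server.ram_gb} GB < {MIN_RAM_GB} GB")
    if server.free_disk_mb < MIN_DISK_MB:
        problems.append(f"disk {server.free_disk_mb} MB < {MIN_DISK_MB} MB")
    if server.database not in SUPPORTED_DATABASES:
        problems.append(f"unsupported database: {server.database}")
    return problems
```

Reporting every shortfall rather than stopping at the first means one pass tells you everything that needs upgrading on a given box.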

VMware High Availability

High availability, or HA as I will call it from now on, is a feature of VirtualCenter which allows for the automatic restart of VMs in the event of a host failure.

For example, say you had 4 ESX servers running 40 VMs (10 on each). If one host goes pop, HA would detect the failure and restart the VMs on the 3 remaining hosts. Whilst there are not a great many options to fiddle about with (most of them follow the same pattern), you do have an important decision to make: if a host fails, do you want to restart your VMs, or would you rather they stay down? Basically, is it more important that all the VMs are up and running, possibly slower than normal, or would you rather some or all of the VMs stay down until you have dragged yourself out of bed and into the office to fix the issue?

VMware HA Tab

HA can be enabled once you have created a cluster by right clicking on the cluster and selecting “Edit Settings”. The first screen you will see consists of 2 check boxes, one for enabling/disabling HA and one for enabling/disabling  DRS. The choice here is self-explanatory but you might want to spend a minute reading the couple of paragraphs on that page.

The next tab worth looking at is the VMware HA tab; there are 3-4 options here that you will need to consider.

The first option is Admission Control. Within that setting you can set the number of host failures the cluster can tolerate, which can be any number between 1 and 4. By default this is set to 2, and of course if you suddenly find yourself losing 4 hosts in your cluster then you have a rather large problem on your hands. The next option is to prevent or allow the powering on of VMs if they violate availability constraints. Basically, do you want to allow VMs to be powered on even if their total configured memory exceeds the resources that the cluster can actually provide?

Maths bit: you can work out your availability constraints by taking the amount of RAM in your smallest ESX host (i.e. the one with the least physical memory), finding the VM with the most configured memory, and dividing the host memory by that VM’s memory. That gives you the number of guest VMs each host can hold; multiply it by the number of hosts left after the allowed failures, and any more VMs than that means your availability constraints have been violated!

Example:

6 ESX hosts; the smallest has 24 GB of RAM, the largest guest has 2 GB of configured RAM, and the number of host failures is set to 1.

24 / 2 × (6 - 1) = 12 × 5 = 60
So if one host fails, the total number of virtual machines that can be powered on without violating availability constraints is 60. If you need more than that, you will need to allow VMs to power on even though they violate the constraints.
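
The maths bit above as a small Python function: a sketch that simply encodes the rule of thumb described here (smallest host RAM divided by largest VM RAM, times the hosts left after the allowed failures). It is not VirtualCenter’s own admission-control algorithm, and the name `max_protected_vms` is hypothetical:

```python
from typing import Sequence


def max_protected_vms(
    host_ram_gb: Sequence[float], largest_vm_ram_gb: float, host_failures_allowed: int
) -> int:
    """VMs that can run without violating the availability constraint (rule of thumb)."""
    if largest_vm_ram_gb <= 0:
        raise ValueError("largest VM RAM must be positive")
    surviving_hosts = len(host_ram_gb) - host_failures_allowed
    if surviving_hosts <= 0:
        raise ValueError("not enough hosts to tolerate that many failures")
    # Whole VMs only: a host can't hold part of a VM.
    vms_per_host = int(min(host_ram_gb) // largest_vm_ram_gb)
    return vms_per_host * surviving_hosts
```

With six hosts whose smallest has 24 GB (the other sizes below are made up and don’t affect the result), `max_protected_vms([24, 32, 32, 48, 48, 64], 2, 1)` returns 60, matching the worked example.
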
The next setting is the default cluster setting. Within it is the VM restart priority, which is set to medium by default. This setting is cluster-wide but can be overridden at the individual VM level. The configurable options for the restart priority are high, medium, low and disabled. VMs set to high are restarted first, medium next and low last, and disabled... well, you get the idea.
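
To make the ordering concrete, here is a sketch that sorts VMs the way the restart priority describes: per-VM overrides win over the cluster default, high comes before medium before low, and disabled VMs are left out. It models the behaviour described above rather than HA’s implementation; within the same priority it just keeps the input order, which HA does not promise. The VM names are hypothetical:

```python
from typing import Dict, List, Optional

# Lower rank restarts first; "disabled" VMs are never restarted.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_order(
    vm_priority: Dict[str, Optional[str]], cluster_default: str = "medium"
) -> List[str]:
    """Return VM names in restart order; None means 'use the cluster default'."""
    ranked: Dict[str, int] = {}
    for vm, override in vm_priority.items():
        priority = (override or cluster_default).lower()
        if priority == "disabled":
            continue
        if priority not in PRIORITY_RANK:
            raise ValueError(f"unknown restart priority {priority!r} for {vm}")
        ranked[vm] = PRIORITY_RANK[priority]
    # sorted() is stable, so VMs with equal priority keep their input order.
    return sorted(ranked, key=lambda vm: ranked[vm])
```

For example, `restart_order({"dc01": "high", "web01": None, "test01": "low", "lab01": "disabled", "sql01": "high"})` returns `["dc01", "sql01", "web01", "test01"]`: the two high-priority VMs first, web01 on the medium default, test01 last and lab01 not at all.
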
The next option down is the host isolation response setting. I’ve blogged about this before so I won’t go into too much detail, but it determines what happens when a host loses its network connection to the other hosts in the cluster. Network connectivity is checked regularly by a heartbeat (ping). It is possible in VMware to have a situation where the host has lost its network connection to the other hosts in the cluster but the VMs on that server are perfectly happy and working away normally. In earlier updates of ESX the default behaviour was to power off the VMs, which would leave HA to restart them on another server; however, from U2 onwards this changed to leaving the VMs powered on, because 9 times out of 10 you will find that just the service console is having issues and the VMs are still working away quite happily.
The next option is “virtual machine monitoring”, an experimental feature which allows VirtualCenter to monitor the tools installed on the VMs (using heartbeats in a very similar way to HA). You can enable this option and use the slider to adjust the sensitivity. Virtual machine monitoring does know when a VM has been purposefully shut down or powered off, so it will not keep restarting machines you are trying to shut down.
The only button on this page is Advanced Options, which you should only change once you know what you are doing.
The sub-tab of the HA options is “Virtual Machine Options”, which allows you to specify different restart priority and host isolation response behaviours for individual VMs. So you can effectively prioritize, delay or disable the restarting of individual VMs as you deem appropriate.

Vmware common myths

I think this is a post that I will keep updating as I think of things, but I thought I would start out with nice easy ones to get going with.

Something that quite often happens with a new ESX farm is that the admins want to tentatively VM a ‘low risk’ server, thinking that it’s not the end of the world if it goes skyward.
This normally translates as a server that does not do much and has sat in the corner of the server room for years banging away doing its thing. Of course, when it’s virtualized it’s given a whole new set of hardware that is years ahead of what it’s used to. This gets admins and users very excited, as whatever that server used to do has now been given a massive boost in performance.

Virtualization is not about speed, it’s about consolidation. As admins start virtualizing other boxes, the old server may very well drop back to about the speed it was before.

More to follow

Ramping up

Well, I’m now starting to ramp up my VMware studying due to my oncoming first attempt at the exam. I do feel a bit more relaxed about it than previous exams, probably because I administer ESX/VC loads at work, but of course that doesn’t mean I can take it easy. I’ve rebuilt my ESX lab at home using the eval versions of ESX and VC I downloaded a while back. The only thing I had to re-register for was another evaluation version of VMware Workstation; otherwise I just used the ISOs I had previously downloaded.

Anyway, as I am studying and doing tests, I shall put up various musings along the way.

On a side note, because I do not have a great deal of Linux knowledge, I am having quite a few problems virtualizing existing Linux boxes. I am bookmarking interesting links to do with virtualization (methods including rsync etc.), but so far have only managed to virtualize one Linux box without too much trouble. This is kind of spurring me on to learn loads more Linux stuff.