Saturday, October 3, 2009

VMWare Notes

I've been using VMWare Server 1 & 2 for quite some time now. I've toyed around with ESXi (and am getting ready to move a client's full server base over to an ESXi system in the next couple of weeks).

This post is mostly to talk about what I've just recently done in VMWare Server 2.0 though. I wanted to write down some settings that I'll hopefully use in the future (and 'upgrade' other clients to if need be), and note some concepts/scripts I used for backups of the virtual machines.

I picked up a couple of T100s for my church, and before I realized that the boxes had internal USB connectors (hopefully ideal for booting ESXi in the future??), I went with my normal CentOS 5 x64 base install. I generally prefer it for manageability's sake. Of course, most of that additional manageability goes out the window when the client (my church in this case) only has one public IP. I still planned on doing some weirdness in this setup, though, so having that full base install would help make that happen.

Of course, the CentOS host also ends up being cheaper, since you don't need a hardware RAID controller. I use software RAID1 on two partitions: one for /boot, and one for LVM (containing swap, root, and some free space, which I forgot to leave this time around.... DOH!).
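A minimal sketch of that layout from the command line, assuming two identically partitioned disks (partition 1 for /boot, partition 2 for LVM). The device names, the volume group name vg0, and the LV sizes are hypothetical placeholders; the CentOS installer normally builds this for you, and partitioning, mkfs and mkswap are left out. By default it only prints the commands.

```shell
#!/bin/bash
# Sketch of the layout above: md0 = RAID1 for /boot, md1 = RAID1 used as an
# LVM physical volume for swap and root, leaving free space in the VG.
# HYPOTHETICAL: device names, VG name (vg0) and LV sizes are placeholders.
# Prints the commands only; DRY_RUN=0 runs them and DESTROYS existing data.

DRY_RUN=${DRY_RUN:-1}

# Echo a command, and execute it only when DRY_RUN=0
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

setup_layout() {
    local disk_a=$1 disk_b=$2 vg=${3:-vg0}
    # 0.90 metadata sits at the end of the partition, so GRUB legacy
    # (CentOS 5) can read /boot from either half of the mirror
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 "${disk_a}1" "${disk_b}1"
    run mdadm --create /dev/md1 --level=1 --raid-devices=2 "${disk_a}2" "${disk_b}2"
    run pvcreate /dev/md1
    run vgcreate "$vg" /dev/md1
    run lvcreate -L 4G -n swap "$vg"
    run lvcreate -L 20G -n root "$vg"
    # Anything not allocated above stays free in the VG for later use
    run vgdisplay "$vg"
}

setup_layout /dev/sda /dev/sdb
```

Leaving unallocated space in the VG is what makes later additions (another LV, or an LVM snapshot) possible without repartitioning.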

I've got an OVF of IPCop that I put together for quickly deploying a Pro IT-approved VM firewall, and I put it on one of the boxes. The other box was going to 'replace' their existing server. VMWare Converter's been doing a nice job for me lately, so the P2V last night / this morning was quick and painless.

I ordered each box with an extra NIC (one needed it to host the IPCop VM) so that they'd be identical. My thought was that if I could periodically back up each VM to the other host's drive, I could have stupid-short downtimes in case of a hardware failure: boot up the copied VM on the still-running hardware, and either rsync the changes (slap one of the drives from the failed system onto the working system via USB) or restore from backup to get 'em up and going, still in RAID. I wonder if one day I should look into DRBD for this "two identical boxes" scenario....

Anyway -- the concept I tested and mostly verified was to create a snapshot, rsync the original vmdks (a snapshot creates a bunch of new small vmdks to hold the differences), and remove the snapshot. I say "mostly" because a regular rsync hosed IPCop. Er -- I had a test P2V on vm2, tried to rsync it to vm1 (the IPCop host), and IPCop ... stopped responding. I could ping it, and I could start to ssh to it, but it'd time out.... When I got on site, I couldn't even open its console... Very odd. I _think_ it had to do with memory tweaks that I hadn't implemented yet...

I think it had to do with memory because you could watch 'top' on vm1 (with the rsync initiated from vm2) and see memory usage climb... At the time of the test, I _did_ have vmware's 'tmpDirectory' config variable pointing to '/dev/shm', but '/dev/shm' was only 3.75GB in size, compared to the 4GB of memory in the system... And the IPCop VM was only using 264MB of '/dev/shm'.... I dunno... Anyway -- it was rsync doing its normal 'delta' differencing between these 2GB files. The 'fix' was to add the '-W' (--whole-file) switch to rsync to tell it, "hey -- if the file size/mod time differs, don't waste your effort trying to find out where in the file the change is -- just transfer the whole file!" I tested both ways, and though the delta-based transfer ended up being quicker, it killed IPCop, so it wasn't a viable solution ;)
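To put numbers on the /dev/shm vs. RAM comparison above (3.75GB of /dev/shm on a 4GB box), a small check like this could be run on each host. It's a sketch assuming a Linux host (it reads MemTotal from /proc/meminfo); shm_vs_ram is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: compare the size of a tmpfs mount (default /dev/shm) with
# installed RAM. Linux-only: reads MemTotal from /proc/meminfo.
# shm_vs_ram is a hypothetical helper, not part of any VMware tooling.

shm_vs_ram() {
    local mnt=${1:-/dev/shm} shm_kb ram_kb
    # df -P -k: POSIX format, 1K blocks; line 2, field 2 is the total size
    shm_kb=$(df -P -k "$mnt" 2>/dev/null | awk 'NR == 2 { print $2 }')
    if [ -z "$shm_kb" ]; then
        echo "cannot stat $mnt" >&2
        return 1
    fi
    ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "$mnt: $((shm_kb / 1024)) MB, RAM: $((ram_kb / 1024)) MB ($((100 * shm_kb / ram_kb))% of RAM)"
}

shm_vs_ram
```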

I've since made some more memory tweaks... Of course, I don't fully understand the 'tweaks' that I implemented, nor have I re-tested the delta based rsync.
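For the snapshot-then-rsync concept above, the trick is copying only the base disks and leaving the snapshot's delta files alone. The backup script at the end of this post does that with rsync's `--exclude=*0001*`; as an alternative, here's a sketch that selects base disks by name. It assumes VMware's usual naming, where snapshot deltas carry a six-digit suffix (IPCop-000001.vmdk, IPCop-000001-s001.vmdk) and split base-disk extents look like IPCop-s001.vmdk. base_disks is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: list a VM's base-disk .vmdk files, skipping snapshot deltas.
# ASSUMPTION: deltas follow VMware's usual naming with a six-digit suffix
# (IPCop-000001.vmdk, IPCop-000001-s001.vmdk), while base-disk extents look
# like IPCop.vmdk or IPCop-s001.vmdk. base_disks is a hypothetical helper.

base_disks() {
    local dir=$1 f name
    for f in "$dir"/*.vmdk; do
        [ -e "$f" ] || continue    # glob matched nothing
        name=${f##*/}              # strip the directory part
        case $name in
            *-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk | *-[0-9][0-9][0-9][0-9][0-9][0-9]-*.vmdk)
                ;;                 # snapshot delta: skip it
            *)
                echo "$name" ;;
        esac
    done
}

# Demo on dummy files in a scratch directory
demo=$(mktemp -d)
touch "$demo"/IPCop.vmdk "$demo"/IPCop-s001.vmdk \
      "$demo"/IPCop-000001.vmdk "$demo"/IPCop-000001-s001.vmdk
base_disks "$demo"
rm -rf "$demo"
```

Matching any six-digit suffix also skips deltas from a leftover second snapshot (-000002), which a fixed `*0001*` pattern would let through.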

Anyway -- here are all my notes to myself on VMWare performance settings for these boxes:
  • 'noatime' on fs's hosting vm's
  • in /etc/vmware/config:
# I normally add this next line in the VM's... does this central config preclude that addition?
mainMem.useNamedFile = "FALSE"
# always seen a big improvement using this.... of course, you've gotta make /dev/shm [almost] as big as your installed memory via /etc/fstab
tmpDirectory=/dev/shm
# next two lines really get set via the web interface by saying "fit all the vm's memory in physical memory", and how much memory to allocate to VMWare
prefvmx.minVmMemPct = "100"
prefvmx.allVMMemoryLimit = "3072"
# no idea what this is
prefvmx.useRecommendedLockedMemSize = "TRUE"
  • in /etc/sysctl.conf:
# I've never really used these before... Supposedly they curb the host's normal tendency to swap a bunch (which means more I/O, competing with what the VMs need)
vm.swappiness = 0
vm.overcommit_memory = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_expire_centisecs = 1000
dev.rtc.max-user-freq = 1024
  • Append "nohz=off" to the kernel options in grub.conf
  • Lastly, in each VM's .vmx file:
mainMem.useNamedFile = "FALSE"
MemAllowAutoScaleDown = "FALSE"
MemTrimRate = "0"
  • Now for the rsync backup script (set to run via a file in /etc/cron.d):
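#!/bin/bash
# Back up one VM to the other host: snapshot it so the base .vmdk files stop
# changing, copy them over, then remove the snapshot.

# Get names and VM IDs from `vmware-vim-cmd -U jolly vmsvc/getallvms`

# Sync IPCop (VM ID 16)
SOURCENAME=IPCop
SOURCENUM=16

SLEEPAFTERSNAPSHOTTIME=15
SOURCEROOT=/srv/vmware
DESTHOST=vm2
DESTROOT=/srv/vmware

# Stop here rather than rsync from the wrong directory
cd "${SOURCEROOT}/${SOURCENAME}" || exit 1

# Copy the .vmx before snapshotting, while it still points at the base disks
rsync -a "${SOURCENAME}.vmx" "${DESTHOST}:${DESTROOT}/${SOURCENAME}"

vmware-vim-cmd vmsvc/snapshot.create "${SOURCENUM}" || exit 1
# Remove the snapshot when the script exits, even if the copy below fails
trap 'vmware-vim-cmd vmsvc/snapshot.remove "${SOURCENUM}"' EXIT
sleep "${SLEEPAFTERSNAPSHOTTIME}"

# -W: send whole files (delta transfers knocked IPCop over in testing);
# skip the snapshot's delta disks (*0001*), which are still being written
rsync -aW --exclude='*0001*' *.vmdk "${DESTHOST}:${DESTROOT}/${SOURCENAME}"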
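The /dev/shm and 'noatime' items in the list above both end up as /etc/fstab entries. A minimal sketch that prints candidate lines, assuming 256MB of headroom below installed RAM (an arbitrary figure, not a VMware recommendation) and a hypothetical /dev/vg0/vmware logical volume mounted on /srv/vmware as ext3:

```shell
#!/bin/bash
# Sketch: print candidate /etc/fstab lines for two items in the list above.
#  1. A /dev/shm tmpfs sized close to installed RAM (for tmpDirectory=/dev/shm).
#     ASSUMPTION: 256 MB of headroom is arbitrary; override with HEADROOM_MB.
#  2. 'noatime' on the filesystem holding the VMs.
#     HYPOTHETICAL: the device path and ext3 type are placeholders.
# It only prints; review the output before editing /etc/fstab.

HEADROOM_MB=${HEADROOM_MB:-256}

# $1 = installed RAM in KB (as reported by MemTotal in /proc/meminfo)
shm_fstab_line() {
    local ram_kb=$1 size_mb
    size_mb=$(( ram_kb / 1024 - HEADROOM_MB ))
    echo "tmpfs /dev/shm tmpfs defaults,size=${size_mb}m 0 0"
}

# $1 = device, $2 = mount point, $3 = filesystem type (default ext3)
vm_fs_fstab_line() {
    local dev=$1 mnt=$2 fstype=${3:-ext3}
    echo "$dev $mnt $fstype defaults,noatime 1 2"
}

ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
shm_fstab_line "$ram_kb"
vm_fs_fstab_line /dev/vg0/vmware /srv/vmware
```

After editing /etc/fstab, `mount -o remount /dev/shm` should apply the new size without a reboot, and `sysctl -p` reloads the /etc/sysctl.conf settings.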