a case history...

rayk_sland

I thought I would post my experience (so far) with proxmox, in case it's useful. I've certainly learned plenty.

Hardware: Intel MFSYS25 modular server -- 4 compute modules (blades), a single storage module, a shared LUN, and 3 disk pools: one configured as boot drives for each blade and two as shared pools.

Configuration: 2 Proxmox Clusters of two blades each, with one shared pool for each cluster

History
Initial configuration was Proxmox VE 1.3, which had no shared storage scheme. I opted for ocfs2 on the shared pools so that I could quickly down a VM, move its <vmid>.conf file from /etc/qemu-server on one node to the other, and then bring the VM back up. Not online migration, but at the time online migration meant copying a fairly large file first. I wanted to take advantage of shared storage even though PVE 1.3 wasn't really ready for me. I opted for qcow2 files instead of raw because I thought they'd be more space efficient.
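
For the record, that manual move boiled down to something like the following (the VMID 101 and node name node2 are just examples, not my actual ones):

# on the node currently running the VM
qm stop 101
scp /etc/qemu-server/101.conf root@node2:/etc/qemu-server/
rm /etc/qemu-server/101.conf
# then on node2, where the same ocfs2 pool is already mounted
qm start 101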

After upgrading to PVE 1.4 (and subsequent releases), I was surprised that the new shared storage scheme had no support for clustered file systems, even though I could and did connect to my common disk pool (still using ocfs2 for the file system) as a shared folder. I could now migrate a running server between cluster nodes without any file copying. But new VMs I created after this point usually ended up with the default, raw, rather than qcow2, although I really don't know why.
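
With the shared storage defined on both nodes, that online migration is now roughly a one-liner (again, the VMID and target node name are made up):

qm migrate 101 node2 -online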


Problems started to crop up related to disk IO. Tasks would hang and leave nasty messages that I could read with dmesg. At one point I was copying an ISO to a shared ISO directory at the Proxmox VE level, on the same file system as my VMs, and managed to take everything down; I had to force everything to power off just to get access to the cluster consoles. When I was installing SQL Server on a Win2008 VM on the ocfs2 file system, it hung itself and the rest of the cluster. IMAP users on the mail server VM could jack the load from 0.2 to 25 just by moving some messages around.

dmesg at the Proxmox system level gave me lots of clues pointing to ocfs2, and I thought to myself that I should have made the slightly painful switch to LVM when I upgraded Proxmox. At any rate, I figured the way I had implemented ocfs2 was probably the problem: disk IO at the VM level had a lot more hoops to jump through my way, and it was starting to tell.

So how to do it with minimum down time? First of all, extra storage. I had some unused space, about 190GB, in the disk pool with the boot drives, so I allocated it, shared it with all blades, and made it into a volume group. And because the temporary disk pool was so limited in space, I opted for doing one cluster at a time. The default-sized raw VMs (32GB) were fairly trivial to 'dd' into their temporary homes in the new LVM volume group. The Windows VM at 100GB was slightly more difficult until I figured out how to use wbadmin: I backed up the server and then restored it to the new VM from within Windows. Once that was done, I removed all the ocfs2 configuration from the first cluster and reconfigured the 'proper' disk pool for use as shared LVM space. I needed the partprobe utility from the gparted package to make the volume group visible to the second cluster node after creating it on the first. I then did the same process in reverse to copy the VMs back to volumes on the reconfigured pool (dd for the 32GB Linux VMs, Windows restore for the 100GB Win2k8 VM).
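
Roughly what one of those 32GB moves looked like; every device path, volume group name and VMID here is purely illustrative, not my exact setup:

# one time, on the first node: turn the spare shared space into a volume group
pvcreate /dev/sdc1
vgcreate tempvg /dev/sdc1
# on the second node, re-read the partition table so the new group becomes visible
partprobe /dev/sdc
vgscan
# per VM: carve out a logical volume and block-copy the raw image into it
lvcreate -L 32G -n vm-101-disk-1 tempvg
dd if=/path/to/old/storage/vm-101-disk-1.raw of=/dev/tempvg/vm-101-disk-1 bs=1M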

The second cluster was more problematic. All the VMs were qcow2, and the mail server's file system, once raw, would be 200GB -- too big for my temporary LVM group. I copied the least essential qcow2 file down to an NFS share where I could process it from the NFS server's side into a (32GB) raw file and dd it back into my temporary volume group. That process, if not perfectly attended, took about an hour. Whether there was anything I could have done to optimize it I don't know, but if you do the math and compare one hour for 32GB to 6.25 hours for 200GB, I was looking at a heck of a lot of down time for the mail server. Not really acceptable.
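
On the NFS server's side, that reprocessing step was essentially a qemu-img conversion followed by the same dd trick (filenames and mount paths here are illustrative):

qemu-img convert -f qcow2 -O raw vm-102-disk-1.qcow2 vm-102-disk-1.raw
# then, from a cluster node with the NFS share mounted:
dd if=/mnt/nfs/vm-102-disk-1.raw of=/dev/tempvg/vm-102-disk-1 bs=1M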

So after quite a bit of research I hit upon using dump and restore in the manner of the following recipe...
http://www.linuxscrew.com/2007/08/13/move-linux-to-another-hard-drive-dump-restore-backup/

I simply created another 'hard disk' for the existing VM as a volume from the temporary pool, mounted it, and piped a dump of the main file system through to a restore targeting the new file system -- straight from the above link.

(I didn't have to worry about having enough space for the whole 200GB of the raw VM file, because there was enough space for the actual files. I'll expand it again when it's out of the temporary pool. This was one thing that reprocessing the qcow2 file into raw and then dd'ing it into a new VM just couldn't do for me.)


# mount the new (already partitioned and formatted) virtio disk
mount /dev/vdb1 /mnt
# restore unpacks into the current directory, so work from the new file system
cd /mnt
# level-0 dump of the live root file system, piped straight into restore
dump -0uan -f - / | restore -r -f -


Thus I could clone the system while it was live. I merely checked the logs afterwards and copied across the few emails (this was the dead of night) which came in while the cloning was happening.

I then followed the grub instructions from the above website, slightly modified.

First, edit /boot/grub/device.map to include the new disk as a device

something like

(hd0) /dev/vda
(hd1) /dev/vdb

and run grub to force it to use that file instead of probing

grub --device-map=/boot/grub/device.map

and then, at the grub shell, carry on with

root (hd1,0)
setup (hd1)
quit

After that I edited /etc/qemu-server/<vmid>.conf, swapped the virtio0 entry for the virtio1 entry, and restarted the VM.
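
The swap amounted to something like this in the conf file (the storage names and volume syntax are only indicative, not copied from my actual config):

before:
virtio0: sharedfs:102/vm-102-disk-1.qcow2
virtio1: tempvg:vm-102-disk-2

after:
virtio0: tempvg:vm-102-disk-2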

Suddenly, the vm seems to work so much better. Utilization is down where it should be and mail access is quick, even snappy.

Conclusion: removing the file system layer (especially a clustered file system) between the VMs and the disk, and accessing them via LVM instead, represents a huge performance increase. The evidence is that file writes under my earlier regime were bottled up by the intervening layers (kvm, qcow2, ocfs2) as compared to now. (Yes, I know, I'm only finally using it as designed: kvm, raw, lvm.)