Migration of VM (apparently) locked up host

swhite

Mar 20, 2011
I attempted to migrate a VM between two nodes in my cluster (node A to node B). Node B hosts our low-traffic mail server and local DNS host and generally runs under low load. Both A and B have identical vz.conf files (quotas enabled on both), and node B has plenty of disk space in pve/data to host the VM. The VM itself is large, approximately 30 GB. Otherwise, this was a routine offline migration.

Shortly after I began the migration (via the Proxmox GUI), I received a Nagios notification that the mail server on node B was not reachable on its SMTP port, and node B's load was increasing rapidly (the 1-minute load average was up to 25.00). Node B did not crash, however, as I could still ssh into it. Trying to start most processes on B caused the session to hang: I could use 'ls' and 'cd', but could not run 'top', 'ps', or 'kill' (as though I had run out of PIDs or something like that). I aborted the migration hoping to restore the server, but it didn't help, and I had to reboot. After rebooting, to my frustration, I found nothing of note in the logs, and I could not even access the migration log in the GUI: clicking on it took me to a different log entirely.
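For anyone who ends up in a similar state, here is a minimal sketch of a check that reads the load average and lists processes in uninterruptible sleep (state "D") straight from /proc, without calling 'ps' or 'top'. On Linux, tasks in state D count toward the load average, so a high load with a mostly unresponsive box often points at blocked I/O. This assumes a standard Linux /proc layout; the function names are made up for illustration, and I did not run this during the incident. It is also possible that reading /proc, or starting Python at all, would have hung the same way 'ps' did.

```python
import os


def parse_loadavg(text):
    """Return the (1min, 5min, 15min) load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_stat_state(text):
    """Return (comm, state) from the content of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' to find where the remaining fields start.
    left, right = text.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            blocked.append((int(entry), comm))
    return blocked


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("load averages:", parse_loadavg(f.read()))
    for pid, comm in blocked_processes():
        print("blocked:", pid, comm)
```

If a list like this were dominated by rsync or quota-related processes, that would at least narrow down where to look.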

I know this may be one of those rare and mysterious issues that have no good reason for happening, but I wanted to check with the community to see whether anyone has experienced something similar, or has any insight into why a migration would cause this, if indeed the migration did. The only thing I can think of is that rsync went nuts. Interestingly, I found that node A has backports enabled (I'm not the only one maintaining this server), so its rsync is a later version (3.0.7) than the one on node B (3.0.3). The kernel versions are the same (the Jan 23 2.6.32-6 kernel). This is the only lead I have, but if anyone else has ideas, please let me know. Thanks!