64-bit VM freeze after live migration

resoli
Mar 9, 2010
Hi,

I'm running a PVE cluster of two bare-metal installs[1]. I'm testing online migration of Windows and Linux VMs, and found that a Karmic 64-bit machine often freezes after a live migration.

The migration itself (the machines are LVM-based, on a SAN-provided disk presented to both cluster nodes) succeeds, but after some time the machine freezes.
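For context, the migrations are triggered from the command line along these lines; a minimal sketch, with 101 as a placeholder vmid and the target node name illustrative:

# live-migrate VM 101 (placeholder vmid) from the current node to the target node
qm migrate 101 <targetnode> -online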

Windows (Server 2003, 32-bit) machines migrate without problems, and the same seems to hold for Karmic 32-bit.

In particular, the freeze happens only when migrating back from node to master, not from master to node.

Furthermore, I found that there is a significant clock drift, which appears only after a migration from node to master. This shows up when running ping[2] during the migration: the "Warning: time of day goes back (-<microseconds>us), taking countermeasures." message appears just after the migration, and only when migrating from node to master.

Both cluster nodes are running ntp, configured against our reference NTP server on the internal LAN.
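For completeness, this is the kind of check run on each node to verify that NTP is actually synchronised (plain ntpq, not Proxmox-specific):

# on each cluster node: list NTP peers; the line starting with * is the selected server
ntpq -p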

Any hint?

bye,
rob

[1] # pveversion -v
pve-manager: 1.5-8 (pve-manager/1.5/4674)
running kernel: 2.6.18-2-pve
proxmox-ve-2.6.18: 1.5-5
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-10
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm-2.6.18: 0.9.1-5

[2] $ ping 192.168.10.48
PING 192.168.10.48 (192.168.10.48) 56(84) bytes of data.
64 bytes from 192.168.10.48: icmp_seq=1 ttl=64 time=359 ms
64 bytes from 192.168.10.48: icmp_seq=2 ttl=64 time=0.311 ms
Warning: time of day goes back (-2158us), taking countermeasures.
Warning: time of day goes back (-2046us), taking countermeasures.
64 bytes from 192.168.10.48: icmp_seq=3 ttl=64 time=0.000 ms
64 bytes from 192.168.10.48: icmp_seq=4 ttl=64 time=0.854 ms
64 bytes from 192.168.10.48: icmp_seq=5 ttl=64 time=2.66 ms
Warning: time of day goes back (-2144us), taking countermeasures.
64 bytes from 192.168.10.48: icmp_seq=6 ttl=64 time=0.000 ms
 
In the end, I found that the problem is not related to the architecture, but to the number of CPU sockets configured for the VM.

If I simply configure a single CPU socket, I can migrate flawlessly, even with clocksource=kvm-clock.
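For reference, a minimal sketch of the relevant part of the VM definition after the change, assuming the usual /etc/qemu-server/<vmid>.conf layout and a placeholder vmid of 101 (all unrelated lines omitted):

# /etc/qemu-server/101.conf (101 is a placeholder vmid)
sockets: 1

Note that a change to the socket count only takes effect once the VM is fully stopped and started again, since the qemu process has to be restarted; a reboot from inside the guest is not enough.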

bye,
rob
 
resoli said:
If I simply configure a single CPU socket, I can migrate flawlessly, even with clocksource=kvm-clock.

I just ran into the same problem. Thanks for posting your experience and saving me a bunch of troubleshooting. Does anyone know why multiple vCPUs trigger this problem on Karmic?