SMP guest with clocksource=kvm-clock hangs

steven

Feb 28, 2012
Hi, I've been trying to find more info on this on the forum, but nothing exactly matches the problem I am seeing. I am running a Debian lenny guest (2.6.26-2-amd64) on a 2 x E5540 server running ProxMox VE.

host# uname -a
Linux mysterix 2.6.32-6-pve #1 SMP Mon Dec 19 06:55:36 CET 2011 x86_64 GNU/Linux

host# pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-55+ovzfix-2
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-55+ovzfix-1
pve-kernel-2.6.32-7-pve: 2.6.32-55+ovzfix-2
qemu-server: 1.1-32
pve-firmware: 1.0-15
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6

VM configuration (/etc/qemu-server/205.conf):

name: lenny-guest
ide2: cdrom,media=cdrom
bootdisk: virtio0
args: -serial unix:/var/run/qemu-server/205.serial,server,nowait
ostype: l26
memory: 4096
sockets: 2
onboot: 0
cores: 1
virtio0: local:205/vm-205-disk-1.qcow2
vlan0: virtio=DA:65:DD:37:3A:DF

When I start the lenny-guest VM, it seems to come up fine, but after a few minutes (15 at the most) it stops responding. The corresponding "kvm" process on the host uses 100% CPU (well, it's using two cores, so a little over 100% :-)). When I start the VM with only one VCPU, there's no problem.

The same VM (both as SMP and single CPU) is working on another host, an (ancient) Ubuntu 9.10 (2.6.31-19-server) installation on identical hardware (2 x E5540), so I figured it must be something in the host/guest interaction.

I tried various options to fix this, including changing the disk driver from "virtio" to "ide" or "scsi", and the network driver from "virtio" to "e1000", "rtl8139", etc. Nothing helped until I hit on the clocksource kernel parameter.

Both hosts have clocksource "tsc", which is fine, and the guest settles on "kvm-clock" as its clocksource on both hosts. However, in one instance (ProxMox VE) it hangs, in the other it doesn't.
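For anyone who wants to compare their own setup, here is a minimal sketch of how the active clocksource can be read on a Linux host or guest. It assumes the standard sysfs location; the CS_DIR variable and the show_clocksource function name are illustrative only and not part of any tool.

```shell
#!/bin/sh
# Print the active and the available clocksources from sysfs.
# CS_DIR is a hypothetical override for testing; the default is the usual Linux sysfs path.
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

show_clocksource() {
    if [ -r "$CS_DIR/current_clocksource" ]; then
        echo "current:   $(cat "$CS_DIR/current_clocksource")"
        echo "available: $(cat "$CS_DIR/available_clocksource")"
    else
        echo "no clocksource info under $CS_DIR" >&2
        return 1
    fi
}

# Do not abort when the sysfs files are missing (e.g. on a non-Linux system).
show_clocksource || true
```

Running this on both hosts and inside the guest should make it easy to confirm which clocksource each one actually settled on.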

I then started the VM with two VCPUs on the ProxMox VE host, once with each of the supported clocksources in the guest, and measured clock skew with repeated, periodic "ntpdate -d" runs:



  • kvm-clock
Hangs soon after boot.

  • hpet
Stable O/S, clock skew between -0.039 and 0.068 msec (0.107ms band).

  • acpi_pm
Stable O/S, clock skew between 0.002 and 0.071 msec (0.069ms band).

  • jiffies
Stable O/S, clock skew between 0.426 and 1.312 msec (0.886ms band).

  • tsc
Stable O/S, clock skew between 0.003 and 0.354 msec (0.351ms band).
More worrying is that the skew kept growing, whereas with the other clocksources it oscillated between the two extremes.
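The "band" figures above are simply the largest observed offset minus the smallest. A rough sketch of that arithmetic, assuming the offsets have already been collected one per line (in msec); the skew_band function name is made up for this example, and extracting the offsets from the "ntpdate -d" output is left out:

```shell
#!/bin/sh
# skew_band: read one clock offset per line on stdin and print the
# smallest value, the largest value, and the width of the band between them.
skew_band() {
    awk 'NF {
             v = $1 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END {
             if (n == 0) { print "no samples"; exit 1 }
             printf "min=%.3f max=%.3f band=%.3f\n", min, max, max - min
         }'
}
```

For example, `printf '%s\n' -0.039 0.010 0.068 | skew_band` reproduces the hpet band of 0.107 from the extremes listed above (the middle sample is invented for illustration).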

So, for now, I've settled on "divider=10 clocksource=acpi_pm" as my life saver for SMP guests.
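To double-check that these parameters really reached the guest kernel after a reboot, something like the following could be used. It is only a sketch: cmdline_has and CMDLINE_FILE are names invented here, and it assumes the kernel command line is readable from /proc/cmdline as on any standard Linux guest.

```shell
#!/bin/sh
# cmdline_has: succeed if the kernel command line in $1 contains
# the exact parameter given in $2 (whole-word match).
cmdline_has() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# CMDLINE_FILE is a hypothetical override for testing; default is /proc/cmdline.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"
if [ -r "$CMDLINE_FILE" ]; then
    cmdline=$(cat "$CMDLINE_FILE")
    for p in divider=10 clocksource=acpi_pm; do
        if cmdline_has "$cmdline" "$p"; then
            echo "$p: present"
        else
            echo "$p: MISSING"
        fi
    done
fi
```

This only verifies that the parameters were passed at boot; whether the kernel actually switched clocksource still has to be confirmed separately.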

I'm curious if other people have run into the same problem. If so, did you find a similar solution or something radically different?

Am I correct in assuming that the "kvm-clock" clocksource in newer kernels is not exactly stable?

-- Steven
 
Use a 2.6.32 kernel for your Lenny guests and try again. A lot of users have reported problems and crashes with the default 2.6.26 Lenny kernel.
 

Thanks, will give that a try. I suspected the host O/S as the obvious source of the problem because the exact same guest is running stable on a different host kernel on the exact same hardware.