SMP KVM guests hanging in 2.6.35

Hi all

I think I'm hitting the same issue mentioned in http://forum.proxmox.com/threads/5046-Error-kvm-cpu0-unhandled-wrmsr-amp-unhandled-rdmsr but I'm starting a new thread because I'm not sure it's related to the "cpu0 unhandled wrmsr: 0x198" message.


I have a box with 2 x Xeon 5300, for a total of 8 cores, and I'm trying to run a number of VMs on it: 2 with 8 cores, 2 with 1 core, plus the odd other one, all running Ubuntu 10.04 LTS with kernel 2.6.32-32-virtual. Often when booting the 8-core VMs they hang partway through the initialization process, seemingly at a different point each time.

The problem seems related to the total number of cores allocated across VMs. If I'm running one 8-core and one 1-core guest and I try to start another 8-core guest, it hangs; if I reduce the new guest to 4 cores, it starts. The threshold seems to be about twice the number of physical cores I have, beyond which starting a guest fails about 80% of the time.
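
For anyone trying to reproduce this, here's a rough way to see how overcommitted the host is (a sketch of my own; it assumes every guest was started with an explicit "-smp N" argument on a plain "kvm" command line):

Code:
# compare physical cores to the total vCPUs of running kvm guests
cores=$(grep -c ^processor /proc/cpuinfo)
vcpus=$(ps -eo args | awk '/^kvm /{for(i=1;i<NF;i++) if($i=="-smp") s+=$(i+1)} END{print s+0}')
echo "physical cores: $cores, allocated vCPUs: $vcpus"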

This is with 2.6.35-1, but if I keep the same setup and downgrade the kernel to 2.6.24 it works. Here's the output of pveversion -v on the failing kernel:

Code:
pve-manager: 1.8-17 (pve-manager/1.8/5948)
running kernel: 2.6.35-1-pve
pve-kernel-2.6.35-1-pve: 2.6.35-11
pve-kernel-2.6.24-12-pve: 2.6.24-25
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.26-1pve4
vzdump: 1.2-12
vzprocps: 2.0.11-2
vzquota: 3.0.11-1


It's a fresh install (on top of a fresh Debian Lenny rather than directly from your boot disk).

I was also trying this under a fresh install of Ubuntu Natty and saw the same results: start a VM with "kvm -m 512 -smp 8 guest.qcow2" and it hangs, but change it to "-smp 4" and it's fine. I can also reproduce it on Proxmox without the UI by running the same command.
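
In case it helps, the whole reproduction boils down to running these one at a time against the same image (-snapshot and -vnc are my additions here, to keep the image untouched and the guest headless):

Code:
kvm -m 512 -smp 8 -snapshot -vnc :1 guest.qcow2   # hangs mid-boot ~80% of the time
kvm -m 512 -smp 4 -snapshot -vnc :1 guest.qcow2   # boots reliably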

More info:
* This happens with both 2.6.32-32-virtual on the guest and also the "linux-generic-pae" kernel.
* If the guest gets through the kernel loading, it seems stable: I can thrash all the cores on the running image while attempting to load the other one, and the running one operates correctly.
* When a guest hangs, its KVM process is using a lot of CPU (a busy loop of some sort; see the snippet after this list).
* ksmtuned was running under Proxmox 2.6.32, and ksmd under Ubuntu.
* It's not a resource issue: the guest images are stored locally on SSD, there's no swap and plenty of RAM to spare, and there's no message in the server logs (other than the "cpu0 unhandled wrmsr: 0x198" one on starting a guest, but I see that regardless of success or failure). The guests are idle when they start, so the CPUs are definitely not too busy to handle a VM with this number of cores.
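
Here's roughly how I confirmed the busy loop (a rough sketch of what I ran; the pgrep pattern assumes the hung guest is the only one started with -smp 8):

Code:
# find the hung guest's kvm process and look at what it's doing
pid=$(pgrep -f 'kvm .*-smp 8' | head -n 1)
top -b -n 1 -p "$pid"              # %CPU stays pegged while the guest makes no progress
grep -i ctxt /proc/$pid/status     # barely any voluntary context switches = spinning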

Also, I should point out that I did have this working for a while on a previous build, which was an awful hybrid of an Ubuntu kernel and components with Proxmox 1.7 merged into it (with various --force options). It wasn't really maintainable and is probably not much use for a bug report, but I mention it for completeness.

Hope this is enough for you guys to reproduce it this time. I have some kernel boot logs from the hung instances under Ubuntu (not Proxmox; as I said, the symptoms are the same) - let me know if you want them.

Cheers... Mike
 

Package pve-qemu-kvm is missing! So you are running some unknown version of kvm (at least not the one we provide)?
 
Sorry - I'd installed the packages making up proxmox-ve-2.6.24, including pve-qemu-kvm, but not the metapackage itself, and pveversion seems to look for that package. I've installed proxmox-ve-2.6.24; it didn't add or remove any other packages, and now pveversion -v gives:

Code:
pve-manager: 1.8-17 (pve-manager/1.8/5948)
running kernel: 2.6.24-12-pve
proxmox-ve-2.6.24: 1.6-26
pve-kernel-2.6.35-1-pve: 2.6.35-11
pve-kernel-2.6.24-12-pve: 2.6.24-25
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.26-1pve4
vzdump: 1.2-12
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
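
Incidentally, the quick way to see whether the metapackage itself is installed is a plain dpkg query (nothing Proxmox-specific):

Code:
# lists the proxmox-ve metapackage if (and only if) it's installed
dpkg -l | grep proxmox-ve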
 
Because it's working better than 2.6.35 :-) I must have missed the 2.6.32 while looking for kernel versions to downgrade to; I'll try that this evening and let you know the results.

However, the point of my post is that 2.6.35 isn't working, rather than that 2.6.24 is working.
 
Thanks Tom, but I'm not asking for support with 2.6.24.

I'm asking for support with 2.6.35, which is causing my VMs to hang, as described in my first message.
 
Again, you are missing some packages, as Dietmar already posted. If you want to use 2.6.35 you need to install all the packages for 2.6.35.

apt-get install proxmox-ve-2.6.35

Your pveversion -v for 2.6.35 should show:
Code:
pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-11
pve-kernel-2.6.35-1-pve: 2.6.35-11
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.27-1pve1
vzdump: 1.2-13
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
 
I've now run the upgrade, and I get exactly the same results. Here's pveversion -v:

Code:
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-11
pve-kernel-2.6.35-1-pve: 2.6.35-11
pve-kernel-2.6.24-12-pve: 2.6.24-25
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.27-1pve1
vzdump: 1.2-13
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6
 
Just tested the 2.6.32 kernel as well, and this seems to be fine. Here's pveversion -v:

Code:
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-33
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.35-1-pve: 2.6.35-11
pve-kernel-2.6.24-12-pve: 2.6.24-25
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.27-1pve1
vzdump: 1.2-13
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6

Incidentally, things are working correctly even though I am now seeing the "unhandled wrmsr: 0x198 data 0" message from KVM, so that's definitely not the cause of this issue.

So to recap: SMP guests are hanging in 2.6.35, but not in 2.6.24 or 2.6.32, with all the latest versions of the packages applied, as described in my first message.
 
Just tested - no difference, I'm afraid.

However, I think this is caused by the same "events/0" process locking at 100% problem that I've seen reported elsewhere on this forum (http://forum.proxmox.com/threads/59...rable-w-high-events-1-usage-due-to-virtio-nic). I'm using virtio for the NICs in all my VMs, and the one VM that did come up successfully under 2.6.35 was exhibiting this problem. I also mount NFS shares at boot in all the VMs, so on a hunch I changed all the VMs to use e1000 instead, and they've all booted successfully.
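
Expressed as plain kvm flags, the change amounts to the following (a sketch only - I actually made the change in the VM configuration, and the tap networking here is illustrative):

Code:
# before: virtio NIC - the guest hangs / events/0 spins under 2.6.35
kvm -m 512 -smp 8 -net nic,model=virtio -net tap guest.qcow2
# after: emulated e1000 NIC - the guest boots cleanly
kvm -m 512 -smp 8 -net nic,model=e1000 -net tap guest.qcow2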

I'll go back to 2.6.32 with virtio for now.
 