Linux KVM VM freezing on SMP kernel

jhammer

Member
Dec 21, 2009
I have a Linux VM (KVM) that is freezing randomly. It occasionally freezes during the boot process, but it mostly happens when there is heavy disk access or several things going on at once. For example, it freezes consistently if I am compiling something and doing an rsync at the same time.

I've seen 2 clues as to what might be happening.

Clue #1: When using virtio drives I see the following on boot:

virtio-pci 0000:00:03.0: can't find IRQ for PCI INT A; probably buggy MP table
virtio-pci 0000:00:04.0: can't find IRQ for PCI INT A; probably buggy MP table
vda: vda1 vda2 vda3
virtio-pci 0000:00:05.0: can't find IRQ for PCI INT A; probably buggy MP table
vdb: vdb1
virtio-pci 0000:00:06.0: can't find IRQ for PCI INT A; probably buggy MP table

When the system freezes, I see no kernel panic messages, and nothing is written to the logs.

Clue #2: When I use IDE drives for the same system, I do see a kernel panic when the system freezes. The messages in the kernel panic are always ext3-related. It sometimes panics when doing the file system check on boot or, as mentioned above, when there is heavy disk access.

Any thoughts on things to look for to fix this?

Environment:
pve-manager: 1.5-8 (pve-manager/1.5/4674)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-qemu-kvm: 0.11.1-2
The VM is running a 2.6.32 kernel.
 
A few more pieces of information:

* When using a uniprocessor kernel, the "buggy MP table" errors go away and the system does not appear to freeze.

* Many sites I have found suggest it is BIOS-related.

* There were several suggestions to add the boot flags 'nolapic' and/or 'noapic'. Enabling 'nolapic' disables SMP, leaving me with a single processor. Enabling 'noapic' fixes the "buggy MP table" errors while keeping SMP (in this case 4 processors), and /proc/interrupts differs depending on whether 'noapic' is enabled.

So 'noapic' could be a solution here. What kind of negative effect does it have?
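
For reference, the flags just go on the kernel line of the guest's GRUB config, something like this (the kernel image name and root device below are only placeholders for my layout):

title Gentoo 2.6.32 (noapic test)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-gentoo root=/dev/vda3 noapic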
 
I was able to get it working without the 'noapic' boot option. I had to enable Power Management support and ACPI Support in the kernel.
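
For the record, I believe those correspond to the following lines in the guest kernel's .config (Power Management support and ACPI Support, respectively):

CONFIG_PM=y
CONFIG_ACPI=y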
 

Well...maybe not.

The "buggy MP table" messages are gone now...but I still get the random freezes when there is heavy disk/proc activity on the VM.
 
And what kernel does the VM use? Try a newer one if possible.

The VM uses the latest 2.6.32 kernel.

I've been doing more testing using dbench and rsync. It seems that any time I have more than 1 socket or more than 1 core selected for the VM, I can get it to crash. The only configuration that has not had a kernel panic so far is 1-core/1-socket with a non-SMP kernel.
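
For reference, what each test configuration actually exposes can be confirmed inside the guest with something like:

uname -v                           # build string includes "SMP" on SMP kernels
grep -c ^processor /proc/cpuinfo   # number of CPUs the guest sees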
 
If you give a bit more info, we can try to reproduce the problem (what distribution, exact kernel versions, how to trigger the bug).

Distribution: Gentoo
Kernel version: sys-kernel/gentoo-sources-2.6.32-r7
Sockets: 1
Cores: 4
Memory: 2048
Disk backend: Virtio, iSCSI over multipath (different "cache=xxxx" settings don't appear to make a difference)
Ethernet: Virtio
Triggering the bug - I've caused the kernel to panic in several ways:

  • Simultaneously running 'dbench -c /usr/share/dbench/client.txt 40' and a local rsync of a lot of data (same disk to same disk) - a rough recipe is sketched after this list.
  • Simultaneously running a large compile (emerge gcc) and a local rsync of a lot of data (same disk to same disk).
  • Simultaneously running a large compile (emerge gcc) and an rsync of a remote disk to a local disk.
  • Occasionally the oops happens during an fsck.ext3 on boot.
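
A rough sketch of the first trigger (the rsync source and destination paths are just placeholders for my data set):

dbench -c /usr/share/dbench/client.txt 40 &
rsync -a /srv/data/ /srv/data-copy/ &
wait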
NOTES:


  1. This VM is a physical host I am trying to migrate to Proxmox.
  2. We have a Gentoo VM running SMP that I have not been able to cause a kernel panic on. There are lots of variables though. A few include:

  • 1 socket/2 cores vs. 1 socket/4 cores
  • Disk backend - Virtio, local file, raw format vs. Virtio, iSCSI over multipath
  • The working VM is newly created in Proxmox vs. a migrated physical host
  • System packages are all current on the newly created VM.
  • Kernel version is slightly older on newly created VM - sys-kernel/gentoo-sources-2.6.31-r10
I have PDFs of kernel oopses I could send if that would be helpful. They were a little too large to attach here.

I appreciate your help.
 
I doubt we can debug that - you should try to eliminate most of those variables.

I set up test cases for each and eliminated all the variables. It doesn't seem to matter where the disk is located (iSCSI or local). Disk bus type doesn't matter. Kernel version doesn't matter. Trying many different kernel settings didn't make a difference. A system update did not resolve the problem.

This is the only VM with the issue. I have migrated other servers without incident. Newly created VMs in Proxmox don't have the issue.

I'm just going to rebuild this server.

Thanks for all the help and suggestions.
 
I started an SMP server from scratch. I'm still getting kernel oopses. Here is what I am getting:

<snip>
kernel BUG at arch/x86/mm/highmem_32.c:45!
invalid opcode: 0000 [#6] SMP
last sysfs file: /sys/devices/virtio-pci/virtio1/block/vda/uevent
Modules linked in:
<snip>

Here is the config of the VM:

ide2: NFSBlueISOS:iso/ubuntu-10.04-desktop-i386.iso,media=cdrom
vlan0: virtio=3E:E9:E7:92:F5:2A
bootdisk: virtio0
ostype: l26
memory: 2048
sockets: 1
virtio1: MAILDATA:vm-109-disk-1
onboot: 0
cores: 2
boot: cd
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
virtio0: MAIL:vm-109-disk-1

Is there something I'm missing here in the kernel config?
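
For reference, the virtio and highmem options in the guest kernel can be checked like this (assuming the guest kernel has CONFIG_IKCONFIG_PROC enabled so /proc/config.gz exists):

zcat /proc/config.gz | grep -E 'VIRTIO|HIGHMEM'
# expecting at least CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_BLK and CONFIG_VIRTIO_NET set to y or m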

Thanks.
 
Always add your version (pveversion -v).
 
Thanks. Here it is:

plaid:/etc/qemu-server# pveversion -v
pve-manager: 1.5-8 (pve-manager/1.5/4674)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-10
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-2
ksm-control-daemon: 1.0-3
 
Try the latest 2.6.32 kernel (with KVM 0.12.4) - better?
 
Is this the correct process for upgrading:

apt-get update
apt-get upgrade
apt-get install proxmox-ve-2.6.32
apt-get install kvm
 
OK. I've upgraded one of my servers to proxmox-ve-2.6.32.

plaid:~# pveversion -v
pve-manager: 1.5-10 (pve-manager/1.5/4822)
running kernel: 2.6.32-2-pve
proxmox-ve-2.6.32: 1.5-7
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.32-2-pve: 2.6.32-7
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-16
pve-firmware: 1.0-5
libpve-storage-perl: 1.0-13
vncterm: 0.9-2
vzctl: 3.0.23-1pve11
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.4-1
ksm-control-daemon: 1.0-3

When I tried doing an online migration from another server back to this one, it didn't work. It said the transfer failed but that the migration succeeded. The VM then appears on the host I was trying to migrate back to, but it is shut off. Is that supposed to happen if the KVM versions differ on the two boxes?

Thanks.
 
