Strange kernel error and VM soft lockup after upgrade to 1.9

cdoering · Sep 15, 2011

Hi.

We have been running several machines (Linux and Windows) on Proxmox for some time now, all of them KVM machines. After upgrading the host (vserver), all machines came up, but the ones running Debian lenny hit soft lockups after some time (several minutes). There doesn't seem to be a problem on the Proxmox host itself, and the other machines (Debian squeeze and Windows) seem to run without problems.

Excerpt from syslog:

...
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Clocksource tsc unstable (delta = -4398041517969 ns)
Sep 15 14:57:50 groupware kernel: [ 9414.142420] BUG: soft lockup - CPU#1 stuck for 246s! [swapper:0]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] CPU 1:
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RIP: 0010:[<ffffffff8021eb64>] [<ffffffff8021eb64>] native_safe_halt+0x2/0x3
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RSP: 0018:ffff81007fb9ff38 EFLAGS: 00000246
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RAX: ffff81007fb9ffd8 RBX: 0000000000000000 RCX: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff804fce70
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RBP: 0000000000144448 R08: ffffffff8021eb64 R09: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] R10: ffff81007e15bdc8 R11: ffff81007fb9fef8 R12: ffff81007fb9fed8
Sep 15 14:57:50 groupware kernel: [ 9414.142420] R13: 0000000000000000 R14: ffffffff8023cd72 R15: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] FS: 0000000000000000(0000) GS:ffff81007fb719c0(0000) knlGS:0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 15 14:57:50 groupware kernel: [ 9414.142420] CR2: 00007f61feb9d56c CR3: 000000007e44d000 CR4: 00000000000006e0
Sep 15 14:57:50 groupware kernel: [ 9414.142420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 15 14:57:50 groupware kernel: [ 9414.142420]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Call Trace:
Sep 15 14:57:50 groupware kernel: [ 9414.142420] [<ffffffff8020b0d8>] ? default_idle+0x2a/0x46
Sep 15 14:57:50 groupware kernel: [ 9414.142420] [<ffffffff8020ad04>] ? cpu_idle+0x8e/0xb8
Sep 15 14:57:50 groupware kernel: [ 9414.142420]
...
Sep 15 15:21:50 groupware kernel: [11527.482897] BUG: soft lockup - CPU#0 stuck for 183s! [swapper:0]
Sep 15 15:21:50 groupware kernel: [11527.486467] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 15:21:50 groupware kernel: [11527.486467] CPU 0:
Sep 15 15:21:50 groupware kernel: [11527.486467] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 15:21:50 groupware kernel: [11527.486467] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
Sep 15 15:21:50 groupware kernel: [11527.486467] RIP: 0010:[<ffffffff8023c9ec>] [<ffffffff8023c9ec>] run_timer_softirq+0x155/0x1e2
Sep 15 15:21:50 groupware kernel: [11527.486467] RSP: 0018:ffffffff805e4ef0 EFLAGS: 00000206
Sep 15 15:21:50 groupware kernel: [11527.486467] RAX: ffffffff805e4ef0 RBX: ffff810067c89e88 RCX: ffffffff8023cc34
Sep 15 15:21:50 groupware kernel: [11527.486467] RDX: ffffffff805e4ef0 RSI: 0000000000001e39 RDI: ffff810067c89e88
Sep 15 15:21:50 groupware kernel: [11527.486467] RBP: ffffffff805e4e70 R08: ffff810067c89ec0 R09: 0000000000000000
Sep 15 15:21:50 groupware kernel: [11527.486467] R10: 0000000000000009 R11: ffffffffa008d33f R12: ffffffff8020cd02
Sep 15 15:21:50 groupware kernel: [11527.486467] R13: ffffffff805e4e70 R14: ffff81006610a140 R15: 0000000000000286
Sep 15 15:21:50 groupware kernel: [11527.486467] FS: 0000000000000000(0000) GS:ffffffff8053d000(0000) knlGS:0000000000000000
Sep 15 15:21:50 groupware kernel: [11527.486467] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 15 15:21:50 groupware kernel: [11527.486467] CR2: 00007feb57d0bde0 CR3: 000000007e107000 CR4: 00000000000006e0
Sep 15 15:21:50 groupware kernel: [11527.486467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 15:21:50 groupware kernel: [11527.486467] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 15 15:21:50 groupware kernel: [11527.486467]
Sep 15 15:21:50 groupware kernel: [11527.486467] Call Trace:
Sep 15 15:21:50 groupware kernel: [11527.486467] <IRQ> [<ffffffff802393cd>] ? __do_softirq+0x5c/0xd1
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020d2dc>] ? call_softirq+0x1c/0x28
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020f3e8>] ? do_softirq+0x3c/0x81
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8023932b>] ? irq_exit+0x3f/0x85
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8021aaab>] ? smp_apic_timer_interrupt+0x8c/0xa4
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020b0ae>] ? default_idle+0x0/0x46
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020cd02>] ? apic_timer_interrupt+0x72/0x80
Sep 15 15:21:50 groupware kernel: [11527.486467] <EOI> [<ffffffff8021eb64>] ? native_safe_halt+0x2/0x3
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8021eb64>] ? native_safe_halt+0x2/0x3
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020b0d8>] ? default_idle+0x2a/0x46
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020ad04>] ? cpu_idle+0x8e/0xb8
Sep 15 15:21:50 groupware kernel: [11527.486467]
...
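To see how often the lockups recur and which virtual CPU is affected, the relevant lines can be pulled out of the guest's syslog with a short script. This is a minimal sketch, assuming the exact message formats shown in the excerpt above (as printed by this 2.6.26 guest kernel; other kernel versions word them differently). `scan_syslog` and `SoftLockup` are hypothetical names, not an existing tool.

```python
import re
from typing import NamedTuple

# e.g. "[ 9414.142420] BUG: soft lockup - CPU#1 stuck for 246s! [swapper:0]"
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# e.g. "Clocksource tsc unstable (delta = -4398041517969 ns)"
CLOCKSOURCE_RE = re.compile(
    r"Clocksource (?P<name>\S+) unstable \(delta = (?P<delta>-?\d+) ns\)"
)


class SoftLockup(NamedTuple):
    uptime: float   # the [....] timestamp: seconds since the guest booted
    cpu: int        # virtual CPU reported as stuck
    stuck_for: int  # seconds reported by the soft-lockup watchdog
    comm: str       # task name, e.g. "swapper" (the idle task)
    pid: int


def scan_syslog(lines):
    """Return (soft_lockups, unstable_clocksources) found in an iterable of lines."""
    lockups, clocks = [], []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            lockups.append(SoftLockup(float(m["uptime"]), int(m["cpu"]),
                                      int(m["secs"]), m["comm"], int(m["pid"])))
            continue
        m = CLOCKSOURCE_RE.search(line)
        if m:
            delta_ns = int(m["delta"])
            # keep the raw nanosecond delta and a seconds view of it
            clocks.append((m["name"], delta_ns, delta_ns / 1e9))
    return lockups, clocks
```

Run over the lines above, it finds CPU#1 stuck for 246 s and CPU#0 stuck for 183 s, plus the tsc entry, whose delta of -4398041517969 ns is roughly -4398 s (about 73 minutes).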

The kernel on this machine is a stock Debian kernel:
groupware:~# uname -a
Linux groupware 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011 x86_64 GNU/Linux

The machine is not usable, since these problems occur very frequently. It has some network activity but not much load (Nagios, LDAP, Apache).

The Proxmox host is Debian lenny with the Proxmox package sources:

vserver:~# pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

The host was running the new PVE kernel at that time. Excerpt from the dmesg log:

vserver:~# cat /var/log/dmesg.0
...
total RAM covered: 12287M
...
gran_size: 128K chunk_size: 256M num_reg: 8 lose cover RAM: 0G
*BAD*gran_size: 128K chunk_size: 512M num_reg: 8 lose cover RAM: -256M
gran_size: 128K chunk_size: 1G num_reg: 8 lose cover RAM: 0G
*BAD*gran_size: 128K chunk_size: 2G num_reg: 8 lose cover RAM: -1G
gran_size: 256K chunk_size: 256K num_reg: 8 lose cover RAM: 9178624K
...
gran_size: 256K chunk_size: 256M num_reg: 8 lose cover RAM: 0G
*BAD*gran_size: 256K chunk_size: 512M num_reg: 8 lose cover RAM: -256M
gran_size: 256K chunk_size: 1G num_reg: 8 lose cover RAM: 0G
...
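To see at a glance how many of these candidate lines carry the *BAD* marker and how much RAM each one reports under "lose cover RAM", the dmesg output can be parsed. This is a minimal sketch, assuming the `gran_size: ... chunk_size: ... num_reg: ... lose cover RAM: ...` layout shown above with K/M/G suffixes; the helper names are hypothetical, and the sketch makes no claim about which layout the kernel finally chose or whether these lines are related to the guest lockups.

```python
import re

UNITS_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}  # size suffix -> KiB

MTRR_RE = re.compile(
    r"(?P<bad>\*BAD\*)?\s*gran_size:\s*(?P<gran>\d+[KMG])\s+"
    r"chunk_size:\s*(?P<chunk>\d+[KMG])\s+num_reg:\s*(?P<nreg>\d+)\s+"
    r"lose cover RAM:\s*(?P<lose>-?\d+[KMG])"
)


def to_kib(size):
    """Convert strings such as '128K', '512M' or '-1G' to an int in KiB."""
    return int(size[:-1]) * UNITS_KIB[size[-1]]


def parse_mtrr_cleanup(lines):
    """Yield one dict per gran_size/chunk_size line; other lines are skipped."""
    for line in lines:
        m = MTRR_RE.search(line)
        if m is None:
            continue
        yield {
            "bad": m["bad"] is not None,      # line started with *BAD*
            "gran_kib": to_kib(m["gran"]),
            "chunk_kib": to_kib(m["chunk"]),
            "num_reg": int(m["nreg"]),
            "lose_kib": to_kib(m["lose"]),    # may be negative
        }
```

For instance, keeping only the rows where `r["bad"]` is true leaves exactly the lines in the excerpt that report a negative value (-256M or -1G).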

This error occurs several times with the new kernel. I restarted vserver with the old 1.8 kernel (2.6.32-4-pve) and had neither these errors nor problems with any of the virtual machines.

I doubt that this is a Proxmox issue; more likely it is a problem with my hardware/(KVM?) kernel combination.

Proxmox runs on an Intel Xeon E5310 @ 1.60GHz on an Intel SE7320VP2 server board with 12GB RAM.

Regards, C. Doering

tom · Sep 15, 2011

Do you use virtio? If yes, upgrade the kernel in your lenny guest to 2.6.32.
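This advice can be turned into a quick check for each guest. A minimal sketch, assuming you pass in the guest's `uname -r` string and a whitespace-separated module list (the "Modules linked in:" line from a trace like the one in the first post, or the first column of `lsmod`); the function names are hypothetical, and the 2.6.32 threshold is simply the version suggested here, not a documented limit.

```python
import re


def uses_virtio(module_list):
    """True if any virtio driver (virtio_net, virtio_blk, ...) is in the list."""
    return any(name.startswith("virtio") for name in module_list.split())


def kernel_version(release):
    """Turn a `uname -r` string such as '2.6.26-2-amd64' into (2, 6, 26)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def should_upgrade_guest_kernel(release, module_list, minimum=(2, 6, 32)):
    """Rule of thumb from this thread: virtio in use and guest kernel < 2.6.32."""
    return uses_virtio(module_list) and kernel_version(release) < minimum
```

With the lenny guest from the first post (`2.6.26-2-amd64`, virtio_net and virtio_blk loaded) it returns True; with a `2.6.32-bpo.5-amd64` backports kernel it returns False.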

resoli · Sep 29, 2011

tom said:
Do you use virtio? If yes, upgrade the kernel in your lenny guest to 2.6.32.

More or less the same problem here; it seems solved after upgrading the lenny guest kernel to 2.6.32-bpo.5-amd64 from lenny-backports.

bye,
rob

atran · Oct 10, 2011

I also have this problem when using IDE/e1000, and I'm using kernel 2.6.33 in the guest.
Any ideas?

pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
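Both hosts in this thread report the same running kernel (2.6.32-6-pve), yet several package revisions differ, which is easy to miss when comparing by eye. A minimal sketch for diffing two `pveversion -v` outputs, assuming each line has the `name: version` form shown above; `parse_pveversion` and `diff_versions` are hypothetical helpers, not part of Proxmox.

```python
def parse_pveversion(text):
    """Parse `pveversion -v` output ('name: version' per line) into a dict."""
    versions = {}
    for line in text.splitlines():
        # split at the first colon only; values may contain more colons/slashes
        name, sep, value = line.partition(":")
        if sep:
            versions[name.strip()] = value.strip()
    return versions


def diff_versions(a, b):
    """Return {name: (version_in_a, version_in_b)} for entries that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

Applied to this output and the one in the first post, it reports five differences: proxmox-ve-2.6.32, pve-firmware, pve-kernel-2.6.32-6-pve (2.6.32-43 vs 2.6.32-47), vzctl and vzdump. The running kernel name alone does not reveal the package revision.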

P.S. I had this problem earlier when using virtio with Proxmox kernel 2.6.35 (always with the same guest).

dietmar · Oct 11, 2011

Can you please test the latest kernel:

ftp://download.proxmox.com/debian/d...4/pve-kernel-2.6.32-6-pve_2.6.32-48_amd64.deb

atran · Oct 11, 2011

What are the steps?
"dpkg -i proxmox-ve-2.6.32_1.9-48_all.deb"
without update-grub, and then reboot?

dietmar · Oct 11, 2011

Oh sorry, I posted the wrong link. Please use:

# wget ftp://download.proxmox.com/debian/d...4/pve-kernel-2.6.32-6-pve_2.6.32-48_amd64.deb
# dpkg -i pve-kernel-2.6.32-6-pve_2.6.32-48_amd64.deb

atran · Oct 11, 2011

I already installed the other one. Is that OK?

dietmar · Oct 11, 2011

atran said:
I already installed the other one. Is that OK?

Just check which kernel package you have now:

# dpkg -l pve-kernel-2.6.32-6-pve

atran · Oct 11, 2011

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-=========================-=========================-==================================================================
ii pve-kernel-2.6.32-6-pve 2.6.32-48 The Proxmox PVE Kernel Image

OK, so now I just reboot, right?
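In the `dpkg -l` output above, `ii` means the package is both selected for installation and actually installed, and the version column shows 2.6.32-48. When checking several nodes this can be scripted; a minimal sketch, assuming one package row of `dpkg -l` as input and purely numeric versions such as 2.6.32-48 (the helper names are hypothetical).

```python
def parse_dpkg_l_line(line):
    """Split one package row of `dpkg -l` into (status, name, version)."""
    # maxsplit=3 keeps the free-text description in one trailing field
    status, name, version = line.split(None, 3)[:3]
    return status, name, version


def revision_at_least(version, wanted):
    """Simplified comparison for purely numeric versions like '2.6.32-48'.

    This is NOT dpkg's full version ordering (no epochs, no letters such as
    'pve5'); int() raises ValueError for such versions.
    """
    def fields(v):
        return [int(part) for part in v.replace("-", ".").split(".")]
    return fields(version) >= fields(wanted)
```

For real Debian version ordering, `dpkg --compare-versions 2.6.32-48 ge 2.6.32-47` (exit status 0 when true) is the authoritative check. Either way, the new kernel only runs after the reboot.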

dietmar · Oct 11, 2011

atran said:
OK, so now I just reboot, right?

yes

copymaster · Oct 13, 2011

Hello!

Same strange errors here on an Ubuntu 9.10 guest with kernel 2.6.31-19-generic.

Updating to the kernel posted above does not fix the problems.

I have a cluster with 5 physical nodes and I did the update from 1.8 to 1.9 on all of them.

Now I have just installed the old kernel 2.6.32-4-pve on the 5th node.

Is that a problem?

Is there a fix for this strange issue?

dietmar · Oct 13, 2011

copymaster said:
Is there a fix for this strange issue?

Maybe a kernel update inside the VM helps.

atran · Oct 13, 2011

It looks like Proxmox kernel 2.6.32-48 works for me (Debian guest with kernel 2.6.33); I'll let it run for a day to be sure.

But now I have a problem with KSM in 2.6.32-48:

"watch cat /sys/kernel/mm/ksm/pages_sharing" shows 0, even after 12 hours.

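A `pages_sharing` value of 0 can mean either that KSM is not running at all or that it is running but has not merged anything; checking the `run` counter next to it tells the two apart. A minimal sketch, assuming the standard KSM sysfs files `run`, `pages_shared` and `pages_sharing` under /sys/kernel/mm/ksm; the function names are hypothetical.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm(base=KSM_DIR):
    """Read the run/pages_shared/pages_sharing counters as integers."""
    names = ("run", "pages_shared", "pages_sharing")
    return {name: int((Path(base) / name).read_text().strip()) for name in names}


def describe_ksm(stats):
    """Turn the raw counters into a short verdict."""
    if stats["run"] != 1:
        # 0 = stopped, 2 = stopped and unmerging already-merged pages
        return f"KSM is not merging (run = {stats['run']})"
    if stats["pages_sharing"] == 0:
        return "KSM is running but nothing has been merged yet"
    return (f"{stats['pages_shared']} shared pages in use, "
            f"about {stats['pages_sharing']} pages saved")
```

On the host, `print(describe_ksm(read_ksm()))` prints the verdict; a result of "not merging" points at KSM being switched off rather than at the kernel failing to find duplicate pages.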
atran · Oct 13, 2011

atran said:
It looks like Proxmox kernel 2.6.32-48 works for me (Debian guest with kernel 2.6.33); I'll let it run for a day to be sure.

Kernel 2.6.32-48 doesn't fix the problem after all!
Now I have two problems...
