Kernel bug in pve-kernel 2.6.24-7 (soft lockup)

andio

Member
Aug 28, 2009
I just installed Lenny with kernel 2.6.26-2 (amd64) on an IBM x3850 (4 CPUs, dual-core Xeon 7110 with VT + Hyper-Threading) a few minutes ago. It's a fresh, clean install with nothing else running.

The vanilla kernel 2.6.26-2 works fine, and Lenny's included OpenVZ kernel works fine too (I used that one before on this server).

I installed pve-kernel 2.6.24-7-pve (2.6.24-11) from the proxmox.com repository and rebooted.

The server needed 30 minutes just to boot.

When I finally managed to log in remotely, I noticed that the machine is extremely slow.
A "ps -ef" takes more than 10 seconds to list the running processes, and the output appears very slowly, line by line.
A simple "apt-get install less" takes 5 minutes to install 'less'. The machine seems to be completely overloaded and is unusable.



I googled around and found some hints about a kernel bug which seems to be fixed in 2.6.26 or later, but I'm not sure! The only thing I'm sure of is that I definitely won't be able to use that pve-kernel on the IBM x3850 servers.

Any ideas?
I'd be glad to help track down this bug in the pve-kernel. I have several of these servers, which I just bought to install Proxmox on. I could immediately "lend" one server with remote access for testing and fixing.


--
Update 1: I tested the other pve.old and pvetest kernels from the Proxmox repository as well, all with the same result :-(

Update 2: I also found out that it seems to be a bug in kernel 2.6.24, especially in Ubuntu. Do you use Ubuntu to compile the pve-kernels?

Update 3: I tried the clocksource=... kernel parameter without success (tried all available options, acpi_pm, jiffies, tsc, notsc ..., without any difference). I also tried acpi=off and some other tricks, without success.

Update 4: Similar situations are described here: http://forum.openvz.org/index.php?t=msg&goto=30122. It may be a kernel and/or hardware problem; at least kernel 2.6.25 has a workaround (quote: "Kernel 2.6.25 includes a workaround which detects this issue and uses only so much RAM as possible without a slowdown."). Will there be a 2.6.25+ pve-kernel soon?
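When testing clocksource= options as in Update 3, it can help to confirm which clocksource the running kernel actually selected. Below is a minimal Python sketch that reads the two standard sysfs files; whether this particular pve-kernel build exposes them is an assumption, and the function name and the `base` parameter are made up for illustration.

```python
from pathlib import Path

# Standard sysfs directory for the clocksource on Linux. It is passed as a
# parameter so the function can also be pointed at a copy of the files.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_CLOCKSOURCE):
    """Return (current, available) clocksources as reported under `base`."""
    base = Path(base)
    # current_clocksource holds a single name, e.g. "tsc"
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource holds a space-separated list of names
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling `read_clocksources()` after each reboot with a different clocksource= value would show whether the parameter took effect at all, or whether the kernel fell back to another source.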

For the Proxmox staff, I put dmesg output online from the plain vanilla Debian 5 Lenny kernel (2.6.26) and from the pve-kernel which makes the server unusably slow (both booted with printk.time=1 for comparison; the difference is very clear):

http://www.andreasotto.net/dmesg-vanillakernel.txt runs like a charm, fast and stable. Boot time: 38 seconds.

http://www.andreasotto.net/dmesg-pvekernel.txt terribly slow. Boot time: 25 minutes!!

Hopefully something can be done.
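
Since both dmesg captures were taken with printk.time=1, the timestamps can be used to find where the slow boot stalls. The following Python sketch lists the largest pauses between consecutive timestamped lines. It assumes the usual `[  seconds.micros] message` prefix; the function names are hypothetical and nothing here comes from the Proxmox tooling.

```python
import re

# Matches the prefix printed with printk.time=1, e.g. "[   12.345678] text"
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_stamps(lines):
    """Yield (seconds, message) for each line that carries a timestamp."""
    for line in lines:
        m = STAMP.match(line)
        if m:
            yield float(m.group(1)), m.group(2)


def largest_gaps(lines, top=5):
    """Return the `top` biggest pauses as (gap, previous_time, message).

    `message` is the line printed right after the pause, i.e. the step
    that took that long to be reached.
    """
    entries = list(parse_stamps(lines))
    gaps = [
        (cur_t - prev_t, prev_t, msg)
        for (prev_t, _), (cur_t, msg) in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]
```

Running `largest_gaps(open("dmesg-pvekernel.txt"))` on a saved copy of each file and comparing the results side by side should show whether the delay is concentrated in a few steps or spread over the whole boot.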
 

Looks like you need a new kernel. Please contact office at proxmox.com to get an offer for this.
 
I have the same problem, but not on an IBM server.

CPU: Intel Xeon Quad, 4x 2.83+ GHz, 12 MB L2, FSB 1333 MHz
RAM: 8 GB DDR2

Debian Lenny 5.0 / Proxmox 1.3 with kernel: pve-kernel-2.6.24-7-pve (pve-kernel-2.6.24-7-pve_2.6.24-11_amd64.deb from 21 August).

I get random server lockups under network or CPU load. I can't upgrade the BIOS, since it is a hosted server that I rent. It also crashes randomly at boot.

It is working at the moment, but my "dmesg" is full of:

BUG: soft lockup - CPU#2 stuck for 11s! [kstopmachine:3731]
CPU 2:
Modules linked in: ata_generic pata_acpi psmouse uhci_hcd ehci_hcd pata_marvell serio_raw pcspkr usbcore e1000e evdev video output button dm_snapshot thermal processor fan sata_nv via686a ahci mptctl mptsas scsi_transport_sas mptspi mptscsih mptbase dm_crypt raid456 async_xor async_memcpy async_tx xor raid0 raid1 md_mod dm_mirror dm_mod sata_via ata_piix sata_sis pata_sis libata sym53c8xx megaraid aic7xxx scsi_transport_spi sd_mod 3w_xxxx scsi_mod atl1 sky2 skge r8169 e1000 via_rhine sis900 8139too e100 mii
Pid: 3731, comm: kstopmachine Not tainted 2.6.24-7-pve #1 ovz005
RIP: 0010:[<ffffffff8028932c>] [<ffffffff8028932c>] stopmachine+0x4c/0x110
RSP: 0018:ffff810232195f40 EFLAGS: 00000202
RAX: 0000000000000001 RBX: 0000000000000e92 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
RBP: ffffffff804c7c30 R08: ffff810232194000 R09: 000000000007a574
R10: 0000000000000009 R11: ffffffff80423c30 R12: 0000000000000004
R13: ffffffff8024d9ca R14: 0000000000000e92 R15: ffff810232195ec0
FS: 0000000000000000(0000) GS:ffff810232c02c00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f2ba0e68000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
[<ffffffff8023ca82>] schedule_tail+0x22/0x70
[<ffffffff8020d4e8>] child_rip+0xa/0x12
[<ffffffff802892e0>] stopmachine+0x0/0x110
[<ffffffff8020d4de>] child_rip+0x0/0x12
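
To see how often these reports occur and which CPU and task they involve, the "BUG: soft lockup" lines can be tallied from a saved dmesg. This is a minimal Python sketch that assumes the message format shown in the trace above; the function name and the shape of the result are invented for illustration.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#2 stuck for 11s! [kstopmachine:3731]"
LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)


def summarize_lockups(lines):
    """Map (cpu, task_name) -> (report_count, longest_stall_in_seconds)."""
    counts = Counter()
    longest = {}
    for line in lines:
        m = LOCKUP.search(line)
        if not m:
            continue
        key = (int(m.group(1)), m.group(3))
        counts[key] += 1
        longest[key] = max(longest.get(key, 0), int(m.group(2)))
    return {key: (counts[key], longest[key]) for key in counts}
```

If nearly all reports name kstopmachine, as in the trace above, that points at the stop-machine path rather than at a particular driver, though that reading is only a guess from a single trace.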




See: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/210672
It seems to be resolved for them.
 
