Server freezes or reboots with pve-2.6.32-6 kernels

Kalin Bogatzevski
Nov 23, 2011

One of our Proxmox-based servers has the following problems, depending on the kernel version:
- with 2.6.32-6 (50) it reboots (4 times in the last 3 days), with nothing of interest in the logs;
- with the other 2.6.32-6 versions (48, 52) it freezes, but not completely: every IP address (all containers and the host) still answers ping, but nothing else works. The screen is blank and nothing is written to the logs. The only way out is a hardware reset.

pveperf:
CPU BOGOMIPS: 57597.34
REGEX/SECOND: 890922
HD SIZE: 5.50 GB (/dev/sda1)
BUFFERED READS: 2.00 MB/sec
AVERAGE SEEK TIME: 68.04 ms
FSYNCS/SECOND: 42.88
DNS EXT: 11.01 ms
DNS INT: 1.08 ms (bul.net)

pveversion:
pve-manager: 2.0-10 (pve-manager/2.0/7a10f3e6)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-52
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-50
lvm2: 2.02.86-1pve1
clvm: 2.02.86-1pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.6.0-1
redhat-cluster-pve: 3.1.7-1
pve-cluster: 1.0-11
qemu-server: 2.0-4
pve-firmware: 1.0-14
libpve-common-perl: 1.0-7
libpve-access-control: 1.0-2
libpve-storage-perl: 2.0-6
vncterm: 1.0-2
vzctl: 3.0.29-3pve3
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.1-1

Funny or not, the server stopped again while I was writing this message.
The server has 24GB of RAM. I can post any additional information needed to help resolve the problem.

Thanks.
Kalin.
 
Hi Kalin,

I had the same or a similar problem on a new server. I found out that one container (a web server) did not have enough RAM to handle its load peaks, which are very rare. I had given it only 1024MB RAM and no swap; now it has 2048MB RAM and 256MB swap, and the whole server has been running without problems since. Some user beancounter errors in the syslog gave me the hint.

Best regards,
Patrick
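To spot the kind of beancounter trouble Patrick describes without waiting for syslog, one option is to look at the failcnt column of /proc/user_beancounters on the host. Below is a minimal Python sketch, assuming the usual OpenVZ layout of that file (a "Version:" line, a header line, then rows of resource, held, maxheld, barrier, limit, failcnt, with the container ID printed only on the first row of each container's block). The function names failing_counters and report are hypothetical, reading the file needs root, and failcnt is cumulative since the container started, so a non-zero value says a limit was hit at some point, not necessarily just now.

```python
def failing_counters(text):
    """Return (ctid, resource, failcnt) for every beancounter with failcnt > 0.

    Assumes the OpenVZ /proc/user_beancounters layout described above.
    """
    results = []
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # A row starting with "NNN:" opens a new container block.
        if fields[0].endswith(":"):
            ctid = fields[0][:-1]
            fields = fields[1:]
        # Resource rows: resource held maxheld barrier limit failcnt
        if len(fields) != 6:
            continue
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            results.append((ctid, resource, failcnt))
    return results


def report(path="/proc/user_beancounters"):
    """Print the failing counters from the given file (run as root on the host)."""
    with open(path) as f:
        for ctid, resource, failcnt in failing_counters(f.read()):
            print(f"CT {ctid}: {resource} failcnt={failcnt}")
```

Calling report() periodically (or comparing two runs) shows which container, and which resource such as privvmpages or kmemsize, is running into its limit.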
 
Hi Patrick,

Thanks for the reply.

Does this mean that the kernel cannot actually limit the memory used inside a container?
The last freeze did leave some messages in the syslog, and yes, they are about memory, but all of the containers have memory and swap configured. I might increase the swap, then, and test again.

Regards,
Kalin.
 
I went back to the 2.6.32-4 kernel.
For the moment it works, unlike the 2.6.32-6 versions.

I did get something like this, though:

INFO: task sync:707 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sync D ffff8804c855d000 0 707 2090 0x00000004
ffff88061b916000 0000000000000086 0000000000000000 0000000000000082
ffffffff8101657f 0000000000000082 000000000000fa40 ffff8804c619dfd8
0000000000016940 0000000000016940 ffff8804c855d000 ffff8804c855d2f8
Call Trace:
[<ffffffff8101657f>] ? sched_clock+0x5/0x8
[<ffffffff810b6bca>] ? find_get_pages_tag+0x46/0xdd
[<ffffffff8110ba07>] ? bdi_sched_wait+0x0/0xe
[<ffffffff8110ba10>] ? bdi_sched_wait+0x9/0xe
[<ffffffff81314cb7>] ? __wait_on_bit+0x41/0x70
[<ffffffff8110ba07>] ? bdi_sched_wait+0x0/0xe
[<ffffffff81314d51>] ? out_of_line_wait_on_bit+0x6b/0x77
[<ffffffff81066a44>] ? wake_bit_function+0x0/0x23
[<ffffffff8110ba88>] ? sync_inodes_sb+0x73/0x12a
[<ffffffff8110f718>] ? __sync_filesystem+0x4c/0x72
[<ffffffff8110f7d8>] ? sync_filesystems+0x9a/0xe3
[<ffffffff8110f890>] ? sys_sync+0x46/0x75
[<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
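When several of these hung-task warnings pile up before a freeze, it can help to list which tasks were blocked and for how long. A minimal Python sketch, assuming the message wording shown above ("INFO: task NAME:PID blocked for more than N seconds"); the wording may differ between kernel versions, and the function name hung_tasks is hypothetical.

```python
import re

# Matches the kernel's hung-task warning, e.g.
# "INFO: task sync:707 blocked for more than 120 seconds."
# The task name is matched non-greedily so names containing ':' still work.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Return (task name, pid, seconds) for every hung-task warning in a log."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]
```

Feeding it the contents of the kernel log or syslog gives one tuple per warning; the call trace that follows each warning still has to be read by hand.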
 

There is no 2.6.32-4 kernel for the 2.0 beta, so which version exactly are you running? Please post the output of pveversion -v.

If you run 1.9, please test the latest kernel from pvetest (pve-kernel-2.6.32-6-pve: 2.6.32-53).
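The running kernel line in pveversion -v shows only the kernel release (for example 2.6.32-6-pve), not the package build (-50 or -53), so the installed pve-kernel package lines are needed to tell builds apart. A minimal Python sketch that pulls both out of the output, assuming the "name: version" line format seen in the listing in the first post; the function names parse_pveversion and kernel_summary are hypothetical.

```python
def parse_pveversion(text):
    """Parse 'name: version' lines from `pveversion -v` output into a dict."""
    packages = {}
    for line in text.splitlines():
        # Split on the first ':' only; versions may contain further colons or '/'.
        name, sep, version = line.partition(":")
        if sep:
            packages[name.strip()] = version.strip()
    return packages


def kernel_summary(text):
    """Return (running kernel release, sorted installed pve-kernel package lines)."""
    pkgs = parse_pveversion(text)
    running = pkgs.get("running kernel")
    installed = sorted(
        f"{name}: {ver}" for name, ver in pkgs.items() if name.startswith("pve-kernel-")
    )
    return running, installed
```

Applied to the listing in the first post, this yields the running release 2.6.32-6-pve together with the two installed kernel packages, 2.6.32-33 and 2.6.32-50.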
 
Hello Tom,

I was using the 2.0 beta with 2.6.32-6, and also with 2.6.32-4; booting into the other kernel is no problem.
Now I have gone back to 1.9 (because of other issues with our VPS management applications), but that is not really the problem.
It looks like there is some kernel incompatibility with our new hardware.
I will test 2.6.32-53, especially if the NFS problems are fixed as described, but I can't reboot this machine that often now.
Of course we want to use the 2.6.32-6 kernels, as they give us much more functionality.

Thanks,
 
This machine has now been running 2.6.32-4 for 4 days with no problems. It looks like it is a kernel problem after all.
I will now reboot into 2.6.32-6 (53) to test it, but will keep -4 as the default kernel for subsequent reboots.
 
Current report: 24h later, 2.6.32-6 (53) is running stably on this machine. I guess it might be the problem with the Xeon 5500/5600 series and C1E states that was fixed in this kernel version. I couldn't check the current BIOS settings, as the server is located in a datacenter.

Hope this stability will continue :)
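Without BIOS access, the OS side can at least show which CPU idle states the kernel's cpuidle driver exposes. A minimal Python sketch, assuming the sysfs layout /sys/devices/system/cpu/cpuN/cpuidle/stateK/name (plus an optional disable file, which older kernels may lack). It does not read the BIOS C1E setting itself; on these Xeons C1E is largely a firmware/MSR matter and may not appear as a separate state. The function name idle_states is hypothetical.

```python
from pathlib import Path


def idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """List (state dir, state name, disabled) for the CPU's cpuidle states.

    Returns an empty list if the kernel exposes no cpuidle directory.
    """
    base = Path(cpu_dir) / "cpuidle"
    if not base.is_dir():
        return []  # cpuidle not available on this kernel/driver
    states = []
    # Sort numerically so state10 comes after state2.
    for state in sorted(base.glob("state*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"  # may not exist on older kernels
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state.name, name, disabled))
    return states
```

Running it on the host before and after a kernel change shows whether the set of idle states in use differs between the two kernels.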
 
