Kernel panic on 2.6.32-28 + 2.6.32-29

udo

Distinguished Member
Apr 22, 2009
5,977
199
163
Ahrensburg; Germany
Hi,
last week we had two kernel panics on one cluster node. They happened shortly after a reboot of the server; beforehand we had run the normal updates.
After the first kernel panic I updated again, to 2.6.32-29.
But the next kernel panic occurred some hours later, and we moved all VMs to the next node (2.6.32-28).
Memory checks on the first node have found no problems so far, but yesterday evening the second node crashed.
So to me this doesn't look like a node problem - more like a problem with something from one of the recent updates...

Because of the crash there is no info in syslog or messages. I only have a screenshot of the last panic output:
proxmox5.png
We switched the second node to kernel 2.6.32-27.
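For anyone who needs to do the same rollback, pinning an older kernel as the default boot entry can be done via GRUB2 - a sketch, assuming a stock Proxmox 3.x / Debian install with GRUB2; the menu entry title below is an example, so check your own /boot/grub/grub.cfg for the real one:

```shell
# List the boot entries GRUB2 knows about (titles vary per install)
grep "^menuentry" /boot/grub/grub.cfg | cut -d "'" -f 2

# Make the 2.6.32-27 entry the default boot kernel
# (requires GRUB_DEFAULT=saved in /etc/default/grub)
grub-set-default "Debian GNU/Linux, with Linux 2.6.32-27-pve"
update-grub
```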

Our pveversion output on this server is
Code:
proxmox-ve-2.6.32: 3.2-124 (running kernel: 2.6.32-27-pve)
pve-manager: 3.2-2 (running version: 3.2-2/82599a65)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-2.6.32-28-pve: 2.6.32-124
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-6
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1
Any hints?

Udo
 
Can you please test with the latest kernel? (pve-kernel-2.6.32-30-pve_2.6.32-130_amd64.deb)
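In case it helps anyone following along, installing a single test kernel .deb on PVE 3.x is just a dpkg install plus a reboot - a sketch; the file name is the one from this post, and you first need to fetch the .deb from wherever the Proxmox team provides it:

```shell
# Install the test kernel package, then reboot into it
dpkg -i pve-kernel-2.6.32-30-pve_2.6.32-130_amd64.deb
reboot
```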

Hi Dietmar,
I will try the latest kernel.
Are there any changes in the new kernel that point to "this" issue, or is it more hope-related?

I guess that because the beginning of the kernel panic output is missing, there is no real way to know what triggered the panic, right?

Udo
 
Hi,
I had the same problem with 2.6.32-28, 2 months ago, on 3 servers. I haven't tested anything newer than -28; I rolled back to -27. (I'll migrate to the 3.10 kernel soon.)

If I remember correctly, it was the same error message.
 
It's great to have this forum.

I too have had 2 occurrences this past week on 2 hosts.

I updated last week; all 4 hosts had uptimes in excess of 90 days before the upgrade.
 
Hi,
I had the same problem with 2.6.32-28, 2 months ago, on 3 servers. I haven't tested anything newer than -28; I rolled back to -27. (I'll migrate to the 3.10 kernel soon.)

If I remember correctly, it was the same error message.
Hi Spirit,
have you migrated to 3.10 successfully, or are you using 2.6.32-32-pve (or still -27) with pve 3.3?

Udo
 
Hi Spirit,
have you migrated to 3.10 successfully, or are you using 2.6.32-32-pve (or still -27) with pve 3.3?

Udo

I've been running the 3.10 kernel in production on all servers for around 3 months now.
No stability problems.
(I'm using Dell PowerEdge servers, Xeon and Opteron, with bnx2 and e1000 network cards)
 
I too have seen this over the past few months, but I'm thinking maybe the nodes are out of time sync. I only say this because I had a similar issue with my SAN cluster of 8 nodes, which were all synced to an external time server. The time would drift and nodes would fall offline (giving some kind of time-sync error). Since changing them over to a local time server on our network they have all stayed online with no issues. On Proxmox it would take a couple of reboots before a node would finally regain quorum and be good to go. So far my Proxmox nodes are OK, but I will make the change to the local time server at the next maintenance cycle.

Anyway, thinking back, this all started around the same time: the SAN would just go offline, but maybe Proxmox is more sensitive (as in, it kernel panics).
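A quick way to rule the time-sync theory in or out is to check each node's NTP state - a sketch, assuming ntpd is running, as on a stock Proxmox 3.x / Debian install:

```shell
# Show the peer list; a '*' in the first column marks the peer
# the node is actually syncing from
ntpq -p

# Show the current offset (in ms) from the selected source
ntpq -c rv | tr ',' '\n' | grep offset
```

If the offset is large or no peer is starred, the node's clock is drifting and quorum trouble is plausible.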