After a long stable period, the Proxmox VE environment suddenly gets a kernel panic when trying to restore a backup.
pveversion -v
1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.18-6-pve
proxmox-ve-2.6.18: 1.8-15
pve-kernel-2.6.18-4-pve: 2.6.18-10
pve-kernel-2.6.18-6-pve: 2.6.18-15
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm-2.6.18: 0.9.1-15
I thought this was caused by a hardware error, since I get messages like this:
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0 TSC 8999881b5cc
MISC 435200040081
MCG status:
MCi status:
Error overflow
MCi_MISC register valid
MCi_ADDR register valid
MCA: Unknown Error 9f
STATUS cc0007800001009f MCGSTATUS 0
MCG status:
MCi status:
Error overflow
MCi_MISC register valid
MCi_ADDR register valid
MCA: Unknown Error 9f
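For anyone who wants to read the STATUS value by hand, here is a minimal sketch that splits it into the architectural flag bits of the Intel MCi_STATUS register (bit 63 valid, 62 overflow, 61 uncorrected, 60 enabled, 59 MISC valid, 58 ADDR valid, 57 processor context corrupt, low 16 bits the MCA error code). The function name decode_mci_status is made up for illustration, and the bit layout is assumed to apply to this CPU; it is not a replacement for a proper decoder such as mcelog.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields.

    Hypothetical helper: the bit positions follow the generic Intel
    machine-check layout and are assumed to apply to this processor.
    """
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: register holds a logged error
        "overflow": bit(62),         # OVER: another error arrived before this one was cleared
        "uncorrected": bit(61),      # UC: hardware did not correct the error
        "enabled": bit(60),          # EN: error reporting was enabled for this error
        "misc_valid": bit(59),       # MISCV: MCi_MISC holds extra information
        "addr_valid": bit(58),       # ADDRV: MCi_ADDR holds an address
        "context_corrupt": bit(57),  # PCC: processor context may be corrupt
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # STATUS value copied from the log above.
    fields = decode_mci_status(0xCC0007800001009F)
    for name, value in fields.items():
        shown = hex(value) if not isinstance(value, bool) else value
        print(f"{name:>20}: {shown}")
```

For the value in the log, the overflow, MISC valid and ADDR valid bits come out set, which agrees with the decoded lines above, and the uncorrected bit comes out clear. That would suggest the logged events were errors the hardware corrected, but this is only a hint and does not identify the failing component.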
Now I am not sure, since I read in another forum that a kernel panic might be caused by different processors accessing the same area of memory. Running top always indicates that the kernel panic occurs under similar circumstances:
top - 22:22:41 up 1 day, 22:47, 1 user, load average: 2.41, 1.90, 0.89
Tasks: 144 total, 1 running, 143 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.6%us, 3.6%sy, 0.0%ni, 51.3%id, 32.8%wa, 0.1%hi, 1.7%si, 0.0%st
Mem: 6083208k total, 6047960k used, 35248k free, 8188k buffers
Swap: 5242872k total, 2764k used, 5240108k free, 3806484k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31568 root 18 0 4152 568 388 D 44 0.0 1:32.21 gzip
31570 root 18 0 3648 356 284 S 7 0.0 0:22.47 sparsecp
31567 root 18 0 18432 2524 808 S 3 0.0 0:11.54 tar
327 root 15 0 0 0 0 S 1 0.0 1:31.23 pdflush
328 root 10 -5 0 0 0 S 1 0.0 1:51.40 kswapd0
8750 root 10 -5 0 0 0 S 0 0.0 0:31.78 iscsi_q_9
10817 root 15 0 1177m 170m 1448 S 0 2.9 2:32.45 kvm
10907 root 18 0 4286m 1.6g 1524 S 0 28.1 30:00.10 kvm
Everything starts OK:
INFO: restore QemuServer backup 'vzdump-qemu-203-2011_11_16-23_17_52.tgz' using ID 200
INFO: extracting 'qemu-server.conf' from archive
INFO: extracting 'vm-disk-ide0.raw' from archive
INFO: Formatting '/var/lib/vz/images/200/vm-200-disk-1.raw', fmt=raw, size=32 kB
INFO: new volume ID is 'local:200/vm-200-disk-1.raw'
INFO: restore data to '/var/lib/vz/images/200/vm-200-disk-1.raw' (137438953472 bytes)
After some time the kernel panic occurs and the system crashes. Since I have had similar episodes earlier with different kernel versions, I wonder if there could be a problem related to either the pve firmware or the kernel version we run.
The server is a Supermicro:
product: X8SIE
vendor: Supermicro
physical id: 0
Running 4 of these:
description: CPU
product: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
vendor: Intel Corp.
physical id: 4
bus info: cpu@0
version: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
serial: To Be Filled By O.E.M.
slot: CPU
size: 2400MHz
capacity: 2400MHz
width: 64 bits