Kernel panic with 2.6.32-6 and multi-CPU OpenVZ

gkovacs

Renowned Member
Dec 22, 2008
Budapest, Hungary
Last week we updated to the latest PVE 1.9 with kernel 2.6.32-6. After the upgrade all of our VEs performed very poorly, until we discovered that OpenVZ now respects the max CPUs setting, so each guest was effectively running on a single processor. It was stable but very slow for 2 days, so we downgraded back to 2.6.32-4.

After making this discovery, last night we set the max CPUs value for each VE accordingly (between 1 and 4) and rebooted into 2.6.32-6.
The server stopped after 5 hours with a kernel panic. The mysqld process that caused the lockup was running in a VE with 4 CPUs.

panic.jpg

The machine (Intel Q6600 CPU, P45 board, 8GB DDR2 RAM, Adaptec RAID) never had any stability problems before and had been running without a reboot for months.
Now we are back to 2.6.32-4 for good.
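Since the panics only appeared once VEs were given more than one CPU, it helps to know exactly which containers carry that setting. Below is a minimal Python sketch, assuming the OpenVZ layout of that era: one /etc/vz/conf/<CTID>.conf file per container, with the CPU count saved as a CPUS="N" line (for example by vzctl set <CTID> --cpus N --save). The function names are hypothetical helpers, not part of any Proxmox or OpenVZ tool.

```python
import re
from pathlib import Path

# Assumption: container configs are /etc/vz/conf/<CTID>.conf and the CPU
# count is stored as a shell-style line such as CPUS="4" or CPUS=4.
CPUS_RE = re.compile(r'^\s*CPUS\s*=\s*"?(\d+)"?\s*$')


def cpus_from_conf_text(text):
    """Return the CPUS value from a container config, or None if it is not set."""
    cpus = None
    for line in text.splitlines():
        match = CPUS_RE.match(line)
        if match:
            # Keep the last assignment, as a shell sourcing the file would.
            cpus = int(match.group(1))
    return cpus


def multi_cpu_containers(conf_dir="/etc/vz/conf"):
    """List (ctid, cpus) pairs for containers configured with more than one CPU."""
    found = []
    for conf in Path(conf_dir).glob("*.conf"):
        if not conf.stem.isdigit():
            continue  # only <CTID>.conf files describe containers
        cpus = cpus_from_conf_text(conf.read_text())
        if cpus is not None and cpus > 1:
            found.append((int(conf.stem), cpus))
    return sorted(found)


if __name__ == "__main__":
    for ctid, cpus in multi_cpu_containers():
        print(f"CT {ctid}: CPUS={cpus}")
```

Lines that are commented out (e.g. # CPUS="4") or belong to other parameters such as CPUUNITS are deliberately not matched.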
 
Please include the detailed version information (post the output of 'pveversion -v') and give all details about your hardware.
 
pveversion -v (now running 2.6.32-4-pve for stability)
Code:
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

lspci

Code:
00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.3 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 4
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIB (ICH10) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:00.0 VGA compatible controller: nVidia Corporation G86 [GeForce 8400 GS] (rev a1)
03:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
05:00.0 RAID bus controller: Adaptec AAC-RAID (rev 01)

/proc/cpuinfo (the same block repeats for all 4 cores; only the first is shown)
Code:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
stepping        : 11
cpu MHz         : 2399.951
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority
bogomips        : 4799.90
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
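/proc/cpuinfo prints one blank-line-separated block per logical CPU, which is why the Q6600 output appears four times. A minimal Python sketch that counts those blocks and collects the model names, for example to check the host's CPU count against the 1 to 4 CPUs assigned to the VEs. parse_cpuinfo and summarize are hypothetical helper names.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical CPU.

    Blocks are separated by blank lines; each line is "key : value".
    """
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def summarize(cpus):
    """Count logical CPUs and list the distinct model names."""
    models = sorted({cpu.get("model name", "?") for cpu in cpus})
    return {"logical_cpus": len(cpus), "models": models}
```

On a Linux host it could be called as summarize(parse_cpuinfo(open("/proc/cpuinfo").read())); for the machine above it should report 4 logical CPUs and a single Q6600 model string.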
 
Please test the latest kernel, released to stable today.

pve-kernel-2.6.32-6-pve: 2.6.32-47
 
We upgraded to the latest (-47) kernel last night. During vzdump backups the same server froze again, with a similar message naming the mysqld process as the cause.

panic2.jpg

Here is the pveversion (after rebooting to the stable 2.6.32-4):
Code:
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
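Comparing this listing with the earlier pveversion -v output by eye is error-prone, so a small diff helps. A minimal Python sketch, assuming the "name: version" line format shown above; parse_pveversion and diff_versions are hypothetical helper names.

```python
def parse_pveversion(text):
    """Map package name -> version from 'pveversion -v' lines like 'vzctl: 3.0.28-1pve5'."""
    versions = {}
    for line in text.splitlines():
        # Split on the first colon only; the version part may contain more text.
        name, sep, version = line.partition(":")
        if sep:
            versions[name.strip()] = version.strip()
    return versions


def diff_versions(before, after):
    """Return {package: (old, new)} for packages that changed, appeared or disappeared."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed
```

Applied to the two listings in this thread, it reports five changes: proxmox-ve-2.6.32 (1.9-43 to 1.9-47), pve-kernel-2.6.32-6-pve (2.6.32-43 to 2.6.32-47), pve-firmware (1.0-13 to 1.0-14), vzctl (3.0.28-1pve5 to 3.0.29-2pve1) and vzdump (1.2-15 to 1.2-16). Everything else, including the running 2.6.32-4-pve kernel, is unchanged.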
 
Our issue with kernel 2.6.32-6-47 looks similar. We are now able to reproduce the freeze by running
phoronix-test-suite benchmark build-linux-kernel: it never finishes, the machine freezes instead. Back on 2.6.32-5 it runs fine, as did all the other tests we have tried so far (memory etc.). It is always the same: 2.6.32-6-47 seems to freeze when I/O is high, while on 2.6.32-5 no issue is noticeable...
 
You run "phoronix-test-suite benchmark build-linux-kernel" on the host?

I have used this heavily over the last days and weeks, with no issues here.
 
Quote:
You run "phoronix-test-suite benchmark build-linux-kernel" on the host?

Yes.

Quote:
I have used this heavily over the last days and weeks, with no issues here.

Me too ;-), no issues with 2.6.32-5 here.
To be precise: we also have another host running 2.6.32-6-47, which has not had any issues so far. We just have not been able to find out what is wrong with this pilot server and 2.6.32-6...
 
We also have another host that did not freeze with 2.6.32-6-47. That one has 2 NICs and a Q9300 processor. The server that DID freeze has 1 NIC and a Q6600 CPU. Both have Adaptec AAC-RAID controllers.
It is also important to note that for us the kernel panic only occurs if at least one OpenVZ container is configured to use more than one CPU.

You might want to post the output of lspci and cat /proc/cpuinfo here, like I did above.
Perhaps we will find some hardware in common.
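Finding hardware in common can be done mechanically by intersecting the hosts' lspci device lists. A minimal Python sketch, assuming plain lspci output without a PCI domain prefix (as posted in this thread). It drops bus addresses and "(rev NN)" suffixes, so the same chip in a different slot or revision still matches. device_set and common_devices are hypothetical names.

```python
import re

# Assumed line format: "00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller"
LSPCI_RE = re.compile(r"^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]\s+(.*)$")
REV_RE = re.compile(r"\s*\(rev [0-9a-fA-F]+\)$")


def device_set(lspci_text):
    """Device descriptions with bus address and revision suffix removed."""
    devices = set()
    for line in lspci_text.splitlines():
        match = LSPCI_RE.match(line.strip())
        if match:
            devices.add(REV_RE.sub("", match.group(1)))
    return devices


def common_devices(*hosts):
    """Devices present in every given lspci listing."""
    sets = [device_set(h) for h in hosts]
    return set.intersection(*sets) if sets else set()
```

Run on the two lspci listings in this thread, the intersection includes the ICH10 USB and SMBus controllers, the JMicron JMB368 IDE controller and the Adaptec AAC-RAID controller. This only narrows down candidates; it does not show that any of them is involved in the freeze.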
 
Quote:
We also have another host that did not freeze with 2.6.32-6-47. That one has 2 NICs and a Q9300 processor. The server that DID freeze has 1 NIC and a Q6600 CPU. Both have Adaptec AAC-RAID controllers.
It is also important to note that for us the kernel panic only occurs if at least one OpenVZ container is configured to use more than one CPU.

We did a run with both OpenVZ and qemu-server stopped, so no containers were running, and the machine still froze after approx. 15 seconds of the benchmark run.

Quote:
You might want to post the output of lspci and cat /proc/cpuinfo here, like I did above.
Perhaps we will find some hardware in common.

Not sure if this helps, since we did not find anything abnormal, but here it is:

Code:
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 13)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 13)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 13)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Port 5
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller
01:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
02:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller
04:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
04:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
09:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

Code:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz
stepping        : 5
cpu MHz         : 2266.328
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs  bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4532.65
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
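To compare the two CPUs posted so far (the Q6600 above and this L5520), their flags lines can be diffed. A minimal Python sketch; cpu_flags and flag_diff are hypothetical helpers that read only the first flags line, which is enough when all cores report the same flags.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())  # split() also tolerates double spaces
    return set()


def flag_diff(a_text, b_text):
    """Flags unique to each CPU, plus the number of shared flags."""
    a, b = cpu_flags(a_text), cpu_flags(b_text)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a), "common": len(a & b)}
```

With the two flags lines from this thread, every Q6600 flag also appears on the L5520, which additionally reports dca, ept, ida, nonstop_tsc, popcnt, rdtscp, sse4_1, sse4_2, vpid and xtopology.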
 
So what is going to happen with this serious kernel bug? So far it does not go away with updates; it is present in every 2.6.32-6 version.

Will the proxmox team handle the investigation from here?
Or should we file a bug report somewhere on the OpenVZ site?
 
We will do it ASAP, tomorrow.
 
The problems persist with the new patch:

Code:
apt-cache showpkg pve-headers-2.6.32-6-pve
Package: pve-headers-2.6.32-6-pve
Versions:
2.6.32-55 (/var/lib/apt/lists/download.proxmox.com_debian_dists_lenny_pve_binary-amd64_Packages) (/var/lib/dpkg/status)
Description Language:
File: /var/lib/apt/lists/download.proxmox.com_debian_dists_lenny_pve_binary-amd64_Packages
MD5: 4841560c5420d39041d5e38a7f81c94e

I have the same crash on my production machine; I posted my server data in this thread: http://forum.proxmox.com/threads/8041-Proxmox-Crashing
 
See these pictures:

Left side of the graph: booted with kernel 2.6.32-6; right side: kernel 2.6.32-4. Same machine, same conditions.

cpuday.jpg

And this graph covers the whole period with the old release, the patch and the old kernel. You can see when my server crashed: the first time after one month (see the white bar), and the second time, with the patch, after only one day of uptime. You can see the big change.

cpubeforeafter.jpg