New 2.6.32 Kernel with stable OpenVZ (pvetest)

martin

Proxmox Staff Member
We just released a new kernel to the pvetest repository. This one includes the long-awaited stable OpenVZ patches for 2.6.32 and also support for KSM. Additionally, this kernel will also be used (more or less) for the upcoming 2.0 beta, which is expected at the end of Q3.

Everybody is encouraged to test and give feedback before we move it to the stable repository (and release it as Proxmox VE 1.9, together with a new ISO image).
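If you want to try it on an existing 1.x installation, the rough steps look like this (only a sketch - the repository line below is written from memory for the Lenny-based 1.x series, so please double-check it against the wiki, and the package name follows the usual pveversion output):

Code:
# add the pvetest repository (verify the exact line on the wiki before using it)
echo "deb http://download.proxmox.com/debian lenny pvetest" >> /etc/apt/sources.list

apt-get update
# the metapackage should pull in the new pve-kernel-2.6.32-6-pve
apt-get install proxmox-ve-2.6.32

# then reboot into the new kernel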

Release notes:

- pve-kernel-2.6.32 (2.6.32-41)
  • rebase on vzkernel-2.6.32-042stab036.1.src.rpm (new stable OpenVZ kernel branch), updates for drivers including e1000e to 1.5.1, ARECA RAID driver, megaraid_sas, bnx2, igb to 3.0.22, ixgbe to 3.3.9, drbd 8.3.10, ...
- vzctl (3.0.28-1pve5)
  • update to latest upstream
  • set default template to debian-6.0-standard
  • set CONFIGFILE="pve.auto" in /etc/vz/vz.conf
  • merge some fixes from upstream
- pve-manager (1.8-22)
  • fix uptime display for 2.6.32 kernel with 1000HZ
  • support newer vzctl versions.
  • support 'maxfiles' backup option.
- pve-qemu-kvm (0.15.0-1)
  • update to upstream 0.15.0
  • use pxe roms from upstream qemu-kvm
- qemu-server (1.1-31)
  • small fixes for new qemu-kvm 0.15.0
- libpve-storage-perl (1.0-19)
  • set LC_ALL instead of LANG (avoid bug when user sets LC_ environment variables)
  • iscsi: tolerate errors when not all portals are online.
- vzdump (1.2-15)
  • run pre-restart hook after snapshot
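Once the new kernel is booted, a quick way to check that the KSM support mentioned above is active (a sketch using the standard sysfs interface; the pages_sharing value only rises once ksm-control-daemon and KVM guests are running):

Code:
uname -r                               # should report 2.6.32-6-pve
cat /sys/kernel/mm/ksm/run             # 1 means KSM is running
cat /sys/kernel/mm/ksm/pages_sharing   # > 0 once identical guest pages are merged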
__________________
Best regards,
Martin Maurer
 
Hi,

I upgraded today from kernel 2.6.24 -> 2.6.32-6. I have one OpenVZ machine with two network interfaces that routes from one subnet to another. With the 2.6.24-12 kernel everything works perfectly. With 2.6.32-6 a ping to the router gets a response, but the clients behind the router cannot be reached. Back on kernel 2.6.24-12 everything works fine.

Regards, Valle
 
Need to have the new .iso image for testing - since all of our new hardware is equipped with the new Areca 18xx series RAID controller, we had to move away from Proxmox because of the missing 6Gbit SAS drivers....
 
We installed onto a disk connected to the motherboard.
Then compiled the Areca driver, installed it and updated the initial ramdisk.
Booted into single-user mode and used dd to copy the disk to an Areca array of the same size, and now we can boot from the array with no problem.
With every kernel update since, we make sure to compile the driver, install it and update the initrd before we reboot.
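Roughly, the procedure looks like this (a sketch - the exact build steps depend on the Areca driver source, and the device names are only placeholders):

Code:
# build the arcmsr module from the Areca source and install it for the running kernel
make
cp arcmsr.ko /lib/modules/$(uname -r)/kernel/drivers/scsi/arcmsr/
depmod -a

# make sure the module ends up in the initial ramdisk
update-initramfs -u -k $(uname -r)

# one-time copy of the install disk to the Areca array (single-user mode;
# device names are placeholders - double-check them before running dd!)
dd if=/dev/sda of=/dev/sdb bs=1M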

We have been using Proxmox like this since version 1.5.

Thanks for adding the newer Areca drivers so I no longer need to remember to compile them after updates!
 
I upgraded today from kernel 2.6.24 -> 2.6.32-6. I have one OpenVZ machine with two network interfaces that routes from one subnet to another. With the 2.6.24-12 kernel everything works perfectly. With 2.6.32-6 a ping to the router gets a response, but the clients behind the router cannot be reached. Back on kernel 2.6.24-12 everything works fine.

Difficult to debug without further info. Already tried tcpdump?
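For example, something along these lines on the container's interfaces while you ping across (only a sketch - the interface name and capture file name are placeholders):

Code:
# watch ICMP live on the CT's interface
tcpdump -ni eth0 icmp

# or write the traffic to a file you can share for analysis
tcpdump -ni eth0 -w ct-eth0.pcap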
 
Hi,
just did an I/O performance test with Windows SMP systems, and it looks good!

My first try looked very bad, but it seems that was Windows-related (fresh boot with 4 cores - Windows was doing some weird things, the CPU usage showed a lot of activity on a calm system).
The second try shows good performance.

Here are the results - the values are the index from h2benchw.exe:
Code:
kernel   cores   io-performance
2.6.32     1        98
2.6.35     4         4.4   -> windows-related bad value
2.6.32     4        87
2.6.35     4        87.2
So it looks like the 2.6.32 kernel will be the right choice for me (though I guess some of my systems will first be updated to PVE 2.x).
 
This is an option if your server is sitting under your desk or nearby, but not for 1U servers which are fully populated with disks and spread over different datacenters;
you would have to take the server out of the rack and run it open, where the reduced airflow could possibly overheat one of the components.... those are good reasons to use another virtualization solution or to wait until the Areca drivers are in....
 
Hello,

we just tried the new kernel on our "proxmox cluster pilot", two identical servers running Proxmox (pvetest) with DRBD (8.3.7).
Unfortunately, this was the very first time that a Proxmox kernel seemed not to work at all - after the reboot we experienced total system freezes; after the 1st reboot we were able to log in, and shortly after - during some typing - the system froze. The 2nd reboot then hung before the console login, while loading the Adaptec (5805) "stormanager".
We replaced the kernel with pve-kernel-2.6.32-5 again - and everything works fine as before...

Unfortunately, we did not find any traces (log messages) - the freezes seem to be "too fast"; so I don't know how we could provide more info, besides the configuration of course:

/0 bus X8DTN
/0/0 memory 64KiB BIOS
/0/4 processor Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
/0/4/5 memory 256KiB L1 cache
/0/4/6 memory 1MiB L2 cache
/0/4/7 memory 8MiB L3 cache
/0/8 processor Intel(R) Xeon(R) CPU L5520 @ 2.27GHz
/0/8/9 memory 256KiB L1 cache
/0/8/a memory 1MiB L2 cache
/0/8/b memory 8MiB L3 cache
/0/15 memory 48GiB System Memory

/0/100 bridge 5520 I/O Hub to ESI Port
/0/100/1 bridge 5520/5500/X58 I/O Hub PCI Express Root Port 1
/0/100/1/0 eth0 network 82576 Gigabit Network Connection
/0/100/1/0.1 eth1 network 82576 Gigabit Network Connection
/0/100/3 bridge 5520/5500/X58 I/O Hub PCI Express Root Port 3
/0/100/3/0 scsi0 storage AAC-RAID

(pveversion *after* going back to 2.6.32-5 kernel)
pve-manager: 1.8-22 (pve-manager/1.8/6531)
running kernel: 2.6.32-5-pve
proxmox-ve-2.6.32: 1.8-33
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-5-pve: 2.6.32-36
pve-kernel-2.6.18-4-pve: 2.6.18-12
pve-kernel-2.6.24-12-pve: 2.6.24-25
qemu-server: 1.1-31
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
 
Give all details about your hardware. I have 2 boxes running here with the Adaptec 5805Z, no issues - but I do not use DRBD on this setup.

The old kernel has DRBD 8.3.7, the new one has the 8.3.10 kernel module - maybe this causes the issue on your side?
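A quick way to compare the module and userland versions on both nodes (standard commands, nothing Proxmox-specific):

Code:
cat /proc/drbd            # first line shows the loaded module version and api
modinfo drbd | grep -i version
dpkg -l | grep drbd       # installed userland (drbdadm etc.) package version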
 
Give all details about your hardware. I have 2 boxes running here with the Adaptec 5805Z, no issues - but I do not use DRBD on this setup.

The old kernel has DRBD 8.3.7, the new one has the 8.3.10 kernel module - maybe this causes the issue on your side?

Yes, I also had the feeling (but no evidence at all) that it was related to DRBD, also because of having a (resolvable) split-brain afterwards (though in my experience this is not *that* surprising with DRBD ;-).
Regarding the hardware - as already indicated, it's an X8DTN (Supermicro), using two L5520s and a SAS backplane, equipped with a SAS RAID-1 (system) and a SATA RAID-5 (data, DRBD).
The Adaptec is a 5805 (latest firmware) - which we use as a standard here without much hassle so far.

What we will try next is to upgrade the BIOS of the X8DTN - we noticed a recent update for that - and then we will retry with 2.6.32-6...

Btw, if the new kernel uses DRBD 8.3.10 - is it advisable to upgrade the userland (drbdadm) as well?
 
...

Btw, if the new kernel uses DRBD 8.3.10 - is it advisable to upgrade the userland (drbdadm) as well?

According to the DRBD user guide, yes. 8.4.0 is already out and recommended, but we have not had time to package it yet - you can probably just compile it yourself.

Our new 2.6.32-6 is based on the stable OpenVZ kernel, which is based on RHEL 6.x.
 
According to the DRBD user guide, yes. 8.4.0 is already out and recommended, but we have not had time to package it yet - you can probably just compile it yourself.

Our new 2.6.32-6 is based on the stable OpenVZ kernel, which is based on RHEL 6.x.

Ok, I upgraded drbd8-tools to 8.3.10 and also did the BIOS update. Rebooted into 2.6.32-6 - total freeze when the first CT is started.

I noticed that the included igb driver is relatively old (and DRBD activity after a reboot involves the network quite a bit) - and we once had some issues with that and Supermicro; therefore I upgraded to the latest igb driver and tried again. I can't say for sure whether it was just by chance or a result, but I was then able to boot into 2.6.32-6 and the boot finished (this had only happened once before), so that I was able to log in.
It took only about a minute and the freeze was back - but this time we got some messages on the console, and the machine was still pingable for another minute before it froze.

And it clearly points to the Adaptec, i.e. aacraid:
aacraid: Host adapter abort request (0,0,0,0)
...
AAC: Host adapter BLINK LED 0xef
AAC0: adapter kernel panic'd ef.

When this last message appeared, the machine was no longer pingable:
IRQ24/aacraid: IRQF_DISABLED is not guaranteed on shared IRQs

I verified that 2.6.32-5 and 2.6.32-6 use the same aacraid driver, so I'd rule this out as the reason; though this looks really strange to me...
Back again on 2.6.32-5 - and the machine is again "rock stable"...
And according to lspci, there's no other device on IRQ 24 (ok, at least in 2.6.32-5 - I will try to check this in 2.6.32-6 next time it gives me a few seconds...)

I understand that 2.6.32-5 and 2.6.32-6 are "totally different" because of the rebase on OpenVZ's RHEL6 kernel, so I assume there's not much sense in trying to find/dig into some sort of "changelog"... (?)
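For reference, that kind of check can be done roughly like this (module paths assume the standard locations under /lib/modules - just a sketch):

Code:
# compare the aacraid module shipped with both kernels
modinfo /lib/modules/2.6.32-5-pve/kernel/drivers/scsi/aacraid/aacraid.ko | grep -i version
modinfo /lib/modules/2.6.32-6-pve/kernel/drivers/scsi/aacraid/aacraid.ko | grep -i version

# see what is actually sitting on IRQ 24
grep aacraid /proc/interrupts
lspci -vv | grep -B 10 "IRQ 24"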
 
Can you plug the RAID card into another physical slot?
 
Does the workaround from this bug report help?
https://bugzilla.redhat.com/show_bug.cgi?id=540478

in short: yes :-)
I was just about to write the "progress made" message; I also found the - obviously unresolved/uncommented - bug report above, which pointed to ASPM.
And I knew that we had problems with Adaptec and ASPM in the past - we always get our servers pre-configured, i.e. equipped with the RAID controller and a tuned BIOS, but when we got 2 servers both reporting "adaptec kernel panic" (before booting anything), our supplier first exchanged the Adaptecs, and then we found out that the BIOS had ASPM enabled.
Since then ASPM is always checked to be "off" - and this machine also had it off. However, I have the feeling that the Linux kernel is able to re-enable it:

Sep 8 14:12:16 vcluster1a kernel: pci 0000:04:00.2: PME# supported from D0 D3hot D3cold
Sep 8 14:12:16 vcluster1a kernel: pci 0000:04:00.2: PME# disabled
Sep 8 14:12:16 vcluster1a kernel: pci 0000:04:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
Sep 8 14:12:16 vcluster1a kernel: pci 0000:02:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'

For me, this sounds like "generally enabled, but disabled on 2 slots"...

I then applied the suggested workaround and added "pcie_aspm=off" to the kernel command line in GRUB. The boot message now is:

Sep 8 15:45:39 vcluster1a kernel: PCIe ASPM is disabled

And for the last half hour the machine has been running like a charm...

So it looks like this is at least a feasible workaround; thanks for taking care!

NB: we never observed this before with "original" RHEL6 or CentOS6 kernels - must be either Debian or OpenVZ... ;-)
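For reference, this is roughly how the parameter ends up on the kernel command line (a sketch - whether the box uses GRUB legacy or GRUB 2 depends on the installation):

Code:
# GRUB legacy: append pcie_aspm=off to the '# kopt=' line in /boot/grub/menu.lst
# GRUB 2:      append it to GRUB_CMDLINE_LINUX in /etc/default/grub
# then regenerate the boot configuration:
update-grub

# verify after the reboot
cat /proc/cmdline
dmesg | grep -i aspm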
 
Great, thanks for testing the workaround!
 
Difficult to debug without further info. Already tried tcpdump?

Sorry Dietmar, I will make a tcpdump at the weekend. I am not a tcpdump "power user". Is there a chance that I can send the tcpdump file to you?

Is there any other information that you need?

Regards, Valle
 
Sorry Dietmar, I will make a tcpdump at the weekend. I am not a tcpdump "power user". Is there a chance that I can send the tcpdump file to you?

I guess a single dump will not help - you need to debug that interactively. Does it work with 2.6.18?
 
Since then ASPM is always checked to be "off" - and this machine also had it off. However, I have the feeling that the Linux kernel is able to re-enable it:

Just uploaded a new kernel which should fix that (should respect BIOS settings now). Also updated igb and ixgbe drivers.

Can you please test?

Update: here are the driver versions for igb and ixgbe:

Code:
modinfo igb

filename:       /lib/modules/2.6.32-6-pve/kernel/drivers/net/igb/igb.ko
version:        3.1.16
license:        GPL
description:    Intel(R) Gigabit Ethernet Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>

Code:
modinfo ixgbe

filename:       /lib/modules/2.6.32-6-pve/kernel/drivers/net/ixgbe/ixgbe.ko
version:        3.4.24-NAPI
license:        GPL
description:    Intel(R) 10 Gigabit PCI Express Network Driver
 