Proxmox VE with ubuntu-server kernel

  • Thread starter: linuxdatacenter (Guest)
Hi,

I've recently upgraded the kernel on our Proxmox VE hypervisor to the latest Ubuntu version. It was done to solve some problems with multipath: with kernel 2.6.32-2-pve, multipath would hang for some time when we were rebooting our storage arrays.
We got messages and stack traces like these from the qlogic driver as well as from emulex (FC HBA):

Oct 1 10:10:19 s10826 kernel: sd 1:0:1:0: [sdd] CDB: Write(10): 2a 00 0a db 3f d0 00 00 08 00
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): DEVICE RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): DEVICE RESET FAILED: Task management failed.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET FAILED: Task management failed.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET FAILED: Task management failed.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): BUS RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: sd 1:0:1:0: [sdd] Synchronizing SCSI cache
Oct 1 10:10:25 s10826 kernel: qla2xxx 0000:06:00.1: qla2xxx_eh_bus_reset: reset succeded
Oct 1 10:10:35 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): ADAPTER RESET ISSUED.
Oct 1 10:13:49 s10826 kernel: qla2xxx_1_dpc D 0000000000000000 0 958 2 0x00000000
Oct 1 10:13:49 s10826 kernel: ffff880214701ad0 0000000000000046 ffff880228415c30 0000000000000000

[here goes stack trace]

Upgrading our kernel to the latest Ubuntu version (2.6.35) cleared the problem. The machines are running fine and we have no more issues with multipath failover. However, extensive testing is required before it hits production.
Is anyone else running this kind of setup (Proxmox VE + Ubuntu kernel)?
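
To keep an eye on the paths during such a test, here is a minimal sketch that summarizes path states by parsing the output of multipath -ll. The parsing is an assumption: the exact output format differs between multipath-tools versions, so adjust the matching to what your version prints.

#!/usr/bin/env python
# Minimal sketch: count active vs. failed multipath paths.
# Assumes multipath-tools is installed and that per-path lines mention the
# sdX device name plus an "active" or "failed" state keyword.
import subprocess

def path_states():
    out = subprocess.check_output(["multipath", "-ll"]).decode()
    states = {"active": 0, "failed": 0}
    for line in out.splitlines():
        if " sd" not in line:       # skip map headers and path-group lines
            continue
        if "failed" in line:
            states["failed"] += 1
        elif "active" in line:
            states["active"] += 1
    return states

if __name__ == "__main__":
    print(path_states())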
 
Last November I ran Proxmox on Ubuntu 9.10 for a few weeks of testing. It worked, but it had performance problems hosting Windows 2003 running SQL 2005: under high I/O load the guest clocks would fall behind, and this led to SQL replication errors. The Proxmox guys do a good job testing and tuning their kernel. I switched back to Proxmox bare metal as soon as they released the 2.6.32 kernel with KSM.
 
Hi,

It would be great if you gave us an updated kernel for testing!

The situation I described takes place during upgrades of our storage array (3PAR). We have two paths from the servers to the array. During the upgrade, the storage array controllers are reset one after another. When the first controller is reset, everything is fine: the paths fail over to the second controller. The second controller is rebooted once the first comes back online. After the second controller goes down, we get the errors I described, along with the stack traces. Virtual machines freeze for about a minute. Sometimes guest filesystems go read-only and the guests need to be rebooted. This does not happen with the Ubuntu kernel.

Please note: this problem does not happen when we turn off a port on the FC switch (as opposed to resetting the controller on the array itself).

If you want us to test new kernels, now is the time, as we have a shiny new 3PAR box right here that has not hit production yet, so it can be rebooted over and over ;-)
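
For the reboot tests themselves, here is a minimal sketch of a watcher that follows the kernel log and prints the reset messages quoted above as they appear. The log path and the patterns are assumptions; adjust them to your distribution and HBA driver.

#!/usr/bin/env python
# Minimal sketch: tail the kernel log and flag SCSI/FC reset messages.
import time

PATTERNS = ("DEVICE RESET", "TARGET RESET", "BUS RESET", "ADAPTER RESET")
LOGFILE = "/var/log/kern.log"    # assumed location of the kernel log

def follow(path):
    with open(path) as f:
        f.seek(0, 2)             # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

if __name__ == "__main__":
    for line in follow(LOGFILE):
        if any(p in line for p in PATTERNS):
            print(line.rstrip())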
 
I have two identical servers with 16 GB of RAM each, running 9 KVM VMs in a mix of Windows XP/2000/2003/2008 and Ubuntu.
With the VMs almost idle:
- on the standard PVE (2.6.32-4 kernel without KSM), 11.4 GB is occupied;
- on the other server (with the linux-image-2.6.35-22-server_2.6.35-22.33_amd64.deb kernel), with the same configuration and the same VMs, only 3.8 GB is occupied.
Really impressive.
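
For anyone who wants to put a number on the KSM savings, here is a minimal sketch that estimates the deduplicated RAM from the standard KSM counters in sysfs (it assumes a KSM-enabled kernel exposing /sys/kernel/mm/ksm):

#!/usr/bin/env python
# Minimal sketch: report how much RAM KSM is currently deduplicating.
# pages_sharing counts pages backed by an already-shared page, so
# pages_sharing * page_size approximates the memory saved.
import os

KSM = "/sys/kernel/mm/ksm"

def read_counter(name):
    with open(os.path.join(KSM, name)) as f:
        return int(f.read())

if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    shared = read_counter("pages_shared")
    sharing = read_counter("pages_sharing")
    print("shared pages: %d, sharing pages: %d" % (shared, sharing))
    print("approx. RAM saved: %.1f MiB" % (sharing * page_size / 1048576.0))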
 
The box with the maverick kernel has not been tested in production yet.
So far the KSM-enabled system seems as stable and responsive as the standard one.
In a few days I'll do comparisons under load.
Concerning CPU, this (in my case) should not be a problem: the standard production system has a low CPU load.
 
The kernel package has just one extra dependency, and a really small one at that (the CRDA agent for wireless drivers).
 

I have all my systems running on KSM. The price you pay is increased CPU usage. However, the first limits I hit on my Proxmox boxes are RAM and I/O, so I don't care much about CPU.
KSM stability is satisfactory for me. It simply works.
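
If the CPU cost ever does become an issue, ksmd can be throttled through its sysfs tunables. A minimal sketch, with purely illustrative values (writing to sysfs needs root, and the right numbers depend on your workload):

#!/usr/bin/env python
# Minimal sketch: make ksmd less aggressive to trade scan rate for CPU time.
KSM = "/sys/kernel/mm/ksm"

def set_tunable(name, value):
    with open("%s/%s" % (KSM, name), "w") as f:
        f.write(str(value))

if __name__ == "__main__":
    set_tunable("sleep_millisecs", 200)   # wake ksmd up less often
    set_tunable("pages_to_scan", 64)      # scan fewer pages per wake-up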
 
Hi,

I can confirm that my problem with the storage array is gone with kernel 2.6.35-pve.