Proxmox VE with ubuntu-server kernel

  • Thread starter: linuxdatacenter (Guest)
Hi,

I've recently upgraded the kernel on our Proxmox VE hypervisor to the latest Ubuntu version. It was done to solve some problems with multipath: with kernel 2.6.32-2-pve, multipath would hang for some time when we were rebooting our storage arrays.
We got messages and stack traces like these from the qlogic driver as well as from emulex (FC HBA):

Oct 1 10:10:19 s10826 kernel: sd 1:0:1:0: [sdd] CDB: Write(10): 2a 00 0a db 3f d0 00 00 08 00
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): DEVICE RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): DEVICE RESET FAILED: Task management failed.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET FAILED: Task management failed.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): TARGET RESET FAILED: Task management failed.
Oct 1 10:10:19 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): BUS RESET ISSUED.
Oct 1 10:10:19 s10826 kernel: sd 1:0:1:0: [sdd] Synchronizing SCSI cache
Oct 1 10:10:25 s10826 kernel: qla2xxx 0000:06:00.1: qla2xxx_eh_bus_reset: reset succeded
Oct 1 10:10:35 s10826 kernel: qla2xxx 0000:06:00.1: scsi(1:1:0): ADAPTER RESET ISSUED.
Oct 1 10:13:49 s10826 kernel: qla2xxx_1_dpc D 0000000000000000 0 958 2 0x00000000
Oct 1 10:13:49 s10826 kernel: ffff880214701ad0 0000000000000046 ffff880228415c30 0000000000000000

[here goes stack trace]

Upgrading our kernel to the latest Ubuntu version (2.6.35) cleared the problem. The machines are running fine and we have no more issues with multipath failover. However, extensive testing is required before it hits production.
Is anyone else running this kind of setup (Proxmox VE + Ubuntu kernel)?
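
To keep an eye on the paths during such a test, here is a minimal sketch that summarizes path states by parsing the output of multipath -ll. The parsing is an assumption: the exact output format differs between multipath-tools versions, so adjust the matching to what your version prints.

#!/usr/bin/env python
# Minimal sketch: count active vs. failed multipath paths.
# Assumes multipath-tools is installed and that per-path lines mention the
# sdX device name plus an "active" or "failed" state keyword.
import subprocess

def path_states():
    out = subprocess.check_output(["multipath", "-ll"]).decode()
    states = {"active": 0, "failed": 0}
    for line in out.splitlines():
        if " sd" not in line:       # skip map headers and path-group lines
            continue
        if "failed" in line:
            states["failed"] += 1
        elif "active" in line:
            states["active"] += 1
    return states

if __name__ == "__main__":
    print(path_states())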
 
Last November I ran Proxmox on Ubuntu 9.10 for a few weeks of testing. It worked, but it had performance problems hosting Windows 2003 running SQL 2005: under high I/O load the guest clocks would fall behind, and this led to SQL replication errors. The Proxmox guys do a good job testing and tuning their kernel. I switched back to Proxmox bare metal as soon as they released the 2.6.32 kernel with KSM.
 
Hi,

It would be great if you gave us an updated kernel for testing!

The situation I described takes place during upgrades of our storage array (3PAR). We have two paths from the servers to the array. During the upgrade, the storage array controllers are reset one after another. When the first controller is reset, everything is fine: the paths fail over to the second controller. The second controller is rebooted once the first comes back online. After the second controller goes down, we get the errors I described, along with the stack traces. Virtual machines freeze for about a minute. Sometimes guest filesystems go read-only and the guests need to be rebooted. This does not happen with the Ubuntu kernel.

Please note: this problem does not happen when we turn off a port on the FC switch (as opposed to resetting the controller on the array itself).

If you want us to test new kernels, now is the time, as we have a shiny new 3PAR box right here that has not hit production yet, so it can be rebooted over and over ;-)
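
For the reboot tests themselves, here is a minimal sketch of a watcher that follows the kernel log and prints the reset messages quoted above as they appear. The log path and the patterns are assumptions; adjust them to your distribution and HBA driver.

#!/usr/bin/env python
# Minimal sketch: tail the kernel log and flag SCSI/FC reset messages.
import time

PATTERNS = ("DEVICE RESET", "TARGET RESET", "BUS RESET", "ADAPTER RESET")
LOGFILE = "/var/log/kern.log"    # assumed location of the kernel log

def follow(path):
    with open(path) as f:
        f.seek(0, 2)             # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

if __name__ == "__main__":
    for line in follow(LOGFILE):
        if any(p in line for p in PATTERNS):
            print(line.rstrip())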
 
I have two identical servers with 16 GB of RAM each, running 9 KVM VMs in a mix of Windows XP/2000/2003/2008 and Ubuntu.
With the VMs almost idle:
- on the standard PVE (2.6.32-4 kernel without KSM), 11.4 GB is occupied;
- on the other server (with the linux-image-2.6.35-22-server_2.6.35-22.33_amd64.deb kernel), with the same configuration and the same VMs, only 3.8 GB is occupied.
Really impressive.
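
For anyone who wants to put a number on the KSM savings, here is a minimal sketch that estimates the deduplicated RAM from the standard KSM counters in sysfs (it assumes a KSM-enabled kernel exposing /sys/kernel/mm/ksm):

#!/usr/bin/env python
# Minimal sketch: report how much RAM KSM is currently deduplicating.
# pages_sharing counts pages backed by an already-shared page, so
# pages_sharing * page_size approximates the memory saved.
import os

KSM = "/sys/kernel/mm/ksm"

def read_counter(name):
    with open(os.path.join(KSM, name)) as f:
        return int(f.read())

if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    shared = read_counter("pages_shared")
    sharing = read_counter("pages_sharing")
    print("shared pages: %d, sharing pages: %d" % (shared, sharing))
    print("approx. RAM saved: %.1f MiB" % (sharing * page_size / 1048576.0))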
 
The box with the maverick kernel has not been tested in production yet.
So far the KSM-enabled system seems as stable and responsive as the standard one.
In a few days I'll do comparisons under load.
Concerning CPU, this (in my case) should not be a problem: the standard production system has a low CPU load.
 
The kernel package has just one extra dependency, and a really small one at that (the CRDA agent for wireless drivers).
 

I have all my systems running on KSM. The price you pay is increased CPU usage. However, the first limits I hit on my Proxmox boxes are RAM and I/O, so I don't care much about CPU.
KSM stability is satisfactory for me. It simply works.
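
If the CPU cost ever does become an issue, ksmd can be throttled through its sysfs tunables. A minimal sketch, with purely illustrative values (writing to sysfs needs root, and the right numbers depend on your workload):

#!/usr/bin/env python
# Minimal sketch: make ksmd less aggressive to trade scan rate for CPU time.
KSM = "/sys/kernel/mm/ksm"

def set_tunable(name, value):
    with open("%s/%s" % (KSM, name), "w") as f:
        f.write(str(value))

if __name__ == "__main__":
    set_tunable("sleep_millisecs", 200)   # wake ksmd up less often
    set_tunable("pages_to_scan", 64)      # scan fewer pages per wake-up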
 
Hi,

I can confirm that my problem with the storage array is gone with kernel 2.6.35-pve.