upgrade to proxmox 1.6 kernel 2.6.32-3 bnx2 cciss

  • Thread starter: senricosa

Hello Proxmox-Community,

I have been using Proxmox for a while. The upgrade to Proxmox 1.6 caused a serious issue on an HP DL360 G6 server.

The following error messages appear in syslog:
Code:
Sep  4 12:35:03 pve0 kernel: Uhhuh. NMI received for unknown reason a0 on CPU 0.
Sep  4 12:35:03 pve0 kernel: You have some hardware problem, likely on the PCI bus.
Sep  4 12:35:03 pve0 kernel: Dazed and confused, but trying to continue
Sep  4 12:35:03 pve0 kernel: DRHD: handling fault status reg 2
Sep  4 12:35:03 pve0 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe1d000 
Sep  4 12:35:03 pve0 kernel: DMAR:[fault reason 06] PTE Read access is not set
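
To see which PCI device the DMAR fault points at, the log can be filtered with a few lines of Python. This is only a rough sketch: the regular expression is an assumption based on the lines quoted above, and `find_dmar_faults` is a made-up helper name. On the excerpt it reports device 03:00.0, which is the same PCI address the cciss driver prints in the boot log further down.

```python
import re

# Pattern modelled on the DMAR line quoted above (an assumption, not a
# complete grammar for every kernel version), e.g.
# "DMAR:[DMA Read] Request device [03:00.0] fault addr ffe1d000"
DMAR_RE = re.compile(
    r"DMAR:\[(?P<op>DMA (?:Read|Write))\] "
    r"Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)

def find_dmar_faults(lines):
    """Return (operation, pci_device, fault_address) for each DMAR fault line."""
    faults = []
    for line in lines:
        m = DMAR_RE.search(line)
        if m:
            faults.append((m.group("op"), m.group("dev"), m.group("addr")))
    return faults

# Lines copied from the syslog excerpt above.
sample = [
    "Sep  4 12:35:03 pve0 kernel: DRHD: handling fault status reg 2",
    "Sep  4 12:35:03 pve0 kernel: DMAR:[DMA Read] Request device [03:00.0] fault addr ffe1d000 ",
    "Sep  4 12:35:03 pve0 kernel: DMAR:[fault reason 06] PTE Read access is not set",
]

print(find_dmar_faults(sample))
# prints [('DMA Read', '03:00.0', 'ffe1d000')]
```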

The problem is the IRQ combination of the cciss and bnx2 modules, with bnx2 using MSI:
Code:
Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 59 for MSI/MSI-X
Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 60 for MSI/MSI-X
Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 61 for MSI/MSI-X
Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 62 for MSI/MSI-X
Sep  4 11:17:07 pve0 kernel: IRQ 61/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
Sep  4 11:17:07 pve0 kernel: cciss0: <0x323a> at PCI 0000:03:00.0 IRQ 61 using DAC
Sep  4 11:17:07 pve0 kernel: cciss/c0d0: p1 p2
Sep  4 11:17:07 pve0 kernel: EXT3 FS on cciss/c0d0p1, internal journal
Thus cciss was not working properly.
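
To check which MSI/MSI-X IRQ numbers each device was given at boot, the "irq N for MSI/MSI-X" lines can be grouped per device. A minimal sketch follows; the helper names `msi_irqs_by_device` and `overlapping_irqs` are hypothetical, and the regular expression only assumes the line format quoted above. The excerpt contains cciss lines only, so the bnx2 lines from the same boot would have to be fed in as well before any overlap could show up.

```python
import re
from collections import defaultdict

# Matches lines such as
# "kernel: cciss 0000:03:00.0: irq 59 for MSI/MSI-X"
MSI_RE = re.compile(
    r"kernel: (?P<drv>\S+) (?P<dev>[0-9a-f:.]+): irq (?P<irq>\d+) for MSI/MSI-X"
)

def msi_irqs_by_device(lines):
    """Group the MSI/MSI-X IRQ numbers by (driver, PCI address)."""
    irqs = defaultdict(list)
    for line in lines:
        m = MSI_RE.search(line)
        if m:
            irqs[(m.group("drv"), m.group("dev"))].append(int(m.group("irq")))
    return dict(irqs)

def overlapping_irqs(irqs):
    """Return the IRQ numbers that were assigned to more than one device."""
    owners = defaultdict(set)
    for device, numbers in irqs.items():
        for n in numbers:
            owners[n].add(device)
    return {n: sorted(devs) for n, devs in owners.items() if len(devs) > 1}

# Lines copied from the boot log above.
sample = [
    "Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28",
    "Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 59 for MSI/MSI-X",
    "Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 60 for MSI/MSI-X",
    "Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 61 for MSI/MSI-X",
    "Sep  4 11:17:07 pve0 kernel: cciss 0000:03:00.0: irq 62 for MSI/MSI-X",
]

assigned = msi_irqs_by_device(sample)
print(assigned)                    # cciss at 0000:03:00.0 -> [59, 60, 61, 62]
print(overlapping_irqs(assigned))  # empty here: only one device in the excerpt
```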

Solution: use 2.6.24-12-pve. It works well there, because the IRQ problem is gone:
Code:
Sep  4 14:59:27 pve0 kernel: cciss0: <0x323a> at PCI 0000:03:00.0 IRQ 2291 using DAC

It was also working with 2.6.32-1 (without OpenVZ).

Erich
 
Sep 4 12:35:03 pve0 kernel: You have some hardware problem, likely on the PCI bus.

I believe this is the key.

What cards are you using on the PCI buses? Maybe a RAID controller (cciss) and something else? More than one? It looks like you are using the same IRQ for two devices and have an IRQ conflict.

By the way, this is not an error, it's a warning.

Thus cciss was not working properly
What happened?

This actually seems to be a code bug that affects some PCI devices.
 
What cards are you using on the PCI buses? Maybe a RAID controller (cciss) and? More than 1? It looks like you are using the same IRQ for two devices and have a IRQ conflict
The cciss module is for the Hewlett-Packard Company Smart Array RAID controller; the server has only one RAID controller on PCI.
bnx2, the module for the Broadcom Corporation NetXtreme II BCM5709, uses a lot of IRQs (MSI-X) with the 2.6.32-3 kernel, and some of those IRQs are shared with cciss.
Thus cciss was not working properly
What happened?
After a while, the RAID volumes were no longer readable or writable.

This seems to be actually a code bug that affects some PCI devices.

I guess it is a bug in the bnx2 module in 2.6.32-3, because with 2.6.24-12 bnx2 does not use as many IRQs as with 2.6.32, and cciss has its own IRQs to use.
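
One way to check the shared-IRQ guess on a running system would be to look for lines in /proc/interrupts that list more than one handler. The sketch below assumes the 2.6.32-era layout (per-CPU counters, one token for the interrupt controller type, then a comma-separated handler list); `shared_irqs` is a made-up helper name, and the sample text is purely illustrative, not output from the servers in this thread. Note that network interrupts usually show up under the interface name (eth0 and so on), not under "bnx2".

```python
def shared_irqs(proc_interrupts_text):
    """Map IRQ number -> handler names for IRQ lines with more than one handler."""
    lines = proc_interrupts_text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        fields = rest.split()
        # Assumed layout: <ncpus counters> <controller type> <handler, handler, ...>
        handlers = " ".join(fields[ncpus + 1:]).split(",")
        handlers = [h.strip() for h in handlers if h.strip()]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared

# Illustrative input only, NOT taken from the servers in this thread.
sample = """\
           CPU0       CPU1
 28:        120          0   IO-APIC-fasteoi   cciss0, eth0
 59:       4100        300   PCI-MSI-edge      eth0-1
NMI:          2          1   Non-maskable interrupts
"""

print(shared_irqs(sample))
# prints {28: ['cciss0', 'eth0']}
```

On a real host the text would come from `open("/proc/interrupts").read()`. Running it under both kernels and comparing the results would show whether cciss really ends up sharing a line under 2.6.32-3.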
 
I have the same problem on my HP ML350 G6 server. I do not have a Broadcom Corporation NetXtreme II. On 2.6.32-2 I had no problems, only on 2.6.32-3.

I had this:
Code:
kernel: Uhhuh. NMI received for unknown reason a0 on CPU 0.
kernel: You have some hardware problem, likely on the PCI bus.
kernel: Dazed and confused, but trying to continue
kernel: DRHD: handling fault status reg 2
I also found this in the logs:
Code:
kern.log.1:Sep  3 16:03:44 ve-gouda05 kernel: IRQ 62/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
(The 2.6.24-12-pve kernel works, but I want 2.6.32.)

I think it is a bug somewhere. I do not dare to update the rest of my servers!

I do not know which hardware conflicts in my case.
 