PVE 8.4.19 causing P440ar controller problem?

dwma

Apr 3, 2025
Hi,
Strange situation: an HA cluster (3 nodes) of HP DL380 G9 servers with the P440ar controller (all servers have the same hardware config). Previously they ran some older PVE version, like 8.4.12 or even 8.4.0; doesn't matter.
I updated all servers to the newest PVE 8.4.19, because I wanted to be on the latest 8.x version before upgrading to 9.x.

But a few hours after updating to the latest 8.4.x, one server in the cluster failed (Smart Array P440ar Controller failure).
OK, a controller failure can happen. But I also found that it had firmware 7.0 while the newest is 7.2, so I upgraded and rebooted: all green, and it seemed to be working.

A few hours later, another one failed in the same way.

Has anyone had a similar problem? The servers had been working for a few months without issues, and right after the PVE update: "controller failure". Coincidence? Bug?
 
Hi @dwma

thanks for posting in the forum!

Can you please share a few details on the failure mode, i.e. the journal covering one of the failures:
journalctl --since "2026-06-10 08:00" --until "2026-06-12 12:00"
Please adapt the timestamps accordingly.
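Once the time window is known, the journal output can be narrowed down to controller-related lines. A minimal sketch, assuming the interesting messages mention the hpsa driver, DMAR, or "I/O error"; the helper name filter_controller_msgs and its pattern list are hypothetical, not part of Proxmox. It deliberately does not use journalctl -k, because that option implies -b (current boot only) and would hide earlier boots inside the window.

```shell
# Hypothetical helper: keep only journal lines that mention the hpsa driver,
# DMAR (Intel IOMMU) messages, or block-layer I/O errors. Reads stdin.
# The pattern list is an assumption; extend it as needed.
filter_controller_msgs() {
    grep -Ei 'hpsa|dmar|I/O error'
}

# Usage (adapt the timestamps as above):
# journalctl --since "2026-06-10 08:00" --until "2026-06-12 12:00" | filter_controller_msgs
```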

Also can you please provide the output of the following commands:
pveversion -v
lspci # find your controller's identifier here, e.g. 12:00.0
lspci -vvvs <identifier-from-above>
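If the controller is hard to spot in the full lspci listing, its address can be picked out by device class. A sketch, assuming the controller is listed with the class "RAID bus controller"; find_raid_slot is a hypothetical helper name.

```shell
# Hypothetical helper: print the PCI address (first column of lspci output)
# of every device whose class is "RAID bus controller". Reads stdin.
find_raid_slot() {
    awk '/RAID bus controller/ { print $1 }'
}

# Usage: show verbose details for the first RAID controller found.
# lspci -vvvs "$(lspci | find_raid_slot | head -n 1)"
```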

Can you also please check whether the HP Shared Memory Features are enabled in your controller configuration?

Are you using PCIe Passthrough on this server (not necessarily for this controller) or any form of IOMMU?
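One way to answer the IOMMU question from the shell is to look for an intel_iommu= parameter on the running kernel's command line and to check whether /sys/kernel/iommu_groups contains any groups. A minimal sketch; intel_iommu_param is a hypothetical helper, and an absent parameter does not by itself mean the IOMMU is off, since the kernel's built-in default also applies.

```shell
# Hypothetical helper: print the value of the last intel_iommu= parameter in
# the kernel command line passed as $1, or "unset" if it is not present.
intel_iommu_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk '
        index($0, "intel_iommu=") == 1 { v = substr($0, 13) }
        END { print (v == "" ? "unset" : v) }'
}

# Usage:
# intel_iommu_param "$(cat /proc/cmdline)"
# ls /sys/kernel/iommu_groups | wc -l    # non-zero means IOMMU groups exist
```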

Yours sincerely
Jonas
 
Hi @jtheisen,
During PVE startup there were a lot of DMAR errors (before the update, none of them appeared):
[Screenshot: DMAR errors during PVE startup]

dmesg:
Code:
Jun 11 20:33:41.216928 pve1 kernel: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
Jun 11 20:33:41.217031 pve1 kernel: I/O error, dev dm-8, sector 887237328 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217085 pve1 kernel: I/O error, dev dm-8, sector 887237472 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217137 pve1 kernel: I/O error, dev dm-8, sector 887237608 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217177 pve1 kernel: I/O error, dev dm-8, sector 887237720 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217216 pve1 kernel: I/O error, dev dm-8, sector 887237976 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217260 pve1 kernel: I/O error, dev dm-8, sector 887238144 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217309 pve1 kernel: I/O error, dev dm-8, sector 887238304 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217354 pve1 kernel: I/O error, dev dm-8, sector 887238456 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:41.217404 pve1 kernel: I/O error, dev dm-8, sector 887238600 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 2
Jun 11 20:33:59.455059 pve1 kernel: I/O error, dev dm-8, sector 1017697088 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 2
Jun 11 20:33:59.470921 pve1 kernel: I/O error, dev dm-8, sector 1017697088 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 2
Jun 11 20:33:59.486920 pve1 kernel: I/O error, dev dm-8, sector 1017697088 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 2
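To see at a glance which device and which operations failed, lines like these can be tallied. A minimal sketch with a hypothetical summarize_io_errors helper, assuming the exact "I/O error, dev NAME, sector N op 0xN:(OP)" layout shown in the log above. Since dm-8 is a device-mapper device, ls -l /dev/mapper shows which LVM volume it corresponds to.

```shell
# Hypothetical helper: count kernel "I/O error" lines per device and operation.
# Reads journal/dmesg text on stdin; prints "DEVICE (OP) COUNT", sorted.
summarize_io_errors() {
    awk '/I\/O error, dev/ {
        dev = ""; op = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "op") op = $(i + 1)
        }
        sub(/,$/, "", dev)            # "dm-8," -> "dm-8"
        sub(/^0x[0-9a-f]+:/, "", op)  # "0x1:(WRITE)" -> "(WRITE)"
        count[dev " " op]++
    }
    END { for (k in count) print k, count[k] }' | sort
}
```

Run over the excerpt above, this reports one failed READ and twelve failed WRITEs, all on dm-8.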

pveversion:
Code:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-14-pve)
pve-manager: 8.4.19 (running version: 8.4.19/a68fb383814bb1e6)
proxmox-kernel-helper: 8.1.4
proxmox-kernel-6.8: 6.8.12-29
proxmox-kernel-6.8.12-29-pve-signed: 6.8.12-29
proxmox-kernel-6.8.12-14-pve-signed: 6.8.12-14
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 17.2.8-pve2
corosync: 3.1.10-pve2~bpo12+1
criu: 3.17.1-2+deb12u2
frr-pythontools: 10.2.3-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.3
libpve-cluster-perl: 8.1.3
libpve-common-perl: 8.3.8
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.3
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.8
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-2
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2.1
proxmox-backup-client: 3.4.7-1
proxmox-backup-file-restore: 3.4.7-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.4
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.8
proxmox-widget-toolkit: 4.3.17
pve-cluster: 8.1.3
pve-container: 5.3.5
pve-docs: 8.4.2
pve-edk2-firmware: 4.2025.05-1~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.8
smartmontools: 7.3-pve1
spiceterm: 3.3.1
swtpm: 0.8.0+pve1
vncterm: 1.8.2
zfsutils-linux: 2.2.9-pve1

lspci -vvvs 03:00.0:
Code:
03:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Gen9 Controllers (rev 01)
        DeviceName: Embedded RAID 1
        Subsystem: Hewlett-Packard Company P440ar
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        NUMA node: 0
        IOMMU group: 54
        Region 0: Memory at 98100000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 98200000 (64-bit, non-prefetchable) [size=1K]
        Region 4: I/O ports at 3000 [size=256]
        Expansion ROM at 98280000 [virtual] [disabled] [size=512K]
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp+ ExtTPHComp-
                         AtomicOpsCap: 32bit+ 64bit+ 128bitCAS+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [300 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: hpsa
        Kernel modules: hpsa

For now I've pinned the older kernel (6.8.12-14-pve) on 2 hosts, which seems fine: at least I don't see any errors during PVE startup. Also, the controller hasn't "failed" in iLO and the fans are not spinning at 100%.
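For reference, a kernel is pinned on a Proxmox VE host with proxmox-boot-tool kernel pin and released again with proxmox-boot-tool kernel unpin. The sketch below wraps the pin in a hypothetical pin_kernel helper that first checks that the version appears in proxmox-boot-tool kernel list; it assumes each kernel version is printed on its own line there.

```shell
# Succeed if kernel version $1 appears as a whole line (ignoring surrounding
# whitespace) in the text read from stdin, e.g. `proxmox-boot-tool kernel list`.
kernel_listed() {
    awk -v k="$1" '{ gsub(/^[ \t]+|[ \t]+$/, "") } $0 == k { found = 1 } END { exit !found }'
}

# Hypothetical helper: pin kernel $1 as the default boot entry,
# but only if proxmox-boot-tool reports it as installed.
pin_kernel() {
    if proxmox-boot-tool kernel list | kernel_listed "$1"; then
        proxmox-boot-tool kernel pin "$1"
    else
        echo "kernel $1 not found in 'proxmox-boot-tool kernel list'" >&2
        return 1
    fi
}
```

Usage would be pin_kernel 6.8.12-14-pve followed by a reboot; proxmox-boot-tool kernel unpin later returns to booting the newest installed kernel.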

I've upgraded one host to the latest PVE 9 for testing (running kernel 7.0.6-2-pve), and it also didn't throw any errors during PVE startup, but I'll give it a few more days running some non-critical VMs to see whether it fails.