pciehp issue with SuperMicro AOC-4e2p with 4x NVMe drives

piexil

Jul 20, 2018
Hi,
I have one machine built on a Supermicro X10SRL-F motherboard, with the above add-on card connected to 4x NVMe drives in a ZFS RAID10 (striped mirrors) setup. The add-on card is x8 and uses a PLX switch to connect the four 4-lane drives.

Under any I/O load, dmesg gets spammed with:

Code:
[    3.367998] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
[    3.376460] pciehp 0000:02:07.0:pcie204: Slot(103): Powering off due to button press
[    5.667485] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
[    5.674213] pciehp 0000:02:07.0:pcie204: Slot(103): Button cancel
[    5.681097] pciehp 0000:02:07.0:pcie204: Slot(103): Action canceled due to button press

This repeats for each slot on the AOC.
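
To see which slots are affected and how often, the pciehp messages can be tallied per slot. The following is a minimal Python sketch, not a tested tool: the function name `count_pciehp_events` is made up for illustration, and the regular expression assumes the exact message format shown in the log above (other kernel versions may format these lines differently).

```python
import re
from collections import Counter

# Assumption: lines look like the dmesg excerpt in this post, e.g.
# "[    3.367998] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed"
PCIEHP_RE = re.compile(
    r"pciehp (?P<device>\S+?):pcie\d+: Slot\((?P<slot>\d+)\): (?P<message>.+)"
)


def count_pciehp_events(lines):
    """Count pciehp messages, keyed by (device, slot, message)."""
    counts = Counter()
    for line in lines:
        match = PCIEHP_RE.search(line)
        if match:
            key = (match["device"], match["slot"], match["message"].strip())
            counts[key] += 1
    return counts


# Sample input copied from the log excerpt above.
SAMPLE = [
    "[    3.367998] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed",
    "[    3.376460] pciehp 0000:02:07.0:pcie204: Slot(103): Powering off due to button press",
    "[    5.667485] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed",
    "[    5.674213] pciehp 0000:02:07.0:pcie204: Slot(103): Button cancel",
    "[    5.681097] pciehp 0000:02:07.0:pcie204: Slot(103): Action canceled due to button press",
]

for (device, slot, message), n in sorted(count_pciehp_events(SAMPLE).items()):
    print(f"{device} slot {slot}: {n}x {message}")
```

To run it against a live system, the saved output of dmesg could be passed in as a list of lines instead of `SAMPLE`.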

Eventually, ZFS throws an I/O error and removes a drive (which one it removes seems random).
I have replaced all drives with other ones and still get the errors.

If I downgrade to 2 drives, I never get any errors.

This is on a clean install of 6.0, but I don't believe I saw it on 5.x (I can go back and check).

Here is full dmesg log:
http://batman.gyptis.org/zerobin/?1c1a9d31b13f0454#PmscOkllUt5g5bkU2wXVgB6dDLbcGJ+ESbY3WmfIKyY=


Thanks!
 
Output of lspci: (with sys peripherals removed due to character limit)

Code:
00:00.0 Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 01)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 01)
00:01.1 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 01)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01)
00:03.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01)
00:04.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 0 (rev 01)
00:04.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 1 (rev 01)
00:04.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 2 (rev 01)
00:04.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 3 (rev 01)
00:04.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 4 (rev 01)
00:04.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 5 (rev 01)
00:04.6 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 6 (rev 01)
00:04.7 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 7 (rev 01)
00:05.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 01)
00:05.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug (rev 01)
00:05.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 01)
00:05.4 PIC: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D I/O APIC (rev 01)
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
00:11.4 SATA controller: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] (rev 05)
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
00:16.1 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #2 (rev 05)
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
00:1c.4 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #5 (rev d5)
00:1c.5 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #6 (rev d5)
00:1c.6 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #7 (rev d5)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:01.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:04.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:05.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:06.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:07.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
04:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
05:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
06:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
07:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
0c:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0d:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
0f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
 
I am seeing the same problem while installing Proxmox. The disks are WD SN630 NVMe drives attached directly to a Supermicro H11SSL-NC. The installation gets stuck at a random point while extracting a package, with the same attention-button messages, and ends in a ZFS rpool I/O error.
 
Sorry for the super late reply. I was able to fix this on at least one machine by editing my kernel command line to turn certain PCI(e) features off. Note: I am also doing VGA passthrough here, so I have some extra options specified.
Code:
root@elite:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 nomodeset pcie_acs_override=downstream,multifunction boot=zfs mitigations=off pci=realloc,pcie_bus_peer2peer,noats pcie_aspm=off pcie_ports=dpc_native amd_iommu=fullflush video=efifb:off vga=off

The important ones seem to be:
pci=realloc,noats (reallocate BARs, no Address Translation Services)
pcie_aspm=off (turns off Active State Power Management)
pcie_ports=dpc_native (native PCIe port service handling for DPC only; note that the kernel documentation spells this value dpc-native)
This seems to be a buggy drive or BIOS firmware issue more than a Proxmox issue.
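
If you want to check whether a machine is already booted with these options, something like the following Python sketch may help. It is only an illustration: the helper names (`parse_cmdline`, `missing_options`) and the `WANTED` table are made up here, the option spellings are taken as-is from the command line posted above, and it only checks that the strings are present, not whether the kernel actually honours them.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: set of comma-separated values}."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so "root=ZFS=rpool/..." keeps its value intact.
        key, _, value = token.partition("=")
        params.setdefault(key, set()).update(v for v in value.split(",") if v)
    return params


def missing_options(cmdline, wanted):
    """Return the wanted 'key=value' options that are absent from cmdline."""
    params = parse_cmdline(cmdline)
    missing = []
    for key, values in wanted.items():
        for value in values:
            if value not in params.get(key, set()):
                missing.append(f"{key}={value}")
    return missing


# Assumption: these are the options named as important in this post,
# spelled exactly as they appear in the posted /etc/kernel/cmdline.
WANTED = {
    "pci": ["realloc", "noats"],
    "pcie_aspm": ["off"],
    "pcie_ports": ["dpc_native"],
}

# Example with a command line that only sets pci=realloc.
example = "root=/dev/sda1 quiet pci=realloc"
print(missing_options(example, WANTED))
```

On a running system the input would be the contents of /proc/cmdline rather than the example string.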