pciehp issue with SuperMicro AOC-4e2p with 4x NVMe drives

piexil

Active Member
Jul 20, 2018
Hi,
I built a machine with a Supermicro X10SRL-F motherboard and the add-on card named above, connected to 4x NVMe drives in a raidz10 setup. The add-on card is x8 and uses a PLX switch to connect the four 4-lane drives.

Under any I/O load, dmesg gets spammed with:

Code:
[    3.367998] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
[    3.376460] pciehp 0000:02:07.0:pcie204: Slot(103): Powering off due to button press
[    5.667485] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
[    5.674213] pciehp 0000:02:07.0:pcie204: Slot(103): Button cancel
[    5.681097] pciehp 0000:02:07.0:pcie204: Slot(103): Action canceled due to button press

This repeats for each slot on the AOC.

Eventually, ZFS throws an I/O error and removes a drive (which one it removes seems to be random).
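To see how often each slot reports these events, the dmesg output can be tallied per port and slot. This is only a minimal sketch: the regular expression is written against the line format shown above, and the names `PCIEHP_RE`, `count_pciehp_events`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches pciehp lines in the format shown in the post, e.g.
# [    3.367998] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
PCIEHP_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+pciehp\s+"
    r"(?P<port>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\S*\s+"
    r"Slot\((?P<slot>[^)]+)\):\s+(?P<msg>.*)$"
)


def count_pciehp_events(lines):
    """Count pciehp messages, keyed by (port address, slot, message text)."""
    counts = Counter()
    for line in lines:
        m = PCIEHP_RE.match(line.strip())
        if m:
            counts[(m["port"], m["slot"], m["msg"])] += 1
    return counts


# The five lines quoted in the post, used here as sample input.
SAMPLE = """\
[    3.367998] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
[    3.376460] pciehp 0000:02:07.0:pcie204: Slot(103): Powering off due to button press
[    5.667485] pciehp 0000:02:07.0:pcie204: Slot(103): Attention button pressed
[    5.674213] pciehp 0000:02:07.0:pcie204: Slot(103): Button cancel
[    5.681097] pciehp 0000:02:07.0:pcie204: Slot(103): Action canceled due to button press
"""
```

Calling `count_pciehp_events(SAMPLE.splitlines())` gives one counter entry per distinct message; feeding it a full dmesg capture instead would show whether all switch ports are affected equally.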
I have replaced all drives with other ones and still get the errors.

If I downgrade to 2 drives, I never get any errors.

This is on a clean install of 6.0, but I don't believe I saw it on 5.x (I can go back and check).

Here is full dmesg log:
http://batman.gyptis.org/zerobin/?1c1a9d31b13f0454#PmscOkllUt5g5bkU2wXVgB6dDLbcGJ+ESbY3WmfIKyY=


Thanks!
 
Output of lspci (with some system peripherals removed due to the character limit):

Code:
00:00.0 Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 01)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 01)
00:01.1 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 01)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01)
00:03.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01)
00:04.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 0 (rev 01)
00:04.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 1 (rev 01)
00:04.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 2 (rev 01)
00:04.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 3 (rev 01)
00:04.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 4 (rev 01)
00:04.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 5 (rev 01)
00:04.6 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 6 (rev 01)
00:04.7 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Crystal Beach DMA Channel 7 (rev 01)
00:05.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 01)
00:05.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug (rev 01)
00:05.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 01)
00:05.4 PIC: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D I/O APIC (rev 01)
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR (rev 05)
00:11.4 SATA controller: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] (rev 05)
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
00:16.1 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #2 (rev 05)
00:1a.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
00:1c.4 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #5 (rev d5)
00:1c.5 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #6 (rev d5)
00:1c.6 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #7 (rev d5)
00:1d.0 USB controller: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C610/X99 series chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation C610/X99 series chipset SMBus Controller (rev 05)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:01.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:04.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:05.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:06.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:07.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
04:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
05:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
06:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
07:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
0c:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0d:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
0e:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
0f:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
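For a listing this long it can help to pull out just the PLX switch ports and the NVMe controllers. The following is a rough sketch that assumes the plain `lspci` line format shown above (address, class, description); `LSPCI_RE`, `parse_lspci`, and `summarize` are hypothetical helper names, not part of any tool.

```python
import re

# address (optionally prefixed with a PCI domain), class name, description
LSPCI_RE = re.compile(
    r"^(?P<addr>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[^:]+):\s+(?P<desc>.*)$"
)


def parse_lspci(lines):
    """Return a list of (address, class, description) tuples."""
    devices = []
    for line in lines:
        m = LSPCI_RE.match(line.strip())
        if m:
            devices.append((m["addr"], m["cls"], m["desc"]))
    return devices


def summarize(devices):
    """Pick out PLX bridge ports and NVMe controllers by class and vendor string."""
    return {
        "switch_ports": [a for a, c, d in devices
                         if c == "PCI bridge" and "PLX" in d],
        "nvme": [a for a, c, d in devices
                 if c == "Non-Volatile memory controller"],
    }


# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
00:1c.0 PCI bridge: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 (rev d5)
01:00.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
02:07.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev b0)
04:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
07:00.0 Non-Volatile memory controller: Sandisk Corp Skyhawk Series NVME SSD (rev 01)
"""
```

`summarize(parse_lspci(SAMPLE.splitlines()))` returns the two lists of addresses. Plain `lspci` output does not show which drive sits behind which switch port, so this only narrows the listing down.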
 
I am experiencing the same problem while installing Proxmox. The disks are WD SN630 NVMe drives directly attached to a Supermicro H11SSL-NC. The installation gets stuck at a random point while extracting a package, with the same messages about the attention button, resulting in a ZFS rpool I/O error.
Sorry for the super late reply. I was able to fix this on at least one machine by editing my kernel command line to turn certain PCI(e) features off. Note: I am also doing VGA passthrough here, so I have some extra options specified.
Code:
root@elite:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 nomodeset pcie_acs_override=downstream,multifunction boot=zfs mitigations=off pci=realloc,pcie_bus_peer2peer,noats pcie_aspm=off pcie_ports=dpc_native amd_iommu=fullflush video=efifb:off vga=off

The important ones seem to be:
pci=realloc,noats (reallocate BARs, no Address Translation Services), pcie_aspm=off (turns off Active State Power Management), and pcie_ports=dpc_native (turns off additional PCIe features except for DPC; note that the kernel documentation spells this value dpc-native).
This seems to be a buggy drive or BIOS firmware issue more than a Proxmox issue.
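To check whether a given command line actually carries the options singled out above, something like the following could be used. It is a sketch only: `parse_cmdline`, `missing_options`, and `WANTED` are names invented here, and the option values are spelled exactly as in the command line quoted in this post.

```python
def parse_cmdline(cmdline):
    """Map each parameter name to its list of comma-separated values.

    Bare flags without '=' (for example nomodeset) map to an empty list.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so root=ZFS=rpool/... keeps its value intact.
        name, sep, value = token.partition("=")
        params.setdefault(name, []).extend(value.split(",") if sep else [])
    return params


# Options the post calls important, spelled as in the post's command line.
WANTED = {
    "pci": {"realloc", "noats"},
    "pcie_aspm": {"off"},
    "pcie_ports": {"dpc_native"},
}


def missing_options(cmdline, wanted=WANTED):
    """Return {parameter: [values not present]} for every wanted value that is absent."""
    params = parse_cmdline(cmdline)
    missing = {}
    for name, values in wanted.items():
        absent = values - set(params.get(name, []))
        if absent:
            missing[name] = sorted(absent)
    return missing


# The command line quoted in the post, used as sample input.
POSTED = (
    "root=ZFS=rpool/ROOT/pve-1 nomodeset pcie_acs_override=downstream,multifunction "
    "boot=zfs mitigations=off pci=realloc,pcie_bus_peer2peer,noats pcie_aspm=off "
    "pcie_ports=dpc_native amd_iommu=fullflush video=efifb:off vga=off"
)
```

`missing_options(POSTED)` returns an empty dict. Running the same check on the contents of /proc/cmdline after a reboot would show whether the edited /etc/kernel/cmdline is really the one the running kernel was booted with.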
 
