PCIe Bus Errors with Sonnet Twin 10G SFP+ Thunderbolt 3 and Intel NUC 12 Extreme

alanjohnwilliams

New Member
Feb 2, 2022
Hi Proxmox community,

I have the Sonnet Twin 10G SFP+ Thunderbolt 3 to Dual 10 Gigabit Ethernet Adapter connected to an Intel NUC 12 Extreme (NUC12DCMi9). I’m running Proxmox (Linux) on this server and am seeing PCIe errors on the Thunderbolt / PCIe bus. The only Thunderbolt device connected is the Sonnet 10G SFP+ adapter.

I’m seeing sporadic PCIe bus errors that are corrected:

Code:
Oct 13 12:47:05 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 12:47:05 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 12:47:05 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 12:47:05 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 12:52:13 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 12:52:13 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 12:52:13 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 12:52:13 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:17:45 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:17:45 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:17:45 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:17:45 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:17:54 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:17:54 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:17:54 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:17:54 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:33:03 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:33:03 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:33:03 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:33:03 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:58:25 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:58:25 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:58:25 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:58:25 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:58:29 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:58:29 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:58:29 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:58:29 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 14:29:19 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 14:29:19 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 14:29:19 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 14:29:19 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 14:35:49 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 14:35:49 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 13 14:35:49 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00001000/00002000
Oct 13 14:35:49 pve2 kernel: pcieport 0000:03:00.0: [12] Timeout
Oct 13 15:06:03 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 15:06:03 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 15:06:03 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 15:06:03 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 15:15:32 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 15:15:32 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 13 15:15:32 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00001000/00002000
Oct 13 15:15:32 pve2 kernel: pcieport 0000:03:00.0: [12] Timeout
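
For anyone reading the `error status/mask` values in these lines: they look like bitmasks of the PCIe AER Correctable Error Status register, where bit 7 is a bad DLLP and bit 12 is a replay timer timeout. Below is a minimal Python sketch that decodes them. The short names are the ones I believe the Linux kernel prints; the function names are made up for illustration.

```python
# Bit positions of the PCIe AER Correctable Error Status register,
# labelled with the short names the Linux kernel prints in AER messages.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # receiver error
    6: "BadTLP",        # bad transaction layer packet
    7: "BadDLLP",       # bad data link layer packet
    8: "Rollover",      # REPLAY_NUM rollover
    12: "Timeout",      # replay timer timeout
    13: "NonFatalErr",  # advisory non-fatal error
    14: "CorrIntErr",   # corrected internal error
    15: "HeaderOF",     # header log overflow
}


def decode_correctable(value_hex: str) -> list[str]:
    """Return the names of the bits set in a hex status or mask value."""
    value = int(value_hex, 16)
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Unknown or reserved bits are reported by position.
            names.append(AER_CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


def decode_status_mask(field: str) -> tuple[list[str], list[str]]:
    """Split a 'status/mask' field like '00000080/00002000' and decode both halves."""
    status_hex, mask_hex = field.split("/")
    return decode_correctable(status_hex), decode_correctable(mask_hex)


if __name__ == "__main__":
    # The two status/mask pairs that appear in the log above.
    for field in ("00000080/00002000", "00001000/00002000"):
        status, mask = decode_status_mask(field)
        print(f"{field}: status={status} masked={mask}")
```

Read this way, the mask `00002000` only masks the advisory non-fatal bit, so the BadDLLP and Timeout events are genuinely being reported by the bridge at 0000:03:00.0.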

There is also the occasional fatal error that takes the system down:

Code:
Oct 13 16:44:47 pve2 kernel: pcieport 0000:00:1b.0: AER: Uncorrected (Fatal) error received: 0000:03:00.0
Oct 13 16:44:47 pve2 kernel: pcieport 0000:03:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
Oct 13 16:44:47 pve2 kernel: pcieport 0000:00:1b.0: AER: Root Port link has been reset (0)
Oct 13 16:44:47 pve2 kernel: pcieport 0000:00:1b.0: AER: device recovery failed

And

Code:
Oct 13 22:02:00 pve2 kernel: pcieport 0000:00:1b.0: AER: Uncorrected (Fatal) error received: 0000:03:00.0
Oct 13 22:02:00 pve2 kernel: pcieport 0000:03:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
Oct 13 22:02:00 pve2 kernel: thunderbolt 0000:05:00.0: AER: can't recover (no error_detected callback)
Oct 13 22:02:00 pve2 kernel: xhci_hcd 0000:39:00.0: AER: can't recover (no error_detected callback)
Oct 13 22:02:00 pve2 kernel: pcieport 0000:00:1b.0: AER: Root Port link has been reset (0)

The errors are reported against the Intel Thunderbolt 4 related PCI devices (0000:03:00.0 is a Maple Ridge bridge):

Code:
00:00.0 Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
00:01.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c)
00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
00:08.0 System peripheral: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator (rev 02)
00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
00:1b.0 PCI bridge: Intel Corporation Device 7ac4 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 7abe (rev 11)
00:1d.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 (rev 11)
00:1d.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Z690 Chipset LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
02:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01)
03:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:01.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:02.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:03.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
05:00.0 USB controller: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
06:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
08:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
08:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
39:00.0 USB controller: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
6d:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-LM (rev 03)
6e:00.0 Ethernet controller: Aquantia Corp. AQC113C NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 03)
6f:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01)

I’ve tried a number of different kernel boot options, but none of them has resolved the PCIe bus errors:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 pcie_aspm=off"
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off pci=noaer intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 intel_idle.max_cstate=1 pcie_ports=native pci=assign-busses,hpbussize=0x40,realloc,hpmmiosize=128M,hpmmioprefsize=4G"
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off pci=noaer intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 intel_idle.max_cstate=1 pcie_ports=native pci=nommconf,assign-busses,hpbussize=0x40,realloc,hpmmiosize=128M,hpmmioprefsize=4G"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 pcie_aspm=off"
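
One thing worth ruling out with a list this long is whether each set of options actually reached the running kernel. Edits to `GRUB_CMDLINE_LINUX_DEFAULT` only apply after the boot configuration is regenerated and the host is rebooted, and depending on the bootloader a Proxmox install may read its command line from somewhere else. The sketch below compares an intended option list against `/proc/cmdline`; the helper names are hypothetical and the `wanted` string is just an example. Note also that, as far as I understand it, `pci=noaer` only turns off the kernel's AER handling and reporting, so it would hide these messages rather than fix the link.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict[str, list[str]]:
    """Map each kernel parameter name to the list of values given for it.

    Flags without '=' (such as 'quiet') get an empty string as their value.
    A parameter can appear more than once (for example 'pci='), so values
    are collected in order rather than overwritten.
    """
    params: dict[str, list[str]] = {}
    for token in shlex.split(cmdline):
        name, _, value = token.partition("=")
        params.setdefault(name, []).append(value)
    return params


def missing_params(wanted: str, running: str) -> list[str]:
    """Return the tokens from `wanted` that are absent from `running`."""
    running_tokens = set(shlex.split(running))
    return [tok for tok in shlex.split(wanted) if tok not in running_tokens]


if __name__ == "__main__":
    # Example option list; replace with the contents of the GRUB variable.
    wanted = "quiet intel_iommu=on iommu=pt pcie_aspm=off"
    try:
        with open("/proc/cmdline") as fh:
            running = fh.read()
    except OSError:
        running = ""  # not on Linux, or /proc not available
    print("not active:", missing_params(wanted, running))
```

An empty "not active" list means every intended option is present on the running kernel's command line; it says nothing about whether the option had the desired effect.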

I’m running the 6.2.16 Linux kernel:

Code:
Linux pve2 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z) x86_64 GNU/Linux

I have ASPM turned off in the BIOS (as well as the kernel options).

Have you seen this issue before, and do you have any suggestions for resolving it?



Thanks!

-Alan
 
0000:3a:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
0000:3a:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
0000:3a:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)

I get the same error with a JHL7540.
 
