Hi Proxmox community,
Hi Support,
I have the Sonnet Twin 10G SFP+ (a Thunderbolt 3 to dual 10 Gigabit Ethernet adapter) connected to an Intel NUC 12 Extreme (NUC12DCMi9). I’m running Proxmox (Linux) on this server and am seeing PCIe errors on the Thunderbolt/PCI bus. The Sonnet adapter is the only Thunderbolt device connected.
I’m seeing sporadic PCIe bus errors that get corrected:
Code:
Oct 13 12:47:05 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 12:47:05 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 12:47:05 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 12:47:05 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 12:52:13 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 12:52:13 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 12:52:13 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 12:52:13 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:17:45 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:17:45 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:17:45 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:17:45 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:17:54 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:17:54 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:17:54 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:17:54 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:33:03 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:33:03 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:33:03 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:33:03 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:58:25 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:58:25 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:58:25 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:58:25 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 13:58:29 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 13:58:29 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 13:58:29 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 13:58:29 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 14:29:19 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 14:29:19 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 14:29:19 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 14:29:19 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 14:35:49 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 14:35:49 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 13 14:35:49 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00001000/00002000
Oct 13 14:35:49 pve2 kernel: pcieport 0000:03:00.0: [12] Timeout
Oct 13 15:06:03 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 15:06:03 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Oct 13 15:06:03 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00000080/00002000
Oct 13 15:06:03 pve2 kernel: pcieport 0000:03:00.0: [ 7] BadDLLP
Oct 13 15:15:32 pve2 kernel: pcieport 0000:00:1b.0: AER: Corrected error received: 0000:03:00.0
Oct 13 15:15:32 pve2 kernel: pcieport 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 13 15:15:32 pve2 kernel: pcieport 0000:03:00.0: device [8086:1136] error status/mask=00001000/00002000
Oct 13 15:15:32 pve2 kernel: pcieport 0000:03:00.0: [12] Timeout
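The "error status/mask" pairs in the corrected-error entries above are raw values of the AER Correctable Error Status and Mask registers. A small Python decoder, assuming the bit assignments from the PCIe Base Specification (only the bits listed in the table are named; any other set bit is ignored):

```python
# Bit positions in the PCIe AER Correctable Error Status/Mask registers,
# as assigned by the PCIe Base Specification (reserved bits omitted).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode(value: int) -> list[str]:
    """Return the names of the known correctable-error bits set in value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_status_mask(field: str) -> dict[str, list[str]]:
    """Decode a kernel 'error status/mask=SSSSSSSS/MMMMMMMM' field."""
    status_hex, mask_hex = field.split("=", 1)[1].split("/")
    return {
        "status": decode(int(status_hex, 16)),
        "mask": decode(int(mask_hex, 16)),
    }


if __name__ == "__main__":
    for field in ("error status/mask=00000080/00002000",
                  "error status/mask=00001000/00002000"):
        print(field, "->", decode_status_mask(field))
```

For these logs the status bits come out as Bad DLLP (0x80) or Replay Timer Timeout (0x1000), with the mask 0x2000 covering Advisory Non-Fatal Error. Both status bits are Data Link Layer errors, matching the kernel's own "type=Data Link Layer" and "[ 7] BadDLLP" / "[12] Timeout" annotations.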
Occasionally, an uncorrectable (fatal) error takes the system down:
Code:
Oct 13 16:44:47 pve2 kernel: pcieport 0000:00:1b.0: AER: Uncorrected (Fatal) error received: 0000:03:00.0
Oct 13 16:44:47 pve2 kernel: pcieport 0000:03:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
Oct 13 16:44:47 pve2 kernel: pcieport 0000:00:1b.0: AER: Root Port link has been reset (0)
Oct 13 16:44:47 pve2 kernel: pcieport 0000:00:1b.0: AER: device recovery failed
And another one:
Code:
Oct 13 22:02:00 pve2 kernel: pcieport 0000:00:1b.0: AER: Uncorrected (Fatal) error received: 0000:03:00.0
Oct 13 22:02:00 pve2 kernel: pcieport 0000:03:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
Oct 13 22:02:00 pve2 kernel: thunderbolt 0000:05:00.0: AER: can't recover (no error_detected callback)
Oct 13 22:02:00 pve2 kernel: xhci_hcd 0000:39:00.0: AER: can't recover (no error_detected callback)
Oct 13 22:02:00 pve2 kernel: pcieport 0000:00:1b.0: AER: Root Port link has been reset (0)
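The two fatal events differ slightly: in the first, recovery simply fails; in the second, the thunderbolt (05:00.0) and xhci_hcd (39:00.0) drivers have no error_detected callback, so AER cannot recover them. A minimal Python sketch for pulling such events out of a saved kernel log (for example journalctl -k output); the regular expressions assume the exact syslog-style line format shown here and may need adjusting for other formats:

```python
import re

# Patterns assume the syslog-style line format shown in the post.
FATAL_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ kernel: pcieport (?P<port>\S+): "
    r"AER: Uncorrected \(Fatal\) error received: (?P<dev>\S+)$"
)
NO_CALLBACK_RE = re.compile(
    r"kernel: (?P<driver>\S+) (?P<dev>\S+): "
    r"AER: can't recover \(no error_detected callback\)"
)


def summarize_fatal(lines):
    """Collect fatal AER events and the drivers that could not join recovery.

    Each event records its timestamp, the reporting root port, the
    source device, and the (driver, device) pairs logged with
    "can't recover" after it.
    """
    events = []
    for line in lines:
        line = line.strip()
        m = FATAL_RE.match(line)
        if m:
            events.append({"time": m["ts"], "port": m["port"],
                           "source": m["dev"], "no_callback": []})
            continue
        m = NO_CALLBACK_RE.search(line)
        if m and events:
            # Attach the driver to the most recent fatal event.
            events[-1]["no_callback"].append((m["driver"], m["dev"]))
    return events
```

Since it accepts any iterable of lines, an open file handle on a saved log works as input.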
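The errors originate from the Intel Thunderbolt 4 (Maple Ridge) PCI devices: the reporting device, 0000:03:00.0, is the Thunderbolt 4 bridge downstream of root port 00:1b.0. Full lspci output: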
Code:
00:00.0 Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
00:01.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c)
00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
00:08.0 System peripheral: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator (rev 02)
00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
00:1b.0 PCI bridge: Intel Corporation Device 7ac4 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Device 7abe (rev 11)
00:1d.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 (rev 11)
00:1d.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Z690 Chipset LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
02:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01)
03:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:01.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:02.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
04:03.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
05:00.0 USB controller: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
06:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
08:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
08:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
39:00.0 USB controller: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
6d:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-LM (rev 03)
6e:00.0 Ethernet controller: Aquantia Corp. AQC113C NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 03)
6f:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01)
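The flat listing above does not show which bridge each device hangs off. On Linux, the resolved sysfs path of a device (or lspci -tv) exposes the chain from the root port down to it. A minimal Python sketch that extracts that chain; the path used in the test and mentioned below is hypothetical and only illustrates the shape, since the actual topology behind the Maple Ridge and Alpine Ridge bridges is not shown here:

```python
import os
import re
import sys

# A PCI address as it appears in sysfs: domain:bus:device.function
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(sysfs_path: str) -> list[str]:
    """Return the PCI addresses on a resolved sysfs path, root port first.

    Path components that are not PCI addresses (e.g. 'pci0000:00',
    the host bridge directory) are skipped.
    """
    return [part for part in sysfs_path.split("/") if BDF_RE.match(part)]


if __name__ == "__main__":
    # Default to the first 82599ES port from the lspci listing.
    bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:08:00.0"
    # /sys/bus/pci/devices/<BDF> is a symlink into /sys/devices/pci.../...
    path = os.path.realpath(f"/sys/bus/pci/devices/{bdf}")
    print(" -> ".join(upstream_chain(path)))
```

On the host, a (hypothetical) path such as /sys/devices/pci0000:00/0000:00:1b.0/0000:03:00.0/0000:04:01.0/0000:06:00.0/0000:07:01.0/0000:08:00.0 would print the six addresses from 0000:00:1b.0 down to 0000:08:00.0.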
I’ve tried several kernel boot options, but none of them has resolved the PCIe bus errors:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 pcie_aspm=off"
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off pci=noaer intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 intel_idle.max_cstate=1 pcie_ports=native pci=assign-busses,hpbussize=0x40,realloc,hpmmiosize=128M,hpmmioprefsize=4G"
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off pci=noaer intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 intel_idle.max_cstate=1 pcie_ports=native pci=nommconf,assign-busses,hpbussize=0x40,realloc,hpmmiosize=128M,hpmmioprefsize=4G"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7 nvidia-drm.modeset=0 pcie_aspm=off"
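To see exactly what changed between attempts, a short Python sketch that diffs two GRUB_CMDLINE_LINUX_DEFAULT lines parameter by parameter. Expanding each comma-separated pci= list into separate pci=<opt> entries is an assumption made here for readability; the kernel itself takes them as one option:

```python
import shlex


def tokens(grub_line: str) -> list[str]:
    """Split a GRUB_CMDLINE_LINUX_DEFAULT="..." line into kernel parameters.

    Comma-separated pci= sub-options are expanded to one pci=<opt> token
    each, so a change inside a pci= list shows up in the diff.
    """
    # shlex applies shell quoting rules, as used in /etc/default/grub
    assignment = shlex.split(grub_line)[0]
    value = assignment.split("=", 1)[1]
    params = []
    for tok in value.split():
        if tok.startswith("pci="):
            params.extend("pci=" + opt for opt in tok[len("pci="):].split(","))
        else:
            params.append(tok)
    return params


def diff(prev: str, curr: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) parameters going from prev to curr."""
    a, b = tokens(prev), tokens(curr)
    added = [t for t in b if t not in a]
    removed = [t for t in a if t not in b]
    return added, removed


if __name__ == "__main__":
    attempts = [
        'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"',
        'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_guc=3 i915.max_vfs=7"',
    ]
    for prev, curr in zip(attempts, attempts[1:]):
        added, removed = diff(prev, curr)
        print("added:", added, "removed:", removed)
```

Applied to the list above, it shows for example that the fifth attempt differs from the fourth only by pci=nommconf, and that the last line repeats the third.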
I’m running the 6.2.16 Linux kernel:
Code:
Linux pve2 6.2.16-15-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-15 (2023-09-28T13:53Z) x86_64 GNU/Linux
ASPM is turned off in the BIOS as well as in the kernel options (pcie_aspm=off).
Have you seen this issue before, and do you have any suggestions for resolving it?
Thanks!
-Alan