Many Errors on Proxmox Hypervisor

IT-META

Oct 18, 2024
Hello,

I keep finding these lines in my hypervisor's log (2 or 3 times per hour):
kernel: pcieport 0000:00:02.1: AER: Correctable error message received from 0000:04:08.0
kernel: pcieport 0000:04:08.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
kernel: pcieport 0000:04:08.0: device [1022:43f5] error status/mask=00000040/00006000
kernel: pcieport 0000:04:08.0: [ 6] BadTLP
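The `error status/mask` pair in these lines is a raw dump of the PCIe Correctable Error Status and Mask registers. As a minimal sketch (assuming the bit layout of those registers and the short names the Linux AER driver prints, such as `BadTLP`), the values can be decoded like this: status `0x40` is only bit 6, which matches the `[ 6] BadTLP` line, and mask `0x6000` means bits 13 and 14 are masked, i.e. not reported at all.

```python
import re
from typing import List, Tuple

# Bit positions of the PCIe Correctable Error Status/Mask registers,
# using the short names the Linux AER driver prints (e.g. "[ 6] BadTLP").
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP (failed Data Link Layer checks, e.g. LCRC)
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}

STATUS_MASK_RE = re.compile(r"error status/mask=([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def decode_bits(value: int) -> List[str]:
    """Return the named bits that are set in a correctable-error register value."""
    return [
        f"[{bit:2d}] {name}"
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


def decode_aer_line(line: str) -> Tuple[List[str], List[str]]:
    """Decode the 'error status/mask=SSSSSSSS/MMMMMMMM' field of an AER log line.

    Returns (bits reported in the status register, bits disabled by the mask).
    """
    match = STATUS_MASK_RE.search(line)
    if match is None:
        raise ValueError("line has no 'error status/mask=' field")
    status, mask = (int(group, 16) for group in match.groups())
    return decode_bits(status), decode_bits(mask)


line = ("kernel: pcieport 0000:04:08.0: device [1022:43f5] "
        "error status/mask=00000040/00006000")
status_bits, masked_bits = decode_aer_line(line)
print("status:", status_bits)  # status: ['[ 6] BadTLP']
print("masked:", masked_bits)  # masked: ['[13] NonFatalErr', '[14] CorrIntErr']
```

In other words, every message above reports the same single error type (a bad TLP detected on the Data Link Layer), not a mix of different faults.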

I am running proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve) on this hardware:
01:00.0 VGA compatible controller: NVIDIA Corporation AD103 [GeForce RTX 4080] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22bb (rev a1)
02:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. NV3 NVMe SSD SM2268XT2 (DRAM-less) (rev 03)
06:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN580 NVMe SSD (DRAM-less) (rev 01)
0a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
0b:00.0 Non-Volatile memory controller: Sandisk Corp Western Digital WD Black SN850X NVMe SSD (rev 01)
0c:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller (rev 01)
0d:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller (rev 01)
0e:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller (rev 01)
0f:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller (rev 01)
10:00.0 Non-Volatile memory controller: Micron/Crucial Technology P5 Plus NVMe PCIe SSD
11:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
11:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
12:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b8


I have found in many threads that I should add the option pcie_aspm=off to the kernel command line in my GRUB configuration. Has anyone encountered this problem and solved it with this option in GRUB?
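If you do try pcie_aspm=off, it is worth confirming that the running kernel actually received it. On a GRUB-booted Debian-based system such as Proxmox, the option typically goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot (hosts booted with systemd-boot are configured differently). The sketch below only reads /proc/cmdline and the active policy in /sys/module/pcie_aspm/parameters/policy; it says nothing about whether disabling ASPM makes the BadTLP messages go away.

```python
import re
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line (the contents of /proc/cmdline) into parameters.

    Bare flags such as 'quiet' map to None. Quoted values are not handled in this sketch.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def active_aspm_policy(policy_text: str) -> Optional[str]:
    """Return the bracketed (active) entry of /sys/module/pcie_aspm/parameters/policy."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            params = parse_cmdline(f.read())
        print("pcie_aspm:", params.get("pcie_aspm", "<not set>"))
        with open("/sys/module/pcie_aspm/parameters/policy") as f:
            print("active ASPM policy:", active_aspm_policy(f.read()))
    except OSError as exc:
        print("could not read kernel settings:", exc)
```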

Thanks

Steve
 
Has anyone encountered this problem and solved it with this option in the grub?
Why solve it? The error message states that it has already been corrected, so there is no problem.

I am running proxmox-ve: 8.2.0 (running kernel: 6.8.12-2-pve) on this hardware
The device 04:08.0 is not listed in your output; is that the complete output of lspci?
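AER messages use the full address including the PCI domain (0000:04:08.0), while plain lspci omits the domain on single-domain systems (`lspci -D` prints it, and `lspci -s 04:08.0` selects just that device). A minimal sketch, assuming plain `lspci` output pasted as text, that maps an address from the log to its lspci description; a listing filtered down to endpoints, like the one above, simply returns nothing for the reporting port.

```python
import re
from typing import Dict, Optional

# Matches lines such as "04:08.0 PCI bridge: Advanced Micro Devices, ..." (plain lspci output).
LSPCI_LINE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.+)$", re.IGNORECASE)


def parse_lspci(text: str) -> Dict[str, str]:
    """Map 'bus:device.function' to the device description."""
    devices: Dict[str, str] = {}
    for line in text.splitlines():
        match = LSPCI_LINE_RE.match(line.strip())
        if match:
            devices[match.group(1).lower()] = match.group(2)
    return devices


def short_address(address: str) -> str:
    """Drop the PCI domain from a kernel-style address: '0000:04:08.0' -> '04:08.0'."""
    return ":".join(address.lower().split(":")[-2:])


def describe(devices: Dict[str, str], kernel_address: str) -> Optional[str]:
    """Look up an address taken from an AER message in parsed lspci output."""
    return devices.get(short_address(kernel_address))
```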
 
Hello,

Why do you say that there is no problem?

The device is:
04:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)

The message keeps appearing, so for me there is still a problem.
 
Why do you say that there is no problem?
Computers are full of problems; the ones that are automatically corrected are the good ones. If they were not corrected, your system would just freeze/hang or reboot.

The message states that there is a problem on the Data Link Layer of the PCIe communication. You cannot do anything about that besides changing the hardware.
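To see how often these corrections actually happen, rather than counting log lines, recent kernels expose per-device AER counters in sysfs, for example /sys/bus/pci/devices/0000:04:08.0/aer_dev_correctable, as `NAME COUNT` lines (RxErr, BadTLP, ..., TOTAL_ERR_COR). Whether the file exists depends on the kernel version and on the OS handling AER, so treat the path as an assumption. A minimal sketch that parses two samples and reports which counters changed:

```python
from typing import Dict


def parse_aer_counters(text: str) -> Dict[str, int]:
    """Parse 'NAME COUNT' lines, as found in a device's aer_dev_correctable file."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_aer_counters(address: str) -> Dict[str, int]:
    """Read the correctable-error counters of one PCI device, e.g. '0000:04:08.0'."""
    with open(f"/sys/bus/pci/devices/{address}/aer_dev_correctable") as f:
        return parse_aer_counters(f.read())


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters that changed between two samples."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value != before.get(name, 0)
    }
```

Taking one sample with `read_aer_counters("0000:04:08.0")`, waiting an hour and taking another gives the per-hour rate for each error type directly.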
 
If you want to hide such messages, disable PCIe Advanced Error Reporting (PCIe AER) in your BIOS. Whatever hardware causes them will still cause them, but you won't see them in your logs. The downside is that uncorrectable errors, the bad ones, won't show up in your logs either...

If you really want to solve them, you have to dig deep: identify which hardware produces them using its PCI ID, ask the hardware vendor for support, and so on.
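Because the receiver in these messages is a switch downstream port (04:08.0), the other end of the affected link is most likely whatever sits directly below that port. In /sys/devices the PCI hierarchy is mirrored by directory nesting, so a minimal sketch (assuming that layout; `lspci -tv` shows the same tree) can list the devices below the port before you go looking up their PCI IDs:

```python
import os
import re
from typing import List

# Full PCI address as used in sysfs, e.g. "0000:0a:00.0" (domain:bus:device.function).
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def downstream_devices(bridge_dir: str) -> List[str]:
    """Recursively list PCI devices nested below a bridge's sysfs directory.

    Children of a bridge appear as subdirectories named after their address.
    Symlinks are skipped to avoid following links out of the tree.
    """
    found: List[str] = []
    for entry in sorted(os.listdir(bridge_dir)):
        path = os.path.join(bridge_dir, entry)
        if PCI_ADDR_RE.match(entry) and os.path.isdir(path) and not os.path.islink(path):
            found.append(entry)
            found.extend(downstream_devices(path))
    return found


if __name__ == "__main__":
    # /sys/bus/pci/devices/<addr> is a symlink into the /sys/devices tree.
    port = os.path.realpath("/sys/bus/pci/devices/0000:04:08.0")
    if os.path.isdir(port):
        for address in downstream_devices(port):
            print(address)
    else:
        print("port not found in sysfs:", port)
```

Each address it prints can then be matched against the lspci listing to get the vendor and device ID to report to the hardware vendor.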
 