Hardware errors with PCI passthrough

yaro014

Active Member
Dec 27, 2012
32
15
28
I'm not sure whether this is directly related to PCI passthrough, but dmesg on the host returns the output below. I haven't noticed any major instability because of it, but I thought I'd ask in case someone has come across it before.

Code:
[171859.861517] pcieport 0000:00:03.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000
[171859.862224] pcieport 0000:00:03.1:    [ 6] BadTLP               
[171859.862902] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
[174302.968163] {97}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[174302.968964] {97}[Hardware Error]: It has been corrected by h/w and requires no further action
[174302.969737] {97}[Hardware Error]: event severity: corrected
[174302.970508] {97}[Hardware Error]:  Error 0, type: corrected
[174302.971276] {97}[Hardware Error]:   section_type: PCIe error
[174302.972039] {97}[Hardware Error]:   port_type: 4, root port
[174302.972779] {97}[Hardware Error]:   version: 0.2
[174302.973529] {97}[Hardware Error]:   command: 0x0407, status: 0x0010
[174302.974292] {97}[Hardware Error]:   device_id: 0000:00:03.1
[174302.975051] {97}[Hardware Error]:   slot: 0
[174302.975773] {97}[Hardware Error]:   secondary_bus: 0x02
[174302.976471] {97}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[174302.977180] {97}[Hardware Error]:   class_code: 060400
[174302.977901] {97}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012
[174302.988373] pcieport 0000:00:03.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000
[174302.989119] pcieport 0000:00:03.1:    [ 6] BadTLP               
[174302.989840] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
[176319.068379] {98}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[176319.069140] {98}[Hardware Error]: It has been corrected by h/w and requires no further action
[176319.069866] {98}[Hardware Error]: event severity: corrected
[176319.070592] {98}[Hardware Error]:  Error 0, type: corrected
[176319.071312] {98}[Hardware Error]:   section_type: PCIe error
[176319.072028] {98}[Hardware Error]:   port_type: 4, root port
[176319.072744] {98}[Hardware Error]:   version: 0.2
[176319.073458] {98}[Hardware Error]:   command: 0x0407, status: 0x0010
[176319.074177] {98}[Hardware Error]:   device_id: 0000:00:03.1
[176319.074890] {98}[Hardware Error]:   slot: 0
[176319.075592] {98}[Hardware Error]:   secondary_bus: 0x02
[176319.076279] {98}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[176319.076968] {98}[Hardware Error]:   class_code: 060400
[176319.077646] {98}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012
[176319.089390] pcieport 0000:00:03.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000
[176319.090091] pcieport 0000:00:03.1:    [ 6] BadTLP               
[176319.090774] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
[178996.450972] {99}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[178996.451778] {99}[Hardware Error]: It has been corrected by h/w and requires no further action
[178996.452561] {99}[Hardware Error]: event severity: corrected
[178996.453341] {99}[Hardware Error]:  Error 0, type: corrected
[178996.454075] {99}[Hardware Error]:   section_type: PCIe error
[178996.454867] {99}[Hardware Error]:   port_type: 4, root port
[178996.455637] {99}[Hardware Error]:   version: 0.2
[178996.456404] {99}[Hardware Error]:   command: 0x0407, status: 0x0010
[178996.457172] {99}[Hardware Error]:   device_id: 0000:00:03.1
[178996.457909] {99}[Hardware Error]:   slot: 0
[178996.458665] {99}[Hardware Error]:   secondary_bus: 0x02
[178996.459408] {99}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[178996.460151] {99}[Hardware Error]:   class_code: 060400
[178996.460881] {99}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012
[178996.470137] pcieport 0000:00:03.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000
[178996.470896] pcieport 0000:00:03.1:    [ 6] BadTLP               
[178996.471619] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
[180202.017726] {100}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[180202.018495] {100}[Hardware Error]: It has been corrected by h/w and requires no further action
[180202.019221] {100}[Hardware Error]: event severity: corrected
[180202.019948] {100}[Hardware Error]:  Error 0, type: corrected
[180202.020669] {100}[Hardware Error]:   section_type: PCIe error
[180202.021386] {100}[Hardware Error]:   port_type: 4, root port
[180202.022104] {100}[Hardware Error]:   version: 0.2
[180202.022820] {100}[Hardware Error]:   command: 0x0407, status: 0x0010
[180202.023540] {100}[Hardware Error]:   device_id: 0000:00:03.1
[180202.024255] {100}[Hardware Error]:   slot: 0
[180202.024956] {100}[Hardware Error]:   secondary_bus: 0x02
[180202.025645] {100}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[180202.026335] {100}[Hardware Error]:   class_code: 060400
[180202.027014] {100}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012
[180202.036201] pcieport 0000:00:03.1: AER: aer_status: 0x00001000, aer_mask: 0x00000000
[180202.036906] pcieport 0000:00:03.1:    [12] Timeout               
[180202.037583] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
[183197.614670] {101}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[183197.615441] {101}[Hardware Error]: It has been corrected by h/w and requires no further action
[183197.616177] {101}[Hardware Error]: event severity: corrected
[183197.616911] {101}[Hardware Error]:  Error 0, type: corrected
[183197.617641] {101}[Hardware Error]:   section_type: PCIe error
[183197.618396] {101}[Hardware Error]:   port_type: 4, root port
[183197.619125] {101}[Hardware Error]:   version: 0.2
[183197.619849] {101}[Hardware Error]:   command: 0x0407, status: 0x0010
[183197.620576] {101}[Hardware Error]:   device_id: 0000:00:03.1
[183197.621301] {101}[Hardware Error]:   slot: 0
[183197.622040] {101}[Hardware Error]:   secondary_bus: 0x02
[183197.622789] {101}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[183197.623497] {101}[Hardware Error]:   class_code: 060400
[183197.624187] {101}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012
[183197.634363] pcieport 0000:00:03.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000
[183197.635094] pcieport 0000:00:03.1:    [ 6] BadTLP               
[183197.635784] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
[184553.715662] {102}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[184553.716429] {102}[Hardware Error]: It has been corrected by h/w and requires no further action
[184553.717161] {102}[Hardware Error]: event severity: corrected
[184553.717926] {102}[Hardware Error]:  Error 0, type: corrected
[184553.718652] {102}[Hardware Error]:   section_type: PCIe error
[184553.719372] {102}[Hardware Error]:   port_type: 4, root port
[184553.720092] {102}[Hardware Error]:   version: 0.2
[184553.720809] {102}[Hardware Error]:   command: 0x0407, status: 0x0010
[184553.721609] {102}[Hardware Error]:   device_id: 0000:00:03.1
[184553.722326] {102}[Hardware Error]:   slot: 0
[184553.723028] {102}[Hardware Error]:   secondary_bus: 0x02
[184553.723717] {102}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[184553.724407] {102}[Hardware Error]:   class_code: 060400
[184553.725087] {102}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012
[184553.736343] pcieport 0000:00:03.1: AER: aer_status: 0x00000040, aer_mask: 0x00000000
[184553.737055] pcieport 0000:00:03.1:    [ 6] BadTLP               
[184553.737784] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
 

oguz

Proxmox Retired Staff
Nov 19, 2018
5,207
707
118
hi,


Code:
[171859.862902] pcieport 0000:00:03.1: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
[174302.968163] {97}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[174302.968964] {97}[Hardware Error]: It has been corrected by h/w and requires no further action
[174302.969737] {97}[Hardware Error]: event severity: corrected
[174302.970508] {97}[Hardware Error]:  Error 0, type: corrected
[174302.971276] {97}[Hardware Error]:   section_type: PCIe error
[174302.972039] {97}[Hardware Error]:   port_type: 4, root port
[174302.972779] {97}[Hardware Error]:   version: 0.2
[174302.973529] {97}[Hardware Error]:   command: 0x0407, status: 0x0010
[174302.974292] {97}[Hardware Error]:   device_id: 0000:00:03.1
[174302.975051] {97}[Hardware Error]:   slot: 0
[174302.975773] {97}[Hardware Error]:   secondary_bus: 0x02
[174302.976471] {97}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1483
[174302.977180] {97}[Hardware Error]:   class_code: 060400
[174302.977901] {97}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0012

The error message says it's a corrected PCIe error reported by the root port at PCI address 0000:00:03.1.
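The aer_status value in these logs is a bit mask, and the bracketed number the kernel prints next to it ([ 6] BadTLP, [12] Timeout) is the index of the bit that is set. As a rough sketch, the mask could be decoded by hand like this. The function name decode_aer_cor_status is made up for illustration, and the bit table beyond the two names that actually appear in this thread is an assumption based on the usual PCIe AER correctable-error layout, so verify it against the specification before relying on it.

```shell
# Hypothetical helper: print the set bits of an AER correctable status mask.
# Bits 6 (BadTLP) and 12 (Timeout) appear in the logs above; the remaining
# entries are assumed from the usual AER correctable-error bit layout.
decode_aer_cor_status() {
    status=$(( $1 ))    # accepts hex such as 0x00000040
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover \
                 12:Timeout 13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${entry%%:*}     # part before the colon: bit index
        name=${entry#*:}     # part after the colon: short name
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            printf '[%2d] %s\n' "$bit" "$name"
        fi
    done
}
```

Calling decode_aer_cor_status 0x00000040 prints the BadTLP line, and 0x00001000 prints the Timeout line, matching what the kernel reports in the log.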

You could run lspci -nn | grep '03.1' to check which device is affected.
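Because grep treats the dot as a wildcard, the pattern '03.1' can also match unrelated lines, for example a vendor:device ID containing "03:1". A tighter alternative, sketched here under the assumption that a stock pciutils lspci is installed, is to select the device by address with -s and then list the bus behind the bridge, which the log reports as secondary_bus. The helper names are hypothetical.

```shell
# Hypothetical helper: turn a "secondary_bus: 0x02" value from the log
# into an lspci bus selector such as "02:".
bus_selector() {
    printf '%02x:\n' "$(( $1 ))"
}

# Show exactly one device, matched by bus:device.function address.
show_bridge() {
    lspci -nn -s "$1"
}

# List every device on the bus behind the bridge, i.e. the hardware
# whose link to the root port is producing the corrected errors.
show_downstream() {
    lspci -nn -s "$(bus_selector "$1")"
}
```

For the log in this thread that would be show_bridge 00:03.1 followed by show_downstream 0x02.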
 

yaro014

Well, this returns the following:

Code:
root@pve:~# lspci -nn| grep '03.1'
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
2a:00.0 PCI bridge [0604]: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge [1a03:1150] (rev 04)
60:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]

I don't know what would have caused it. It might have something to do with GPU passthrough, but I checked and these devices are not in the same IOMMU group, nor do I believe they need to be passed through to the VM as well.
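For anyone wanting to repeat the IOMMU group check, here is a minimal sketch. It assumes the usual sysfs layout of /sys/kernel/iommu_groups/<group>/devices/<address>; the function name and the optional root argument are only for illustration.

```shell
# Hypothetical helper: print "<group> <pci-address>" for every device.
# The optional argument overrides the sysfs root (assumed default below).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # skip the unexpanded glob if empty
        group=${dev%/devices/*}        # strip "/devices/<address>"
        group=${group##*/}             # keep only the group number
        printf '%s %s\n' "$group" "${dev##*/}"
    done
}
```

Running list_iommu_groups and comparing the group number of the bridge (0000:00:03.1 here) with that of the passed-through GPU shows whether they share a group.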
 

SBUSSER

New Member
Feb 18, 2021
5
0
1
45
Hi,

I have the same error. Is there any news on it?
I use an AMD EPYC board with the following kernel version: Linux 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100)

[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: It has been corrected by h/w and requires no further action
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: event severity: corrected
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: Error 0, type: corrected
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: section_type: PCIe error
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: port_type: 4, root port
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: version: 0.2
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: command: 0x0407, status: 0x0010
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: device_id: 0000:40:01.1
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: slot: 2
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: secondary_bus: 0x41
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: vendor_id: 0x1022, device_id: 0x1483
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: class_code: 060400
[Thu Feb 17 23:19:48 2022] {104}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0012
[Thu Feb 17 23:19:48 2022] pcieport 0000:40:01.1: AER: aer_status: 0x00001000, aer_mask: 0x00000000
[Thu Feb 17 23:19:48 2022] pcieport 0000:40:01.1: [12] Timeout
[Thu Feb 17 23:19:48 2022] pcieport 0000:40:01.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: It has been corrected by h/w and requires no further action
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: event severity: corrected
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: Error 0, type: corrected
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: section_type: PCIe error
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: port_type: 4, root port
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: version: 0.2
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: command: 0x0407, status: 0x0010
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: device_id: 0000:40:01.1
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: slot: 2
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: secondary_bus: 0x41
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: vendor_id: 0x1022, device_id: 0x1483
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: class_code: 060400
[Thu Feb 17 23:39:54 2022] {105}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0012
[Thu Feb 17 23:39:54 2022] pcieport 0000:40:01.1: AER: aer_status: 0x00001000, aer_mask: 0x00000000
[Thu Feb 17 23:39:54 2022] pcieport 0000:40:01.1: [12] Timeout
[Thu Feb 17 23:39:54 2022] pcieport 0000:40:01.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: It has been corrected by h/w and requires no further action
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: event severity: corrected
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: Error 0, type: corrected
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: section_type: PCIe error
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: port_type: 4, root port
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: version: 0.2
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: command: 0x0407, status: 0x0010
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: device_id: 0000:40:01.1
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: slot: 2
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: secondary_bus: 0x41
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: vendor_id: 0x1022, device_id: 0x1483
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: class_code: 060400
[Thu Feb 17 23:40:15 2022] {106}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0012
[Thu Feb 17 23:40:15 2022] pcieport 0000:40:01.1: AER: aer_status: 0x00001000, aer_mask: 0x00000000
[Thu Feb 17 23:40:15 2022] pcieport 0000:40:01.1: [12] Timeout
[Thu Feb 17 23:40:15 2022] pcieport 0000:40:01.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 512
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: It has been corrected by h/w and requires no further action
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: event severity: corrected
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: Error 0, type: corrected
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: section_type: PCIe error
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: port_type: 4, root port
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: version: 0.2
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: command: 0x0407, status: 0x0010
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: device_id: 0000:40:01.1
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: slot: 2
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: secondary_bus: 0x41
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: vendor_id: 0x1022, device_id: 0x1483
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: class_code: 060400
[Thu Feb 17 23:50:28 2022] {107}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0012
[Thu Feb 17 23:50:28 2022] pcieport 0000:40:01.1: AER: aer_status: 0x00001000, aer_mask: 0x00000000
[Thu Feb 17 23:50:28 2022] pcieport 0000:40:01.1: [12] Timeout
[Thu Feb 17 23:50:28 2022] pcieport 0000:40:01.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
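With logs this repetitive it can help to count how often each port reports each status. The following is only a sketch: summarize_aer is a made-up name, and it assumes the "pcieport <address>: AER: aer_status: <value>," line format seen in this thread.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<count> <port> <aer_status>" for each distinct combination.
summarize_aer() {
    awk '/AER: aer_status:/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "pcieport")    port   = $(i + 1)
            if ($i == "aer_status:") status = $(i + 1)
        }
        sub(/:$/, "", port)      # drop the colon after the address
        sub(/,$/, "", status)    # drop the comma after the value
        count[port " " status]++
    }
    END { for (k in count) print count[k], k }' | sort -k2
}
```

It can be fed from dmesg, for example dmesg | summarize_aer, or from a saved log file.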

root@lu0m00pve02:~# lspci -nn |grep 1483
40:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
40:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
40:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
80:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
c0:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
c0:03.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]

Thank you
 

jbattermann

New Member
Jul 26, 2021
5
0
1
42
I just saw the very same messages on my EPYC (Milan) based system. Has anyone found out what these are and how to get rid of them?

And out of curiosity, what motherboards and CPUs are you using?
 

SBUSSER

For me it is related to my SAS3 HBA. With the latest kernel, I no longer get the error messages.
My motherboard is an ASRock EPYCD8/R32, and the CPU is an AMD EPYC 7272 2.9 GHz (12C/24T).
 
