Looking for some assistance with this error.
I just installed Proxmox for the first time to give it a spin and I'm running into AER errors with my ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card, which is populated with 4 NVMe drives. The errors are reported for device_id: 0000:81:00.0. This is an Epyc system on a TYAN S8030GM4NE-2T board. As far as I can tell, there are no options in the BIOS to enable or disable AER.
I have tested adding pci=nommconf and pcie_aspm=off to the kernel command line in the GRUB config, with no success:
nano /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off pci=nommconf"
GRUB_CMDLINE_LINUX=""
update-grub
reboot
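One thing worth ruling out after the reboot is whether the parameters actually reached the running kernel. The sketch below checks /proc/cmdline for the two options above; has_kernel_param is a hypothetical helper name, not part of any tool. It rests on an assumption to verify on the affected host: as far as I know, Proxmox installs that boot through systemd-boot (typically ZFS root on UEFI) take their command line from /etc/kernel/cmdline and apply it with proxmox-boot-tool refresh, in which case edits to /etc/default/grub would have no effect.

```shell
# Return success if the kernel command line in $1 contains the exact parameter $2.
# (has_kernel_param is a hypothetical helper name used for illustration.)
has_kernel_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Read the command line of the currently running kernel.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

# Report whether each parameter from /etc/default/grub is really in effect.
for param in pcie_aspm=off pci=nommconf; do
    if has_kernel_param "$cmdline" "$param"; then
        echo "$param: active"
    else
        echo "$param: NOT on the running kernel command line"
    fi
done
```

If either parameter shows up as missing, the GRUB edit never took effect and the test above says nothing yet about whether the options help.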
What does work is specifying GEN3 for the slot that the expansion card is plugged into.
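To confirm what the link actually negotiates in each configuration, the speed and width can be read from sysfs. This is a minimal sketch: show_link is a hypothetical helper name, the optional second argument exists only so the function can be pointed at a different sysfs root, and the exact text of the speed attribute varies between kernel versions. A Gen3 link should report 8 GT/s and a Gen4 link 16 GT/s.

```shell
# Print negotiated vs. maximum link speed and width for one PCI device.
# $1 = PCI address, $2 = sysfs root (optional, defaults to /sys).
# (show_link is a hypothetical helper name used for illustration.)
show_link() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ ! -r "$dev/current_link_speed" ]; then
        echo "$1: no link attributes found"
        return 1
    fi
    cur_speed=$(cat "$dev/current_link_speed")
    cur_width=$(cat "$dev/current_link_width")
    max_speed=$(cat "$dev/max_link_speed")
    max_width=$(cat "$dev/max_link_width")
    echo "$1: $cur_speed x$cur_width (max $max_speed x$max_width)"
}

# The NVMe drive named in the error log.
show_link 0000:81:00.0 || true
```

The same information is available from lspci -vv -s 81:00.0 in the LnkCap and LnkSta lines.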
Code:
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: Error 12, type: corrected
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: section_type: PCIe error
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: port_type: 0, PCIe end point
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: version: 0.2
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: command: 0x0406, status: 0x0010
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: device_id: 0000:81:00.0
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: slot: 0
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: secondary_bus: 0x00
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: vendor_id: 0x1987, device_id: 0x5016
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: class_code: 010802
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: Error 13, type: corrected
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: section_type: PCIe error
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: port_type: 0, PCIe end point
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: version: 0.2
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: command: 0x0406, status: 0x0010
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: device_id: 0000:81:00.0
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: slot: 0
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: secondary_bus: 0x00
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: vendor_id: 0x1987, device_id: 0x5016
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: class_code: 010802
Mar 01 21:18:14 test_server kernel: {15}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: [ 0] RxErr (First)
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: [ 0] RxErr (First)
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: [ 0] RxErr (First)
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: [ 0] RxErr (First)
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: [ 0] RxErr (First)
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: [ 0] RxErr (First)
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
Mar 01 21:18:14 test_server kernel: nvme 0000:81:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
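For reference, the aer_status value in these lines is a bitmask of the PCIe Correctable Error Status register. The sketch below decodes it into the short names the kernel prints; decode_aer_correctable is a hypothetical helper name, and the bit positions are taken from my reading of the PCIe specification and the kernel's AER driver, so treat the table as an assumption to double-check.

```shell
# Decode a PCIe AER Correctable Error Status value into short error names.
# (decode_aer_correctable is a hypothetical helper name used for illustration.)
decode_aer_correctable() {
    status=$(( $1 ))
    names=""
    # bit:name pairs for the correctable error status register
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover \
                 12:Timeout 13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "${names:-none}"
}

# Value from the log above; prints "RxErr".
decode_aer_correctable 0x00000001
```

Only bit 0 is set here, which matches the "RxErr" and "Physical Layer" lines: a receiver error that the link corrected, consistent with the errors disappearing when the slot runs at GEN3.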