I'm repeatedly seeing this in the console:
Code:
nvme: 0000:01:00.0: AER: Error of this agent is reported first
From googling, I can see that this comes from PCIe Advanced Error Reporting (AER) for the NVMe drive.
The NVMe drive is a Lexar NM620.
The box is a Lenovo M900.
I have installed Proxmox twice and got the same error both times:
The first install used the bootable installer: "Proxmox VE 7.3 ISO Installer - Updated on 22 November 2022 - Version: 7.3-1"
The second followed this guide (with LUKS encryption at boot): https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_11_Bullseye
Here's the output of smartctl:
Code:
root@proxmox:~# smartctl /dev/nvme0 -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.85-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Lexar SSD NM620 512GB
Serial Number:
Firmware Version: V1.27
PCI Vendor/Subsystem ID: 0x1d97
IEEE OUI Identifier: 0xcaf25b
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: caf25b 02d00005e7
Local Time is: Sun Mar 5 16:39:57 2023 GMT
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005c): DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0a): Cmd_Eff_Lg Telmtry_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 81 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
 St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
  0 +    5.00W        -        -    0  0  0  0        5     700
Supported LBA Sizes (NSID 0x1)
 Id Fmt  Data  Metadt  Rel_Perf
  0 +     512       0         3
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 27 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1,002,254 [513 GB]
Data Units Written: 1,958,086 [1.00 TB]
Host Read Commands: 17,752,740
Host Write Commands: 46,475,530
Controller Busy Time: 40
Power Cycles: 36
Power On Hours: 115
Unsafe Shutdowns: 9
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
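The SMART data above reports a critical warning of 0x00, zero media and data integrity errors, and an empty error log, which suggests the SSD's controller is not recording failures of its own. To re-check this after more uptime, here is a minimal Python sketch. It assumes Python 3.9+ and the "Key: value" layout shown above (smartctl 7.2); the function names are hypothetical, not part of smartctl. It picks out the drive-side error counters and returns only the nonzero ones:

```python
# Sketch: flag nonzero drive-side error counters in "smartctl -a" NVMe output.
# Assumes the "Key: value" layout printed by smartctl 7.2 (as in the post);
# other smartctl versions may label fields differently.

ERROR_FIELDS = (
    "Critical Warning",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
)


def parse_smart_fields(text: str) -> dict[str, str]:
    """Split each 'Key: value' line at its first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_int(raw: str) -> int:
    """Parse '0x00' (hex) or '1,234' (decimal with thousands separators)."""
    token = raw.split()[0].replace(",", "")
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def drive_side_errors(text: str) -> dict[str, int]:
    """Return only the error counters whose value is nonzero."""
    fields = parse_smart_fields(text)
    return {
        name: to_int(fields[name])
        for name in ERROR_FIELDS
        if name in fields and to_int(fields[name]) != 0
    }


# Values copied from the smartctl output above.
sample = """Critical Warning:                   0x00
Media and Data Integrity Errors:    0
Error Information Log Entries:      0"""
print(drive_side_errors(sample))  # prints {} for these values
```

On the machine itself, the input would be the text of smartctl -a /dev/nvme0, the same command used above.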
Syslog repeats the following over and over, here is a sample:
Code:
Mar 5 16:53:22 proxmox kernel: [ 938.373722] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 5 16:53:22 proxmox kernel: [ 938.373724] nvme 0000:01:00.0: device [1d97:5216] error status/mask=00000041/0000e000
Mar 5 16:53:22 proxmox kernel: [ 938.373725] nvme 0000:01:00.0: [ 0] RxErr
Mar 5 16:53:22 proxmox kernel: [ 938.373727] nvme 0000:01:00.0: [ 6] BadTLP
Mar 5 16:53:22 proxmox kernel: [ 938.373728] nvme 0000:01:00.0: AER: Error of this Agent is reported first
Mar 5 16:53:22 proxmox kernel: [ 938.374013] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:22 proxmox kernel: [ 938.374021] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:22 proxmox kernel: [ 938.374028] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:22 proxmox kernel: [ 938.374033] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:22 proxmox kernel: [ 938.576182] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:22 proxmox kernel: [ 938.576190] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 5 16:53:22 proxmox kernel: [ 938.576192] nvme 0000:01:00.0: device [1d97:5216] error status/mask=00000001/0000e000
Mar 5 16:53:22 proxmox kernel: [ 938.576194] nvme 0000:01:00.0: [ 0] RxErr
Mar 5 16:53:24 proxmox kernel: [ 940.084899] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:24 proxmox kernel: [ 940.084907] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 5 16:53:24 proxmox kernel: [ 940.084909] nvme 0000:01:00.0: device [1d97:5216] error status/mask=00000001/0000e000
Mar 5 16:53:24 proxmox kernel: [ 940.084911] nvme 0000:01:00.0: [ 0] RxErr
Mar 5 16:53:24 proxmox kernel: [ 940.085467] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Mar 5 16:53:24 proxmox kernel: [ 940.085472] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 5 16:53:24 proxmox kernel: [ 940.085474] nvme 0000:01:00.0: device [1d97:5216] error status/mask=00000001/0000e000
Mar 5 16:53:24 proxmox kernel: [ 940.085489] nvme 0000:01:00.0: [ 0] RxErr
Mar 5 16:53:25 proxmox kernel: [ 941.771594] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Mar 5 16:53:25 proxmox kernel: [ 941.771613] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Mar 5 16:53:25 proxmox kernel: [ 941.771619] pcieport 0000:00:1b.0: device [8086:a167] error status/mask=00001100/00002000
Mar 5 16:53:25 proxmox kernel: [ 941.771640] pcieport 0000:00:1b.0: [ 8] Rollover
Mar 5 16:53:25 proxmox kernel: [ 941.771641] pcieport 0000:00:1b.0: [12] Timeout
Mar 5 16:53:25 proxmox kernel: [ 941.771644] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Mar 5 16:53:25 proxmox kernel: [ 941.771646] nvme 0000:01:00.0: device [1d97:5216] error status/mask=00000001/0000e000
Mar 5 16:53:25 proxmox kernel: [ 941.771647] nvme 0000:01:00.0: [ 0] RxErr
Mar 5 16:53:25 proxmox kernel: [ 941.771648] nvme 0000:01:00.0: AER: Error of this Agent is reported first
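The "error status/mask" words in these lines are the device's PCIe Correctable Error Status and Mask registers, and the names the kernel prints (RxErr, BadTLP, Rollover, Timeout) correspond to the set bits. Below is a minimal Python sketch of that decoding, applied to two of the values above. The bit positions come from the PCIe Correctable Error Status register; the "Physical Layer" / "Data Link Layer" labelling mirrors, as far as I can tell, what the Linux AER driver does, so treat that part as an assumption:

```python
# Sketch: decode the "error status/mask" hex words from the AER log lines.
# Bit positions follow the PCIe Correctable Error Status register; the short
# names match the ones the kernel prints. Unlisted bits are reserved and
# ignored in this sketch.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM rollover
    12: "Timeout",      # Replay timer timeout
    13: "NonFatalErr",  # Advisory non-fatal error
    14: "CorrIntErr",   # Corrected internal error
    15: "HeaderOF",     # Header log overflow
}
DATA_LINK_BITS = (6, 7, 8, 12)


def set_bits(word: int) -> list[str]:
    """Names of the known bits that are set in a register value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if (word >> bit) & 1]


def decode(status_hex: str, mask_hex: str) -> dict:
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    # A receiver error wins; otherwise any replay/DLLP/TLP bit means data link.
    if status & 1:
        layer = "Physical Layer"
    elif any((status >> bit) & 1 for bit in DATA_LINK_BITS):
        layer = "Data Link Layer"
    else:
        layer = "Transaction Layer"
    return {"errors": set_bits(status), "masked": set_bits(mask), "layer": layer}


# Values copied from the syslog sample above.
print(decode("00000041", "0000e000"))
# {'errors': ['RxErr', 'BadTLP'], 'masked': ['NonFatalErr', 'CorrIntErr', 'HeaderOF'], 'layer': 'Physical Layer'}
print(decode("00001100", "00002000"))
# {'errors': ['Rollover', 'Timeout'], 'masked': ['NonFatalErr'], 'layer': 'Data Link Layer'}
```

Read this way, the NVMe endpoint (01:00.0) is logging receive errors on the physical layer. The root port (00:1b.0) is logging replay rollover and replay timer timeouts, meaning it had to retransmit packets on the same link.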
From what I can tell, it's some sort of error on the PCIe bus that the OS is managing to correct (the kernel reports severity=Corrected).
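Because the log repeats so much, it may help to tally which device reports which error, and how often. Here is a minimal Python sketch, assuming the kernel line format shown in the syslog sample (driver name, PCI address, then "[bit] Name"); tally_aer is a hypothetical helper, not an existing tool:

```python
import re
from collections import Counter

# Matches the per-bit detail lines from the syslog sample, e.g.
#   "... nvme 0000:01:00.0: [ 0] RxErr"
#   "... pcieport 0000:00:1b.0: [12] Timeout"
# Assumes lowercase hex PCI addresses, as in the sample.
DETAIL = re.compile(
    r"(?P<driver>\S+) (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s+\[\s*\d+\]\s+(?P<name>\w+)"
)


def tally_aer(lines) -> Counter:
    """Count (driver, PCI address, error name) triples in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = DETAIL.search(line)
        if match:
            counts[(match["driver"], match["addr"], match["name"])] += 1
    return counts


# A few lines copied from the syslog sample above.
sample = [
    "Mar 5 16:53:22 proxmox kernel: [ 938.373725] nvme 0000:01:00.0: [ 0] RxErr",
    "Mar 5 16:53:22 proxmox kernel: [ 938.373727] nvme 0000:01:00.0: [ 6] BadTLP",
    "Mar 5 16:53:25 proxmox kernel: [ 941.771640] pcieport 0000:00:1b.0: [ 8] Rollover",
    "Mar 5 16:53:25 proxmox kernel: [ 941.771641] pcieport 0000:00:1b.0: [12] Timeout",
    "Mar 5 16:53:25 proxmox kernel: [ 941.771647] nvme 0000:01:00.0: [ 0] RxErr",
    "Mar 5 16:53:25 proxmox kernel: [ 941.771648] nvme 0000:01:00.0: AER: Error of this Agent is reported first",
]
for (driver, addr, name), count in tally_aer(sample).most_common():
    print(f"{driver} {addr} {name}: {count}")
```

Since tally_aer accepts any iterable of lines, a whole log can be passed in as an open file object, or as the saved output of journalctl -k.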
Is there anything I can do to narrow down the cause further? For example, is this a software/driver problem or a hardware problem?
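One hedged way to separate the two is to compare what the link advertises with what it actually negotiated. If the NVMe link trained at a lower speed or width than its capability, that can point toward a physical/signal problem rather than a driver bug. If it runs at full speed and width, the check is inconclusive. The sketch below assumes the "LnkCap:" / "LnkSta:" lines printed by lspci -vv -s 01:00.0 (run as root, otherwise the capabilities show as access denied). The names link_state and is_degraded are hypothetical, and the example input is illustrative only, not output captured from this machine:

```python
import re

# Matches e.g. "Speed 8GT/s" and "Width x4" in lspci -vv link lines.
SPEED = re.compile(r"Speed ([\d.]+)GT/s")
WIDTH = re.compile(r"Width x(\d+)")


def link_state(lspci_text: str) -> dict[str, tuple[float, int]]:
    """Return {"LnkCap": (GT/s, lanes), "LnkSta": (GT/s, lanes)} for one device."""
    state = {}
    for line in lspci_text.splitlines():
        line = line.strip()
        for tag in ("LnkCap:", "LnkSta:"):  # does not match LnkCap2:/LnkSta2:
            if line.startswith(tag):
                speed, width = SPEED.search(line), WIDTH.search(line)
                if speed and width:
                    state[tag[:-1]] = (float(speed[1]), int(width[1]))
    return state


def is_degraded(lspci_text: str) -> bool:
    """True if the negotiated speed or width is below the advertised one."""
    state = link_state(lspci_text)
    if "LnkCap" not in state or "LnkSta" not in state:
        raise ValueError("LnkCap/LnkSta not found (was lspci run as root?)")
    (cap_speed, cap_width), (sta_speed, sta_width) = state["LnkCap"], state["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


# Illustrative input only.
example = """\
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
        LnkSta: Speed 8GT/s (ok), Width x4 (ok)
"""
print(is_degraded(example))  # False
```

The same check can be run against the root port (00:1b.0), since both ends of the link appear in the log.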
Edit: added smartctl output and syslog.