NVME correctable errors?

abhabitat

New Member
Oct 1, 2024
My Proxmox install has two NVMe drives, and in the System Log I see the same message repeating over and over.
Code:
Mar 06 18:00:35 proxmox kernel: pcieport 0000:00:01.2: AER: Correctable error message received from 0000:03:00.0
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0:   device [15b7:5006] error status/mask=00000001/0000e000
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0:    [ 0] RxErr                  (First)
Mar 06 18:00:35 proxmox kernel: pcieport 0000:00:01.2: AER: Correctable error message received from 0000:03:00.0
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0:   device [15b7:5006] error status/mask=00000001/0000e000
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0:    [ 0] RxErr                  (First)
Mar 06 18:00:35 proxmox kernel: pcieport 0000:00:01.2: AER: Correctable error message received from 0000:03:00.0
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0:   device [15b7:5006] error status/mask=00000001/0000e000
Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0:    [ 0] RxErr                  (First)
First, is this an issue or something I can ignore?
How can I tell which NVME drive this is referring to?
 
How can I tell which NVME drive this is referring to?
It's PCI(e) device 03:00.0. You can find the dev-node by looking at the output of ls -l /dev/disk/by-path/pci-0000:03:00.0*, and then look up that dev-node in the output of ls -l /dev/disk/by-id/nvme* to find the make, model and serial number of the NVMe.
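
If you'd rather not match the two ls outputs by eye, here is a minimal Python sketch of the same two lookups. It assumes the usual udev symlink layout under /dev/disk; the function name nvme_ids_for_pci and its parameters are made up for this example and are not part of any tool.

```python
import os
from pathlib import Path


def nvme_ids_for_pci(pci_addr, by_path="/dev/disk/by-path", by_id="/dev/disk/by-id"):
    """Map each dev-node behind pci_addr to its nvme* names in by-id.

    Hypothetical helper: it only follows the symlinks that udev normally
    creates, the same ones the two ls commands would show.
    """
    # Step 1: by-path symlinks for this PCI address point at the dev-nodes
    # (for example the namespace and its partitions).
    nodes = {os.path.realpath(link) for link in Path(by_path).glob(f"pci-{pci_addr}*")}

    # Step 2: by-id symlinks that resolve to the same dev-nodes carry the
    # make, model and serial number in their file names.
    result = {node: [] for node in nodes}
    for link in Path(by_id).glob("nvme*"):
        target = os.path.realpath(link)
        if target in result:
            result[target].append(link.name)

    for names in result.values():
        names.sort()
    return result


if __name__ == "__main__":
    # PCI address taken from the log lines in the first post.
    for node, names in sorted(nvme_ids_for_pci("0000:03:00.0").items()):
        print(node, names)
```

The two directories are parameters only so the function can be tried against a scratch directory; on a real system the defaults should be what you want.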

First, is this an issue or something I can ignore?
Your system log might fill your entire drive, which is bad. It might also indicate a problem. Re-seat the drive? Update motherboard BIOS and drive firmware? Reduce the PCIe speed?
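
To get a feel for how fast the log is growing, and whether only one of the two drives is involved, you could count the AER messages per device. A rough Python sketch follows. It assumes the message wording shown in the log above (other kernel versions may phrase it differently), and count_correctable is just an illustrative name.

```python
import re
from collections import Counter

# Matches the root-port line from the log in the first post.
# Assumption: the kernel in use words the message exactly like this.
AER_LINE = re.compile(r"AER: Correctable error message received from (\S+)")


def count_correctable(log_lines):
    """Count 'Correctable error message received' lines per PCI device."""
    counts = Counter()
    for line in log_lines:
        match = AER_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Two sample lines copied from the log in the first post.
    sample = [
        "Mar 06 18:00:35 proxmox kernel: pcieport 0000:00:01.2: AER: Correctable error message received from 0000:03:00.0",
        "Mar 06 18:00:35 proxmox kernel: nvme 0000:03:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)",
    ]
    for device, n in count_correctable(sample).items():
        print(device, n)
```

On a real system you would feed it the kernel log lines (for example the output of journalctl -k saved to a file) instead of the sample list.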
I think I might have seen an error like it when my GPU was overheating because the fan failed. I don't really know how bad this is, sorry.
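
For what it's worth, the status/mask numbers in the log can be decoded bit by bit. The sketch below uses the bit positions of the PCIe AER Correctable Error Status register as I understand them, with the short labels the Linux kernel prints (RxErr is one of them, as seen in the log). Treat the table as an assumption to check against the kernel source or the PCIe specification; decode_bits and decode_line are illustrative names only.

```python
import re

# Correctable error bits (assumed from the PCIe AER register layout,
# using the kernel's short labels).
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}

STATUS_MASK = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def decode_bits(value):
    """Return the labels of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items()) if value & (1 << bit)]


def decode_line(line):
    """Decode the status/mask pair of one log line, or return None if absent."""
    match = STATUS_MASK.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return {"reported": decode_bits(status), "masked": decode_bits(mask)}


if __name__ == "__main__":
    # Line copied from the log in the first post.
    line = "nvme 0000:03:00.0:   device [15b7:5006] error status/mask=00000001/0000e000"
    print(decode_line(line))
```

Under those assumptions, status 00000001 is just bit 0 (RxErr), which matches the "[ 0] RxErr" line and the "Physical Layer" type in the log, and the mask 0000e000 covers bits 13 to 15. That fits a signal-level problem on the link, which is why re-seating the drive is a reasonable first try.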
 
@abhabitat: you didn't search the forum, right?
;)