Hello,
I am very new to Proxmox and loving it so far, BUT after running for about a week I have had 2 crashes in the last 36 hours, each of which required a reboot. What happens is that the web interface is no longer available and any machines running off the NVMe drive stop working, while any virtual machines running on the non-NVMe drive stay running. The first time it crashed I could not retrieve any errors; the second time I was able to capture the messages below from syslog. The server is new, so at the time I only had a Windows 10 VM and 2 Ubuntu containers running.
Any help would be appreciated. Please let me know what additional information would be helpful in troubleshooting.
Error from syslog:
Code:
Jan 15 04:05:51 proxsrv kernel: [57859.268892] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f15 source:0x0100
Jan 15 04:05:51 proxsrv kernel: [57859.268893] pcieport 0000:00:1b.0: DPC: ERR_FATAL detected
Jan 15 04:05:51 proxsrv kernel: [57859.268894] nvme
Jan 15 04:05:51 proxsrv kernel: [57859.326043] pcieport 0000:00:1b.0: AER: Multiple Corrected error received: 0000:01:00.0
Jan 15 04:05:51 proxsrv kernel: [57859.326130] nvme 0000:01:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Jan 15 04:05:51 proxsrv kernel: [57859.326133] nvme 0000:01:00.0: AER: device [1344:5410] error status/mask=00001040/00002000
Jan 15 04:05:51 proxsrv kernel: [57859.326134] nvme 0000:01:00.0: AER: [ 6] BadTLP
Jan 15 04:05:51 proxsrv kernel: [57859.326135] nvme 0000:01:00.0: AER: [12] Timeout
Jan 15 04:05:51 proxsrv kernel: [57859.444612] nvme nvme0: restart after slot reset
Jan 15 04:05:51 proxsrv kernel: [57859.722985] nvme nvme0: 16/0/0 default/read/poll queues
Jan 15 04:05:51 proxsrv kernel: [57859.756970] pcieport 0000:00:1b.0: AER: Device recovery successful
Jan 15 04:05:51 proxsrv kernel: [57859.756974] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0100
Jan 15 04:05:51 proxsrv kernel: [57859.756975] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
Jan 15 04:05:51 proxsrv kernel: [57859.756979] nvme nvme0: frozen state error detected, reset controller
Jan 15 04:05:53 proxsrv kernel: [57861.856549] pcieport 0000:00:1b.0: Data Li