AMD Ryzen Proxmox keeps crashing

mrwiga

New Member
May 14, 2024
Hi,

My Proxmox host keeps crashing randomly.
Does the log below mean anything?

Code:
journalctl -b -1 -r

Screenshot 2024-05-21 at 10.17.36 AM.png

Log before it crashes

Code:
May 20 23:21:19 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:21:46 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:21:46 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:21:46 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:21:46 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:24:07 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:24:07 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:24:07 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:24:07 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:24:43 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:24:43 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:24:43 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:24:43 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:25:02 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:25:02 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:25:02 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:25:02 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:25:35 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:25:35 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:25:35 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:25:35 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:25:38 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:25:38 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:25:38 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:25:38 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:26:13 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:26:13 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:26:13 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:26:13 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:26:23 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:26:23 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:26:23 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:26:23 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:26:25 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:26:25 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:26:25 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:26:25 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:26:28 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:26:28 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:26:28 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:26:28 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:26:39 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:26:39 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:26:39 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:26:39 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:29:18 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:29:18 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:29:18 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:29:18 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:30:07 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:30:07 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:30:07 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:30:07 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:30:58 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:30:58 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:30:58 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:30:58 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:31:15 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:31:15 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:31:15 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:31:15 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:31:28 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:31:28 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:31:28 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:31:28 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:31:34 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:31:34 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:31:34 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:31:34 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:31:37 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:31:37 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:31:37 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:31:37 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:31:39 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:31:39 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:31:39 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:31:39 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:33:22 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:33:22 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:33:22 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:33:22 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:34:21 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:34:21 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:34:21 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:34:21 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:34:33 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:34:33 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:34:33 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:34:33 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:35:56 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:35:56 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:35:56 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:35:56 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:37:35 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:37:35 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:37:35 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:37:35 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:37:37 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:37:37 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:37:37 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:37:37 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
May 20 23:37:47 pve kernel: pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
May 20 23:37:47 pve kernel: nvme 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 20 23:37:47 pve kernel: nvme 0000:01:00.0:   device [144d:a804] error status/mask=00000001/00006000
May 20 23:37:47 pve kernel: nvme 0000:01:00.0:    [ 0] RxErr                  (First)
-- Reboot --
 
The logs only show 'correctable' errors, but the sheer number of them could indicate a related problem.

Since the device appears to be an NVMe drive, my guess is that it has a problem and the host crashes because it can no longer read from or write to the disk.
I'd try a different device or check whether there is a firmware upgrade for it.
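
To put a number on how many correctable errors each device reports, the journal can be tallied per PCI address. The following is only a minimal Python sketch, assuming the kernel log was saved as plain text (for example from `journalctl -k -b -1`) and that the AER summary line is worded as in the log above; the wording may differ on other kernel versions, and the function name `count_correctable_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches the AER summary line as it appears in the log above, e.g.
# "pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0"
# The exact wording is an assumption taken from this thread's log.
AER_LINE = re.compile(
    r"AER: Correctable error message received from "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def count_correctable_errors(lines):
    """Return a Counter mapping PCI address -> number of correctable AER reports."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts
```

Feeding it the lines of a saved journal file (for example `count_correctable_errors(open("journal.txt"))`, where `journal.txt` is a hypothetical file name) and printing `.most_common()` shows which device reports most often.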
 
Thanks. I have started by removing one of the NVMe devices.
 
Still crashes
Code:
May 21 20:55:58 pve systemd[1]: Reloading.
May 21 20:55:59 pve systemd[1]: Mounting mnt-pve-nvme\x2d1Tb\x2dsecond.mount - Mount storage 'nvme-1Tb-second' under /mnt/pve...
May 21 20:55:59 pve kernel: EXT4-fs (sda1): mounted filesystem edba2766-0b51-4bde-b7fd-c507c222a30c r/w with ordered data mode. Quota mode: none.
May 21 20:55:59 pve systemd[1]: Mounted mnt-pve-nvme\x2d1Tb\x2dsecond.mount - Mount storage 'nvme-1Tb-second' under /mnt/pve.
May 21 20:55:59 pve pvedaemon[10181]: <root@pam> end task UPID:pve:00002FA1:0005E75E:664C7D86:dircreate:nvme-1Tb-second:root@pam: OK
May 21 20:58:53 pve pvedaemon[10425]: <root@pam> starting task UPID:pve:000032CF:00064161:664C7E6D:qmrestore:101:root@pam:
May 21 20:58:56 pve pvedaemon[10425]: <root@pam> end task UPID:pve:000032CF:00064161:664C7E6D:qmrestore:101:root@pam: OK
May 21 20:59:32 pve pvedaemon[13161]: start VM 101: UPID:pve:00003369:000650A0:664C7E94:qmstart:101:root@pam:
May 21 20:59:32 pve pvedaemon[10425]: <root@pam> starting task UPID:pve:00003369:000650A0:664C7E94:qmstart:101:root@pam:
May 21 20:59:32 pve pvedaemon[13161]: storage 'Data_Drive' does not exist
May 21 20:59:32 pve pvedaemon[10425]: <root@pam> end task UPID:pve:00003369:000650A0:664C7E94:qmstart:101:root@pam: storage 'Data_Drive' does not exist
May 21 20:59:45 pve pvedaemon[1227]: <root@pam> update VM 101: -delete ide2
May 21 20:59:52 pve pvedaemon[13229]: start VM 101: UPID:pve:000033AD:00065851:664C7EA8:qmstart:101:root@pam:
May 21 20:59:52 pve pvedaemon[10181]: <root@pam> starting task UPID:pve:000033AD:00065851:664C7EA8:qmstart:101:root@pam:
May 21 20:59:52 pve systemd[1]: Created slice qemu.slice - Slice /qemu.
May 21 20:59:52 pve systemd[1]: Started 101.scope.
May 21 20:59:53 pve kernel: tap101i0: entered promiscuous mode
May 21 20:59:53 pve kernel: vmbr0: port 2(fwpr101p0) entered blocking state
May 21 20:59:53 pve kernel: vmbr0: port 2(fwpr101p0) entered disabled state
May 21 20:59:53 pve kernel: fwpr101p0: entered allmulticast mode
May 21 20:59:53 pve kernel: fwpr101p0: entered promiscuous mode
May 21 20:59:53 pve kernel: vmbr0: port 2(fwpr101p0) entered blocking state
May 21 20:59:53 pve kernel: vmbr0: port 2(fwpr101p0) entered forwarding state
May 21 20:59:53 pve kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
May 21 20:59:53 pve kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
May 21 20:59:53 pve kernel: fwln101i0: entered allmulticast mode
May 21 20:59:53 pve kernel: fwln101i0: entered promiscuous mode
May 21 20:59:53 pve kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
May 21 20:59:53 pve kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state
May 21 20:59:53 pve kernel: fwbr101i0: port 2(tap101i0) entered blocking state
May 21 20:59:53 pve kernel: fwbr101i0: port 2(tap101i0) entered disabled state
May 21 20:59:53 pve kernel: tap101i0: entered allmulticast mode
May 21 20:59:53 pve kernel: fwbr101i0: port 2(tap101i0) entered blocking state
May 21 20:59:53 pve kernel: fwbr101i0: port 2(tap101i0) entered forwarding state
May 21 20:59:53 pve pvedaemon[13337]: starting vnc proxy UPID:pve:00003419:000658CC:664C7EA9:vncproxy:101:root@pam:
May 21 20:59:53 pve pvedaemon[10181]: <root@pam> starting task UPID:pve:00003419:000658CC:664C7EA9:vncproxy:101:root@pam:
May 21 20:59:53 pve pveproxy[10510]: proxy detected vanished client connection
May 21 20:59:53 pve pvedaemon[10181]: <root@pam> end task UPID:pve:000033AD:00065851:664C7EA8:qmstart:101:root@pam: OK
May 21 20:59:53 pve pvedaemon[10425]: <root@pam> starting task UPID:pve:0000341B:000658D3:664C7EA9:vncproxy:101:root@pam:
May 21 20:59:53 pve pvedaemon[13339]: starting vnc proxy UPID:pve:0000341B:000658D3:664C7EA9:vncproxy:101:root@pam:
May 21 21:00:03 pve pvedaemon[13337]: connection timed out
May 21 21:00:03 pve pvedaemon[10181]: <root@pam> end task UPID:pve:00003419:000658CC:664C7EA9:vncproxy:101:root@pam: connection timed out
May 21 21:00:05 pve pvedaemon[10181]: <root@pam> starting task UPID:pve:00003451:00065D77:664C7EB5:qmstop:101:root@pam:
May 21 21:00:05 pve pvedaemon[13393]: stop VM 101: UPID:pve:00003451:00065D77:664C7EB5:qmstop:101:root@pam:
May 21 21:00:05 pve kernel: tap101i0: left allmulticast mode
May 21 21:00:05 pve kernel: fwbr101i0: port 2(tap101i0) entered disabled state
May 21 21:00:05 pve kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
May 21 21:00:05 pve kernel: vmbr0: port 2(fwpr101p0) entered disabled state
May 21 21:00:05 pve kernel: fwln101i0 (unregistering): left allmulticast mode
May 21 21:00:05 pve kernel: fwln101i0 (unregistering): left promiscuous mode
May 21 21:00:05 pve kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
May 21 21:00:05 pve kernel: fwpr101p0 (unregistering): left allmulticast mode
May 21 21:00:05 pve kernel: fwpr101p0 (unregistering): left promiscuous mode
May 21 21:00:05 pve kernel: vmbr0: port 2(fwpr101p0) entered disabled state
May 21 21:00:05 pve qmeventd[875]: read: Connection reset by peer
May 21 21:00:05 pve pvedaemon[1227]: VM 101 qmp command failed - VM 101 not running
May 21 21:00:05 pve pvedaemon[10181]: <root@pam> end task UPID:pve:00003451:00065D77:664C7EB5:qmstop:101:root@pam: OK
May 21 21:00:05 pve pveproxy[11389]: problem with client ::ffff:10.100.8.5; Connection reset by peer
May 21 21:00:05 pve pvedaemon[10425]: <root@pam> end task UPID:pve:0000341B:000658D3:664C7EA9:vncproxy:101:root@pam: OK
May 21 21:00:05 pve systemd[1]: 101.scope: Deactivated successfully.
May 21 21:00:05 pve systemd[1]: 101.scope: Consumed 4.853s CPU time.
May 21 21:00:06 pve qmeventd[13424]: Starting cleanup for 101
May 21 21:00:06 pve qmeventd[13424]: Finished cleanup for 101
May 21 21:00:22 pve pvedaemon[1227]: <root@pam> starting task UPID:pve:0000348F:00066421:664C7EC6:qmdestroy:101:root@pam:
May 21 21:00:22 pve pvedaemon[13455]: destroy VM 101: UPID:pve:0000348F:00066421:664C7EC6:qmdestroy:101:root@pam:
May 21 21:00:22 pve pvedaemon[1227]: <root@pam> end task UPID:pve:0000348F:00066421:664C7EC6:qmdestroy:101:root@pam: OK
May 21 21:00:38 pve pvedaemon[10425]: <root@pam> starting task UPID:pve:000034D4:00066A95:664C7ED6:imgdel:101@PixelNAS:root@pam:
May 21 21:00:38 pve pvedaemon[10425]: <root@pam> end task UPID:pve:000034D4:00066A95:664C7ED6:imgdel:101@PixelNAS:root@pam: OK
May 21 21:01:01 pve cron[1189]: (*system*vzdump) RELOAD (/etc/cron.d/vzdump)
May 21 21:06:06 pve pvedaemon[1227]: <root@pam> successful auth for user 'root@pam'
May 21 21:06:34 pve pvedaemon[1227]: worker exit
May 21 21:06:34 pve pvedaemon[1226]: worker 1227 finished
May 21 21:06:34 pve pvedaemon[1226]: starting 1 worker(s)
May 21 21:06:34 pve pvedaemon[1226]: worker 14639 started
May 21 21:07:43 pve pveproxy[10104]: worker exit
May 21 21:07:43 pve pveproxy[1235]: worker 10104 finished
May 21 21:07:43 pve pveproxy[1235]: starting 1 worker(s)
May 21 21:07:43 pve pveproxy[1235]: worker 14857 started
May 21 21:07:44 pve pveproxy[10510]: worker exit
May 21 21:07:44 pve pveproxy[1235]: worker 10510 finished
May 21 21:07:44 pve pveproxy[1235]: starting 1 worker(s)
May 21 21:07:44 pve pveproxy[1235]: worker 14858 started
May 21 21:09:06 pve pvedaemon[10425]: <root@pam> successful auth for user 'root@pam'
May 21 21:17:01 pve CRON[16611]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 21 21:17:01 pve CRON[16612]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 21 21:17:01 pve CRON[16611]: pam_unix(cron:session): session closed for user root
May 21 21:17:27 pve pveproxy[11389]: worker exit
May 21 21:17:27 pve pveproxy[1235]: worker 11389 finished
May 21 21:17:27 pve pveproxy[1235]: starting 1 worker(s)
May 21 21:17:27 pve pveproxy[1235]: worker 16706 started
May 21 21:20:41 pve smartd[874]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 73 to 70
May 21 21:20:41 pve smartd[874]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 58 to 57
May 21 21:22:06 pve pvedaemon[10425]: <root@pam> successful auth for user 'root@pam'
May 21 21:25:06 pve pvedaemon[10181]: <root@pam> successful auth for user 'root@pam'
May 21 21:31:30 pve pveproxy[14857]: worker exit
May 21 21:31:30 pve pveproxy[1235]: worker 14857 finished
May 21 21:31:30 pve pveproxy[1235]: starting 1 worker(s)
May 21 21:31:30 pve pveproxy[1235]: worker 19331 started
May 21 21:35:12 pve pveproxy[14858]: worker exit
May 21 21:35:12 pve pveproxy[1235]: worker 14858 finished
May 21 21:35:12 pve pveproxy[1235]: starting 1 worker(s)
May 21 21:35:12 pve pveproxy[1235]: worker 20022 started
May 21 21:38:06 pve pvedaemon[10425]: <root@pam> successful auth for user 'root@pam'
May 21 21:39:28 pve pveproxy[16706]: worker exit
May 21 21:39:28 pve pveproxy[1235]: worker 16706 finished
May 21 21:39:28 pve pveproxy[1235]: starting 1 worker(s)
May 21 21:39:28 pve pveproxy[1235]: worker 20834 started
May 21 21:41:06 pve pvedaemon[14639]: <root@pam> successful auth for user 'root@pam'
May 21 21:50:41 pve smartd[874]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 57 to 60
May 21 21:53:46 pve pvedaemon[10425]: <root@pam> successful auth for user 'root@pam'
May 21 21:55:27 pve kernel: pcieport 0000:00:01.3: AER: Correctable error message received from 0000:26:00.0
May 21 21:55:27 pve kernel: xhci_hcd 0000:26:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
May 21 21:55:27 pve kernel: xhci_hcd 0000:26:00.0:   device [1b21:2142] error status/mask=00000001/00002000
May 21 21:55:27 pve kernel: xhci_hcd 0000:26:00.0:    [ 0] RxErr                  (First)
May 21 22:17:01 pve CRON[28010]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 21 22:17:01 pve CRON[28011]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 21 22:17:01 pve CRON[28010]: pam_unix(cron:session): session closed for user root
May 21 22:19:41 pve pvedaemon[14639]: <root@pam> successful auth for user 'root@pam'
May 21 22:20:42 pve smartd[874]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 70 to 71
May 21 22:20:42 pve smartd[874]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 60 to 77
May 21 22:50:41 pve smartd[874]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 77 to 78
May 21 23:17:01 pve CRON[39245]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 21 23:17:01 pve CRON[39246]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 21 23:17:01 pve CRON[39245]: pam_unix(cron:session): session closed for user root
May 21 23:50:41 pve smartd[874]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 74
May 21 23:50:41 pve smartd[874]: Device: /dev/sdc [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 78 to 79
May 21 23:55:01 pve pveproxy[20022]: worker exit
May 21 23:55:01 pve pveproxy[1235]: worker 20022 finished
May 21 23:55:01 pve pveproxy[1235]: starting 1 worker(s)
May 21 23:55:01 pve pveproxy[1235]: worker 46360 started
May 21 23:56:09 pve pveproxy[20834]: worker exit
May 21 23:56:09 pve pveproxy[1235]: worker 20834 finished
May 21 23:56:09 pve pveproxy[1235]: starting 1 worker(s)
May 21 23:56:09 pve pveproxy[1235]: worker 46581 started
May 21 23:57:02 pve pveproxy[19331]: worker exit
May 21 23:57:02 pve pveproxy[1235]: worker 19331 finished
May 21 23:57:02 pve pveproxy[1235]: starting 1 worker(s)
May 21 23:57:02 pve pveproxy[1235]: worker 46740 started
May 21 23:57:29 pve pvedaemon[10181]: worker exit
May 21 23:57:29 pve pvedaemon[1226]: worker 10181 finished
May 21 23:57:29 pve pvedaemon[1226]: starting 1 worker(s)
May 21 23:57:29 pve pvedaemon[1226]: worker 46831 started
May 21 23:58:51 pve pvedaemon[46831]: <root@pam> successful auth for user 'root@pam'
May 22 00:00:13 pve systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
May 22 00:00:13 pve systemd[1]: Starting logrotate.service - Rotate log files...
May 22 00:00:13 pve systemd[1]: logrotate.service: Deactivated successfully.
May 22 00:00:13 pve systemd[1]: Finished logrotate.service - Rotate log files.
May 22 00:00:13 pve systemd[1]: dpkg-db-backup.service: Deactivated successfully.
May 22 00:00:13 pve systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
May 22 00:15:54 pve pvedaemon[46831]: <root@pam> successful auth for user 'root@pam'
May 22 00:17:01 pve CRON[50543]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 22 00:17:01 pve CRON[50544]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 22 00:17:01 pve CRON[50543]: pam_unix(cron:session): session closed for user root
May 22 01:14:12 pve pvedaemon[10425]: <root@pam> successful auth for user 'root@pam'
May 22 01:17:01 pve CRON[61774]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 22 01:17:01 pve CRON[61775]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 22 01:17:01 pve CRON[61774]: pam_unix(cron:session): session closed for user root
May 22 01:33:41 pve pvedaemon[10425]: <root@pam> successful auth for user 'root@pam'
May 22 01:34:13 pve pvedaemon[10425]: worker exit
May 22 01:34:13 pve pvedaemon[1226]: worker 10425 finished
May 22 01:34:13 pve pvedaemon[1226]: starting 1 worker(s)
May 22 01:34:13 pve pvedaemon[1226]: worker 64992 started
May 22 02:11:15 pve pveproxy[46581]: worker exit
May 22 02:11:15 pve pveproxy[1235]: worker 46581 finished
May 22 02:11:15 pve pveproxy[1235]: starting 1 worker(s)
May 22 02:11:15 pve pveproxy[1235]: worker 71934 started
May 22 02:17:01 pve CRON[73010]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 22 02:17:01 pve CRON[73011]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 22 02:17:01 pve CRON[73010]: pam_unix(cron:session): session closed for user root
May 22 02:42:27 pve pvedaemon[14639]: <root@pam> successful auth for user 'root@pam'
May 22 03:01:27 pve pveproxy[46360]: worker exit
May 22 03:01:27 pve pveproxy[1235]: worker 46360 finished
May 22 03:01:27 pve pveproxy[1235]: starting 1 worker(s)
May 22 03:01:27 pve pveproxy[1235]: worker 81341 started
May 22 03:02:21 pve pveproxy[46740]: worker exit
May 22 03:02:21 pve pveproxy[1235]: worker 46740 finished
May 22 03:02:21 pve pveproxy[1235]: starting 1 worker(s)
May 22 03:02:21 pve pveproxy[1235]: worker 81499 started
 
There is nothing in the given logs indicating a crash or any relevant problem
(so maybe the NVMe errors were unrelated?).

You could try booting a different (older) kernel to see if that fixes the issue.
Alternatively, I would guess that it is a hardware issue (e.g. you could try installing a different OS on it and see if it still crashes).
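
One way to double-check that a log really contains no sign of a software crash is to search it for typical kernel crash messages. This is a rough Python sketch under that assumption; the marker list is illustrative and not exhaustive, and `find_crash_markers` is a hypothetical helper name. An abrupt hang or power loss may leave no such lines at all, because the last messages may never reach the disk.

```python
import re

# Substrings that commonly show up in kernel logs around a software crash.
# This list is an assumption for illustration; it is not exhaustive.
CRASH_MARKERS = re.compile(
    r"Kernel panic|Oops|BUG:|Call Trace|Hardware Error|soft lockup|Out of memory"
)


def find_crash_markers(lines):
    """Return the log lines that contain any of the crash markers."""
    return [line for line in lines if CRASH_MARKERS.search(line)]
```

An empty result on the lines of a previous boot's journal is consistent with the reading above: the log simply stops, which points more towards hardware, firmware, or a hard freeze than towards a logged kernel error.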
 
Do yourself a favour and go back to kernel "6.5.13-5-pve". I had the same problems. If you google problems with the current kernel, you will find threads all over the web.
 
I have added the setting below, and it's more stable now.

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
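
A line like this in `/etc/default/grub` only takes effect after the bootloader configuration is regenerated (on a GRUB-booted host usually with `update-grub`) and the host is rebooted. Note that `mitigations=off` turns off the kernel's CPU vulnerability mitigations. To confirm the running kernel actually booted with the parameter, the command line in `/proc/cmdline` can be checked. A small Python sketch, with `has_kernel_param` and `read_cmdline` as hypothetical helper names:

```python
def has_kernel_param(cmdline, param):
    """Return True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

On the host itself, `has_kernel_param(read_cmdline(), "mitigations=off")` should return `True` after the reboot if the change was picked up.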
 