One of my Proxmox Nodes suddenly keeps randomly freezing (Pre and Post PVE8 Upgrade)

Noah0302

Hello guys,

I've been trying to find the cause of the random freezes on one of my Proxmox nodes for a week now, but I can't find anything, as I haven't had to debug something like this before.
Attached are the output of journalctl -b -1 and the general syslog from before the failure.

Also here as a Snippet:
Code:
Jun 26 15:53:09 PVE02 pve-ha-lrm[7248]: <root@pam> end task UPID:PVE02:00001C53:00003BC4:64999844:vzstart:1200003:root@pam: OK
Jun 26 15:53:09 PVE02 pve-ha-lrm[7248]: service status ct:1200003 started
Jun 26 15:53:10 PVE02 kernel: rbd: rbd6: capacity 8589934592 features 0x3d
Jun 26 15:53:10 PVE02 kernel: EXT4-fs (rbd6): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Jun 26 15:53:10 PVE02 audit[7957]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-90003_</var/lib/lxc>" pid=7957 comm="apparmor_parser"
Jun 26 15:53:10 PVE02 kernel: audit: type=1400 audit(1687787590.175:26): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-90003_</var/lib/lxc>" pid=7957 comm="apparmor_parser"
Jun 26 15:53:10 PVE02 systemd-udevd[7511]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 26 15:53:10 PVE02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 26 15:53:10 PVE02 kernel: fwbr120003i0: port 2(veth120003i0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: fwbr120003i0: port 2(veth120003i0) entered forwarding state
Jun 26 15:53:10 PVE02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 26 15:53:10 PVE02 kernel: fwbr1200003i0: port 2(veth1200003i0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: fwbr1200003i0: port 2(veth1200003i0) entered forwarding state
Jun 26 15:53:10 PVE02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 26 15:53:10 PVE02 kernel: fwbr1222003i0: port 2(veth1222003i0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: fwbr1222003i0: port 2(veth1222003i0) entered forwarding state
Jun 26 15:53:10 PVE02 sh[2074]: Running command: /usr/sbin/ceph-volume lvm trigger 1-491ff02d-13d3-4f91-bdd4-47e62553b8cc
Jun 26 15:53:10 PVE02 systemd-udevd[7511]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 26 15:53:10 PVE02 systemd-udevd[7511]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 26 15:53:10 PVE02 systemd-udevd[7513]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 26 15:53:10 PVE02 kernel: vmbr1: port 8(fwpr90003p0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: vmbr1: port 8(fwpr90003p0) entered disabled state
Jun 26 15:53:10 PVE02 kernel: device fwpr90003p0 entered promiscuous mode
Jun 26 15:53:10 PVE02 kernel: vmbr1: port 8(fwpr90003p0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: vmbr1: port 8(fwpr90003p0) entered forwarding state
Jun 26 15:53:10 PVE02 kernel: fwbr90003i0: port 1(fwln90003i0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: fwbr90003i0: port 1(fwln90003i0) entered disabled state
Jun 26 15:53:10 PVE02 kernel: device fwln90003i0 entered promiscuous mode
Jun 26 15:53:10 PVE02 kernel: fwbr90003i0: port 1(fwln90003i0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: fwbr90003i0: port 1(fwln90003i0) entered forwarding state
Jun 26 15:53:10 PVE02 kernel: fwbr90003i0: port 2(veth90003i0) entered blocking state
Jun 26 15:53:10 PVE02 kernel: fwbr90003i0: port 2(veth90003i0) entered disabled state
Jun 26 15:53:10 PVE02 kernel: device veth90003i0 entered promiscuous mode
Jun 26 15:53:10 PVE02 kernel: eth0: renamed from vethGiqy94
Jun 26 15:53:10 PVE02 pve-ha-lrm[7638]: <root@pam> end task UPID:PVE02:00001DDE:00003C29:64999845:vzstart:90003:root@pam: OK
Jun 26 15:53:10 PVE02 pve-ha-lrm[7638]: service status ct:90003 started
Jun 26 15:53:11 PVE02 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 26 15:53:11 PVE02 kernel: fwbr90003i0: port 2(veth90003i0) entered blocking state
Jun 26 15:53:11 PVE02 kernel: fwbr90003i0: port 2(veth90003i0) entered forwarding state
Jun 26 15:53:15 PVE02 pvestatd[2249]: modified cpu set for lxc/110003: 0,2
Jun 26 15:53:15 PVE02 pvestatd[2249]: modified cpu set for lxc/1200003: 0,4
Jun 26 15:53:15 PVE02 pvestatd[2249]: modified cpu set for lxc/120003: 3-4
Jun 26 15:53:15 PVE02 pvestatd[2249]: modified cpu set for lxc/1222003: 5,7
Jun 26 15:53:15 PVE02 sh[2074]: Running command: /usr/sbin/ceph-volume lvm trigger 1-491ff02d-13d3-4f91-bdd4-47e62553b8cc
Jun 26 15:53:20 PVE02 systemd[1]: ceph-volume@lvm-1-491ff02d-13d3-4f91-bdd4-47e62553b8cc.service: Succeeded.
Jun 26 15:53:20 PVE02 systemd[1]: Finished Ceph Volume activation: lvm-1-491ff02d-13d3-4f91-bdd4-47e62553b8cc.
Jun 26 15:53:20 PVE02 systemd[1]: ceph-volume@lvm-1-491ff02d-13d3-4f91-bdd4-47e62553b8cc.service: Consumed 4.141s CPU time.
Jun 26 15:53:20 PVE02 systemd[1]: Reached target Multi-User System.
Jun 26 15:53:20 PVE02 systemd[1]: Reached target Graphical Interface.
Jun 26 15:53:20 PVE02 systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jun 26 15:53:20 PVE02 systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jun 26 15:53:20 PVE02 systemd[1]: Finished Update UTMP about System Runlevel Changes.
Jun 26 15:53:20 PVE02 systemd[1]: Startup finished in 5.240s (kernel) + 2min 40.147s (userspace) = 2min 45.387s.

Code:
Jul 02 13:14:23 PVE02 pvestatd[2400]: storage 'TrueNAS-Backup' is not online
Jul 02 13:14:23 PVE02 pvestatd[2400]: status update time (10.487 seconds)
Jul 02 13:14:45 PVE02 systemd[1]: Starting apt-daily.service - Daily apt download activities...
Jul 02 13:14:45 PVE02 systemd[1]: apt-daily.service: Deactivated successfully.
Jul 02 13:14:45 PVE02 systemd[1]: Finished apt-daily.service - Daily apt download activities.
Jul 02 13:17:01 PVE02 CRON[703064]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 02 13:17:01 PVE02 CRON[703065]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 02 13:17:01 PVE02 CRON[703064]: pam_unix(cron:session): session closed for user root
Jul 02 13:25:47 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful
Jul 02 13:42:43 PVE02 pvestatd[2400]: storage 'TrueNAS-Backup' is not online
Jul 02 13:42:44 PVE02 pvestatd[2400]: status update time (10.486 seconds)
Jul 02 13:58:04 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
Jul 02 13:58:05 PVE02 pvestatd[2400]: status update time (10.487 seconds)
Jul 02 14:00:05 PVE02 pvestatd[2400]: storage 'TrueNAS-Backup' is not online
Jul 02 14:00:06 PVE02 pvestatd[2400]: status update time (10.485 seconds)
Jul 02 14:17:01 PVE02 CRON[772488]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 02 14:17:01 PVE02 CRON[772489]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 02 14:17:01 PVE02 CRON[772488]: pam_unix(cron:session): session closed for user root
Jul 02 14:25:47 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful
Jul 02 14:28:41 PVE02 smartd[1363]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 36 to 38
Jul 02 14:59:46 PVE02 pvestatd[2400]: storage 'TrueNAS-Backup' is not online
Jul 02 14:59:47 PVE02 pvestatd[2400]: status update time (10.485 seconds)
Jul 02 15:17:01 PVE02 CRON[841416]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 02 15:17:01 PVE02 CRON[841417]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 02 15:17:01 PVE02 CRON[841416]: pam_unix(cron:session): session closed for user root
Jul 02 15:17:27 PVE02 pvestatd[2400]: storage 'TrueNAS-Backup' is not online
Jul 02 15:17:27 PVE02 pvestatd[2400]: status update time (10.489 seconds)
Jul 02 15:18:32 PVE02 chronyd[1806]: Selected source 162.159.200.1 (2.debian.pool.ntp.org)
Jul 02 15:25:47 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful
Jul 02 15:28:41 PVE02 smartd[1363]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 38 to 36
Jul 02 16:17:01 PVE02 CRON[910638]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 02 16:17:01 PVE02 CRON[910639]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 02 16:17:01 PVE02 CRON[910638]: pam_unix(cron:session): session closed for user root
Jul 02 16:25:47 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful
Jul 02 16:28:37 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
Jul 02 16:28:37 PVE02 pvestatd[2400]: status update time (10.490 seconds)
Jul 02 17:00:17 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
Jul 02 17:00:18 PVE02 pvestatd[2400]: status update time (10.506 seconds)
Jul 02 17:14:48 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
Jul 02 17:14:48 PVE02 pvestatd[2400]: status update time (10.486 seconds)
Jul 02 17:17:01 PVE02 CRON[980149]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 02 17:17:01 PVE02 CRON[980150]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 02 17:17:01 PVE02 CRON[980149]: pam_unix(cron:session): session closed for user root
Jul 02 17:25:47 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful

Code:
2023-07-02T15:28:41.581068+02:00 PVE02 smartd[1363]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 38 to 36
2023-07-02T16:17:01.595369+02:00 PVE02 CRON[910639]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
2023-07-02T16:25:47.819416+02:00 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful
2023-07-02T16:28:37.591513+02:00 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
2023-07-02T16:28:37.736826+02:00 PVE02 pvestatd[2400]: status update time (10.490 seconds)
2023-07-02T17:00:17.976571+02:00 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
2023-07-02T17:00:18.232189+02:00 PVE02 pvestatd[2400]: status update time (10.506 seconds)
2023-07-02T17:14:48.613882+02:00 PVE02 pvestatd[2400]: storage 'TrueNAS-ISO' is not online
2023-07-02T17:14:48.759299+02:00 PVE02 pvestatd[2400]: status update time (10.486 seconds)
2023-07-02T17:17:01.608748+02:00 PVE02 CRON[980150]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
2023-07-02T17:25:47.074377+02:00 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful
2023-07-02T20:06:27.970646+02:00 PVE02 systemd-modules-load[753]: Inserted module 'vhost_net'
2023-07-02T20:06:27.970742+02:00 PVE02 lvm[739]:   1 logical volume(s) in volume group "ceph-36ef4ba1-220d-4201-81db-39cd40986e6c" monitored
2023-07-02T20:06:27.970757+02:00 PVE02 systemd[1]: Starting systemd-journal-flush.service - Flush Journal to Persistent Storage...
2023-07-02T20:06:27.970765+02:00 PVE02 systemd-udevd[834]: Using default interface naming scheme 'v252'.
2023-07-02T20:06:27.970770+02:00 PVE02 systemd[1]: Started systemd-udevd.service - Rule-based Manager for Device Events and Files.
2023-07-02T20:06:27.970777+02:00 PVE02 systemd[1]: Finished systemd-udev-trigger.service - Coldplug All udev Devices.
2023-07-02T20:06:27.970784+02:00 PVE02 systemd[1]: Finished systemd-journal-flush.service - Flush Journal to Persistent Storage.
2023-07-02T20:06:27.970795+02:00 PVE02 systemd[1]: Finished lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
2023-07-02T20:06:27.970812+02:00 PVE02 lvm[917]: PV /dev/nvme0n1 online, VG ceph-36ef4ba1-220d-4201-81db-39cd40986e6c is complete.
2023-07-02T20:06:27.970797+02:00 PVE02 kernel: [    0.000000] Linux version 6.2.16-3-pve (tom@sbuild) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) ()
2023-07-02T20:06:27.970819+02:00 PVE02 lvm[917]: VG ceph-36ef4ba1-220d-4201-81db-39cd40986e6c finished
2023-07-02T20:06:27.970825+02:00 PVE02 kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.2.16-3-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
2023-07-02T20:06:27.970828+02:00 PVE02 kernel: [    0.000000] KERNEL supported cpus:
2023-07-02T20:06:27.970826+02:00 PVE02 systemd[1]: Reached target local-fs-pre.target - Preparation for Local File Systems.
2023-07-02T20:06:27.970830+02:00 PVE02 kernel: [    0.000000]   Intel GenuineIntel
2023-07-02T20:06:27.970832+02:00 PVE02 kernel: [    0.000000]   AMD AuthenticAMD
2023-07-02T20:06:27.970832+02:00 PVE02 kernel: [    0.000000]   Hygon HygonGenuine
2023-07-02T20:06:27.970834+02:00 PVE02 kernel: [    0.000000]   Centaur CentaurHauls
2023-07-02T20:06:27.970833+02:00 PVE02 systemd[1]: Starting ifupdown2-pre.service - Helper to synchronize boot up for ifupdown...
2023-07-02T20:06:27.970838+02:00 PVE02 kernel: [    0.000000]   zhaoxin   Shanghai
2023-07-02T20:06:27.970838+02:00 PVE02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
2023-07-02T20:06:27.970840+02:00 PVE02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
2023-07-02T20:06:27.970842+02:00 PVE02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
2023-07-02T20:06:27.970842+02:00 PVE02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
2023-07-02T20:06:27.970844+02:00 PVE02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
2023-07-02T20:06:27.970844+02:00 PVE02 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
2023-07-02T20:06:27.970840+02:00 PVE02 systemd[1]: Starting systemd-udev-settle.service - Wait for udev To Complete Device Initialization...
2023-07-02T20:06:27.970847+02:00 PVE02 kernel: [    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
2023-07-02T20:06:27.970848+02:00 PVE02 kernel: [    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
2023-07-02T20:06:27.970848+02:00 PVE02 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
2023-07-02T20:06:27.970850+02:00 PVE02 kernel: [    0.000000] signal: max sigframe size: 2032
2023-07-02T20:06:27.970851+02:00 PVE02 kernel: [    0.000000] BIOS-provided physical RAM map:
2023-07-02T20:06:27.970851+02:00 PVE02 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d3ff] usable
2023-07-02T20:06:27.970849+02:00 PVE02 udevadm[1091]: systemd-udev-settle.service is deprecated. Please fix zfs-import-scan.service, zfs-import-cache.service not to pull it in.

Are there any other log files where I might be able to find what is causing this?
Also, with all of the networking-related log entries, I am tempted to think this is an issue with networking or a network device...

Thank you for reading!
 

Attachments

  • PVE02-Crash-02072023.txt (211.7 KB)
  • PVE02-Crash-02072023-Syslog.txt (231.8 KB)
  • PVE02-Crash-26062023.txt (189 KB)

There is nothing in the logs, and I don't even see the -- Reboot -- marker that journalctl prints when it detects that the system started again. Maybe it is a power interruption or a hardware failure. It does not look like failing disks or memory. Try another power supply, then another motherboard. Test the memory just in case. Try another CPU.
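
Since the logs simply stop, it can help to at least pin down the window in which the node went silent. Here is a minimal Python sketch, assuming each syslog line starts with an ISO 8601 timestamp as in the third snippet above. The function name largest_gap is made up for illustration and is not part of any Proxmox tooling.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[datetime, datetime, timedelta]]:
    """Return (last_line_before, first_line_after, gap) for the longest silence."""
    best = None
    prev = None
    for line in lines:
        # Assumption: the timestamp is the first space-separated field.
        stamp = line.strip().split(" ", 1)[0]
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev is not None:
            gap = ts - prev
            if best is None or gap > best[2]:
                best = (prev, ts, gap)
        prev = ts
    return best


# Three lines taken from the syslog snippet posted above.
sample = [
    "2023-07-02T17:17:01.608748+02:00 PVE02 CRON[980150]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)",
    "2023-07-02T17:25:47.074377+02:00 PVE02 pmxcfs[1930]: [dcdb] notice: data verification successful",
    "2023-07-02T20:06:27.970646+02:00 PVE02 systemd-modules-load[753]: Inserted module 'vhost_net'",
]

before, after, gap = largest_gap(sample)
print(f"silent from {before} until {after} ({gap})")
```

Keep in mind that this only bounds the freeze: the node could have hung at any point after the last line was written, and the end of the gap is simply when it was powered on again.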
 
Thank you both for the responses!

I will try "mitigations=off" first, and if it still happens I'll swap the PSU and continue monitoring.


Thank you again!
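
A note on the mitigations test mentioned above: the kernel parameter is spelled mitigations=off (plural), and after rebooting it is worth confirming that it actually reached the kernel command line. Here is a minimal Python sketch of that check. The helper name mitigations_disabled is hypothetical, and it assumes that the last mitigations= token on the command line is the one that takes effect.

```python
def mitigations_disabled(cmdline: str) -> bool:
    """Return True if the last mitigations= token on the command line is 'off'."""
    value = None
    for token in cmdline.split():
        if token.startswith("mitigations="):
            # Assumption: a later occurrence overrides an earlier one.
            value = token.split("=", 1)[1]
    return value == "off"


# Command line copied from the boot log posted above (no mitigations parameter yet).
example = (
    "BOOT_IMAGE=/vmlinuz-6.2.16-3-pve root=ZFS=rpool/ROOT/pve-1 ro "
    "root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet"
)

print(mitigations_disabled(example))
print(mitigations_disabled(example + " mitigations=off"))

# On the node itself the live value can be read from /proc/cmdline, e.g.:
#   mitigations_disabled(open("/proc/cmdline").read())
```

If the check still reports False after a reboot, the parameter was probably added to a bootloader config that this node does not use, or the bootloader config was not refreshed.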
 
