Help: Proxmox 9 periodically stops responding, had to hard reset

UpAndRun

New Member
Jan 9, 2025
My Proxmox 9 host was working fine for a couple of weeks, then it started to stop responding periodically. At first it happened in the middle of every night: I could still ping the IP, but the host did not respond to anything else. I disabled all automatic updates and it ran fine for two nights, but today it stopped a little after 10 am. This time I cannot even ping it.

What do I need to do to debug this?

I ran journalctl -b -1 -e, and here is the output:

Sep 29 07:52:48 minipc pvedaemon[1547]: starting 1 worker(s)
Sep 29 07:52:48 minipc pvedaemon[1547]: worker 933117 started
Sep 29 07:53:47 minipc pvedaemon[933117]: <root@pam> successful auth for user 'root@pam'
Sep 29 07:57:07 minipc pvedaemon[899231]: worker exit
Sep 29 07:57:07 minipc pvedaemon[1547]: worker 899231 finished
Sep 29 07:57:07 minipc pvedaemon[1547]: starting 1 worker(s)
Sep 29 07:57:07 minipc pvedaemon[1547]: worker 934788 started
Sep 29 07:58:33 minipc pveproxy[1576]: worker 917617 finished
Sep 29 07:58:33 minipc pveproxy[1576]: starting 1 worker(s)
Sep 29 07:58:33 minipc pveproxy[935312]: got inotify poll request in wrong process - disabling inotify
Sep 29 07:58:33 minipc pveproxy[1576]: worker 935315 started
Sep 29 07:58:34 minipc pveproxy[935312]: worker exit
Sep 29 07:59:45 minipc pveproxy[917734]: worker exit
Sep 29 07:59:45 minipc pveproxy[1576]: worker 917734 finished
Sep 29 07:59:45 minipc pveproxy[1576]: starting 1 worker(s)
Sep 29 07:59:45 minipc pveproxy[1576]: worker 935795 started
Sep 29 08:04:12 minipc pmxcfs[1327]: [dcdb] notice: data verification successful
Sep 29 08:04:40 minipc pveproxy[919559]: worker exit
Sep 29 08:04:40 minipc pveproxy[1576]: worker 919559 finished
Sep 29 08:04:40 minipc pveproxy[1576]: starting 1 worker(s)
Sep 29 08:04:40 minipc pveproxy[1576]: worker 937631 started
Sep 29 08:09:47 minipc pvedaemon[931469]: <root@pam> successful auth for user 'root@pam'
Sep 29 08:17:01 minipc CRON[942735]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Sep 29 08:17:01 minipc CRON[942737]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 29 08:17:01 minipc CRON[942735]: pam_unix(cron:session): session closed for user root
Sep 29 08:51:01 minipc pvedaemon[931469]: <root@pam> successful auth for user 'homeassistant@pve'
Sep 29 09:04:12 minipc pmxcfs[1327]: [dcdb] notice: data verification successful
Sep 29 09:17:01 minipc CRON[962440]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Sep 29 09:17:01 minipc CRON[962442]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 29 09:17:01 minipc CRON[962440]: pam_unix(cron:session): session closed for user root
Sep 29 09:52:01 minipc pvedaemon[931469]: <root@pam> successful auth for user 'homeassistant@pve'
Sep 29 10:04:12 minipc pmxcfs[1327]: [dcdb] notice: data verification successful
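
The last entry above is at 10:04:12, which matches the hang a little after 10 am, so the final lines of the previous boot are probably the most useful part to look at. As a rough sketch only (summarize_journal is a hypothetical helper name, not a Proxmox or systemd tool, and it assumes the journal is stored persistently so that the previous boot is still available), saved journal text can be condensed to a line count, a count of "crit:" messages, and the last line logged before the freeze:

```shell
#!/bin/sh
# Hypothetical helper: summarise journal text read from stdin.
# Prints how many lines were read, how many contain "crit:",
# and the final line, i.e. the last thing logged before the hang.
summarize_journal() {
    awk '
        /crit:/ { crit++ }          # count lines carrying a crit: message
        { last = $0; n++ }          # remember the most recent line
        END {
            printf "lines=%d crit=%d\n", n, crit
            if (n > 0) print "last: " last
        }
    '
}
```

It could be fed from the previous boot with journalctl -b -1 --no-pager | summarize_journal, or from a saved text file with summarize_journal < saved.log (the file name is just an example).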
 
I also updated to the latest kernel.

I set up a two-node cluster, then manually disabled it, but the log still shows the following. Maybe that is the root cause?

Sep 29 20:35:42 minipc systemd[1]: Finished lxc-net.service - LXC network bridge setup.
Sep 29 20:35:42 minipc systemd[1]: Finished blk-availability.service - Availability of block devices.
Sep 29 20:35:42 minipc systemd[1]: Starting lxc.service - LXC Container Initialization and Autoboot Code...
Sep 29 20:35:42 minipc pmxcfs[1340]: [quorum] crit: quorum_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Sep 29 20:35:42 minipc pmxcfs[1340]: [quorum] crit: can't initialize service
Sep 29 20:35:42 minipc pmxcfs[1340]: [confdb] crit: cmap_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Sep 29 20:35:42 minipc pmxcfs[1340]: [confdb] crit: can't initialize service
Sep 29 20:35:42 minipc pmxcfs[1340]: [dcdb] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Sep 29 20:35:42 minipc pmxcfs[1340]: [dcdb] crit: can't initialize service
Sep 29 20:35:42 minipc pmxcfs[1340]: [status] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Sep 29 20:35:42 minipc pmxcfs[1340]: [status] crit: can't initialize service
Sep 29 20:35:42 minipc sshd[1346]: Server listening on 0.0.0.0 port 22.

And I see these too:

Sep 29 20:35:43 minipc (corosync)[1487]: corosync.service: Referenced but unset environment variable evaluates to an empty string: COROSYNC_OPTIONS
Sep 29 20:35:43 minipc cron[1488]: (CRON) INFO (Running @reboot jobs)
Sep 29 20:35:43 minipc corosync[1487]: [MAIN ] Corosync Cluster Engine starting up
Sep 29 20:35:43 minipc corosync[1487]: [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie >
Sep 29 20:35:43 minipc corosync[1487]: [TOTEM ] Initializing transport (Kronosnet).
Sep 29 20:35:43 minipc kernel: sctp: Hash tables configured (bind 512/512)
Sep 29 20:35:43 minipc corosync[1487]: [TOTEM ] totemknet initialized
Sep 29 20:35:43 minipc corosync[1487]: [KNET ] pmtud: MTU manually set to: 0
Sep 29 20:35:43 minipc corosync[1487]: [KNET ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 29 20:35:43 minipc corosync[1487]: [QB ] server name: cmap
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 29 20:35:43 minipc corosync[1487]: [QB ] server name: cfg
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 29 20:35:43 minipc corosync[1487]: [QB ] server name: cpg
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync resource monitoring service [6]
Sep 29 20:35:43 minipc corosync[1487]: [WD ] Watchdog not enabled by configuration
Sep 29 20:35:43 minipc corosync[1487]: [WD ] resource load_15min missing a recovery key.
Sep 29 20:35:43 minipc corosync[1487]: [WD ] resource memory_used missing a recovery key.
Sep 29 20:35:43 minipc corosync[1487]: [WD ] no resources configured.
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync watchdog service [7]
Sep 29 20:35:43 minipc corosync[1487]: [QUORUM] Using quorum provider corosync_votequorum
Sep 29 20:35:43 minipc corosync[1487]: [QUORUM] This node is within the primary component and will provide service.
Sep 29 20:35:43 minipc corosync[1487]: [QUORUM] Members[0]:
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 29 20:35:43 minipc corosync[1487]: [QB ] server name: votequorum
Sep 29 20:35:43 minipc corosync[1487]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 29 20:35:43 minipc corosync[1487]: [QB ] server name: quorum
Sep 29 20:35:43 minipc corosync[1487]: [TOTEM ] Configuring link 0
Sep 29 20:35:43 minipc corosync[1487]: [TOTEM ] Configured link number 0: local addr: 192.168.1.4, port=5405
Sep 29 20:35:43 minipc corosync[1487]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Sep 29 20:35:43 minipc corosync[1487]: [QUORUM] Sync members[1]: 1
Sep 29 20:35:43 minipc corosync[1487]: [QUORUM] Sync joined[1]: 1
Sep 29 20:35:43 minipc corosync[1487]: [TOTEM ] A new membership (1.bc) was formed. Members joined: 1
Sep 29 20:35:43 minipc corosync[1487]: [QUORUM] Members[1]: 1
Sep 29 20:35:43 minipc corosync[1487]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 29 20:35:43 minipc systemd[1]: Started corosync.service - Corosync Cluster Engine.
Sep 29 20:35:43 minipc systemd[1]: Starting pve-firewall-commit.service - Commit Proxmox VE Firewall changes...