cluster issues: pvestatd and pvedaemon timeout

erwinvank

New Member
Aug 30, 2023
Hi everyone,

I have been going through a lot of posts on this forum for a solution, but none of them seem to resolve the issue I'm facing completely...

Since a power failure, our Proxmox cluster (3 nodes) has been having issues... pve02 and pve03 work fine after a reboot, but pve01 fails to start pvestatd and pvedaemon: they hang for a long time and eventually, after killing "pmxcfs", they show:

Code:
~# journalctl -r -u pvedaemon.service
Dec 02 14:41:33 proxmox01 systemd[1]: pvedaemon.service: start operation timed out. Terminating.
Dec 02 14:40:03 proxmox01 systemd[1]: Starting pvedaemon.service - PVE API Daemon...
Dec 02 14:40:03 proxmox01 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Dec 02 14:40:03 proxmox01 systemd[1]: pvedaemon.service: Found left-over process 1262143 (pvedaemon) in control group while starting unit. Ignoring.
Dec 02 14:40:03 proxmox01 systemd[1]: pvedaemon.service: Consumed 6.576s CPU time.
Dec 02 14:40:03 proxmox01 systemd[1]: Stopped pvedaemon.service - PVE API Daemon.
Dec 02 14:40:03 proxmox01 systemd[1]: pvedaemon.service: Scheduled restart job, restart counter is at 1.
Dec 02 14:40:02 proxmox01 systemd[1]: pvedaemon.service: Consumed 6.576s CPU time.

Code:
# journalctl -r -u pvestatd.service
Dec 02 14:40:55 proxmox01 systemd[1]: pvestatd.service: Consumed 6.171s CPU time.
Dec 02 14:40:55 proxmox01 systemd[1]: Failed to start pvestatd.service - PVE Status Daemon.
Dec 02 14:40:55 proxmox01 systemd[1]: pvestatd.service: Unit process 1262343 (pvestatd) remains running after unit stopped.
Dec 02 14:40:55 proxmox01 systemd[1]: pvestatd.service: Failed with result 'timeout'.
Dec 02 14:40:55 proxmox01 systemd[1]: pvestatd.service: Processes still around after final SIGKILL. Entering failed mode.
Dec 02 14:39:25 proxmox01 systemd[1]: pvestatd.service: Killing process 1262343 (pvestatd) with signal SIGKILL.

I turned on debugging, but nothing gives me a clear explanation of what is wrong.
The only analysis so far is that certain directories under "/etc/pve" hang... /etc/pve/local, for example. However, killing "pmxcfs" and running it in "local mode" shows nothing wrong with the directory structure.
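For completeness, the check was roughly along these lines (a rough sketch; the 5-second timeout is just an arbitrary value, and pmxcfs -l is the documented local-mode flag):

Code:
# check whether listing the cluster filesystem hangs (5s is an arbitrary timeout)
timeout 5 ls -la /etc/pve/local || echo "listing hung or timed out"
# stop the cluster filesystem service, kill any leftover pmxcfs, then
# start pmxcfs in local mode to inspect the directory structure offline
systemctl stop pve-cluster.service
killall -9 pmxcfs
pmxcfs -l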

Has anyone encountered this before? Thanks!
 
The cluster is OK again.

In short, it turned out to be related to the pve-ha-crm / pve-ha-lrm services... I was unable to get a directory listing of /etc/pve/local, and the hanging pmxcfs prevented me from restarting these services.

So, I copy/pasted the following commands into my terminal so they would execute in quick succession (example for the LRM):

Code:
# stop corosync and the cluster filesystem
systemctl stop corosync.service pve-cluster.service
# force-kill any leftover pmxcfs processes (PID is the first column of the ps ax output)
ps ax | grep pmx | cut -d" " -f1 | xargs kill -9
# restart the HA local resource manager while the cluster filesystem is down
service pve-ha-lrm restart
# bring corosync and the cluster filesystem back up
systemctl start corosync.service pve-cluster.service
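Afterwards, checking that everything came back up is just the standard status commands (nothing special, adjust to your own node names):

Code:
# verify the daemons and HA services are running again
systemctl status pvedaemon.service pvestatd.service pve-ha-lrm.service pve-ha-crm.service
# check cluster membership and quorum
pvecm status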
 
This also saved the situation for me after I had mistakenly filled the OS drive with a bulk migration. Thanks!
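If the root filesystem is full, something like this generic sketch (not specific to my setup) shows where the space went before cleaning up and restarting the services:

Code:
# check free space on the root filesystem
df -h /
# list the largest directories under /var (just a common place for migration leftovers)
du -xh --max-depth=2 /var | sort -h | tail -n 15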