Hi forum,
For a couple of months now, I have had an intermittent issue with my Proxmox single-node server.
Every once in a while, the login to the Proxmox GUI fails with the following error message:

Of course, the password is correct, the realm is correct, and the system is up to date:
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)
As I could still log in via SSH, I was able to investigate further.
After some research, I found that the issue can be resolved by restarting the pve-cluster service.
However, the messages shown by journalctl -f indicate that there might be a bug or misconfiguration which I cannot identify and fix on my own:
Code:
root@pmx:~# systemctl restart pve-cluster.service && journalctl -f
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Killing process 3105109 (cfs_loop) with signal SIGKILL.
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Failed with result 'timeout'.
Jan 29 13:56:34 pmx systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Consumed 4min 7.698s CPU time.
Jan 29 13:56:34 pmx systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jan 29 13:56:34 pmx pmxcfs[610665]: [main] notice: resolved node name 'pmx' to '192.168.253.2' for default node IP address
Jan 29 13:56:34 pmx pmxcfs[610665]: [main] notice: resolved node name 'pmx' to '192.168.253.2' for default node IP address
Jan 29 13:56:35 pmx systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 29 13:56:35 pmx systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Jan 29 13:56:35 pmx pvestatd[2442]: status update time (14.695 seconds)
Jan 29 13:56:35 pmx pve-firewall[2440]: firewall update time (8.015 seconds)
Jan 29 13:56:35 pmx pvestatd[2442]: auth key pair too old, rotating..
Jan 29 13:56:57 pmx pvedaemon[2346307]: <root@pam> successful auth for user 'root@pam'
Jan 29 13:57:04 pmx pvescheduler[612014]: <root@pam> starting task UPID:pmx:000956AF:02FCE544:65B7A0A0:vzdump::root@pam:
Jan 29 13:57:04 pmx pvescheduler[612015]: INFO: starting new backup job: vzdump --all 1 --mailto *** --compress zstd --storage backup --quiet 1 --mailnotification failure --mode snapshot --prune-backups 'keep-last=5'
Jan 29 13:57:04 pmx pvescheduler[612015]: INFO: Starting Backup of VM 101 (qemu)
Jan 29 13:57:37 pmx pvescheduler[612015]: INFO: Finished Backup of VM 101 (00:00:33)
Jan 29 13:57:37 pmx pvescheduler[612015]: INFO: Starting Backup of VM 102 (lxc)
[...]
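To pick the three lines I comment on out of a longer journal, a filter along these lines should work. This is only a sketch; filter_key_events is a helper name I made up, and the patterns are copied from the messages in the output above:

```shell
# Keep only the journal lines that matter for this issue:
# the stop timeout, the auth key rotation, and the backup job start.
filter_key_events() {
    grep -E "Failed with result|auth key pair too old|starting new backup job"
}
```

It reads from standard input, so it can be used as `journalctl -b --no-pager | filter_key_events`.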
I see that the first part of the restart (shutting down the potentially running service) fails with a timeout, which I guess shouldn't happen:
systemd[1]: pve-cluster.service: Failed with result 'timeout'.
The underlying cause of the failing login seems to be an outdated auth key pair, which is renewed immediately after the service restart:
pvestatd[2442]: auth key pair too old, rotating..
My last observation is that backup jobs also start immediately after the service restart, although they are not scheduled for this time of day. Hence, I think it might also have to do with an interrupted or failed backup job.
pvescheduler[612014]: <root@pam> starting task UPID:pmx:000956AF:02FCE544:65B7A0A0:vzdump::root@pam:
I would be happy if you could help me understand what fails and why, so that I can find a permanent solution that keeps the login working.
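Should the login fail again, I plan to check how old the auth key actually is before restarting anything. A minimal sketch, assuming the public key is stored at /etc/pve/authkey.pub (that path is my assumption and is not taken from the logs above) and that GNU stat is available; file_age_hours is just a helper name I made up:

```shell
# Print the age of a file in whole hours, based on its modification time.
file_age_hours() {
    local path="$1"
    local now mtime
    now=$(date +%s)
    # stat -c %Y prints the modification time as seconds since the epoch (GNU stat)
    mtime=$(stat -c %Y "$path") || return 1
    echo $(( (now - mtime) / 3600 ))
}
```

I would call it as `file_age_hours /etc/pve/authkey.pub` over SSH while the GUI login is failing, to see whether the key really is stale at that moment.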
Thank you and best regards,
Siebo