Hi forum,
For a couple of months now, I have had an intermittent issue with my Proxmox single-node server.
Every once in a while, the login to the Proxmox GUI fails with the following error message:
Of course, the password is correct, the realm is correct, and the system is up to date:
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)
As I could still log in via SSH, I was able to investigate further.
After some research, I found that the issue can be resolved by restarting the pve-cluster service.
However, the messages shown by journalctl -f indicate that there might be a bug or misconfiguration that I cannot identify and fix on my own:
Code:
root@pmx:~# systemctl restart pve-cluster.service && journalctl -f
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Killing process 3105109 (cfs_loop) with signal SIGKILL.
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Failed with result 'timeout'.
Jan 29 13:56:34 pmx systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 29 13:56:34 pmx systemd[1]: pve-cluster.service: Consumed 4min 7.698s CPU time.
Jan 29 13:56:34 pmx systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jan 29 13:56:34 pmx pmxcfs[610665]: [main] notice: resolved node name 'pmx' to '192.168.253.2' for default node IP address
Jan 29 13:56:34 pmx pmxcfs[610665]: [main] notice: resolved node name 'pmx' to '192.168.253.2' for default node IP address
Jan 29 13:56:35 pmx systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Jan 29 13:56:35 pmx systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Jan 29 13:56:35 pmx pvestatd[2442]: status update time (14.695 seconds)
Jan 29 13:56:35 pmx pve-firewall[2440]: firewall update time (8.015 seconds)
Jan 29 13:56:35 pmx pvestatd[2442]: auth key pair too old, rotating..
Jan 29 13:56:57 pmx pvedaemon[2346307]: <root@pam> successful auth for user 'root@pam'
Jan 29 13:57:04 pmx pvescheduler[612014]: <root@pam> starting task UPID:pmx:000956AF:02FCE544:65B7A0A0:vzdump::root@pam:
Jan 29 13:57:04 pmx pvescheduler[612015]: INFO: starting new backup job: vzdump --all 1 --mailto *** --compress zstd --storage backup --quiet 1 --mailnotification failure --mode snapshot --prune-backups 'keep-last=5'
Jan 29 13:57:04 pmx pvescheduler[612015]: INFO: Starting Backup of VM 101 (qemu)
Jan 29 13:57:37 pmx pvescheduler[612015]: INFO: Finished Backup of VM 101 (00:00:33)
Jan 29 13:57:37 pmx pvescheduler[612015]: INFO: Starting Backup of VM 102 (lxc)
[...]
I see that the first part of the restart (stopping the service that may still be running) seems to fail due to a timeout, which I guess shouldn't happen:
systemd[1]: pve-cluster.service: Failed with result 'timeout'.
The underlying cause of the failing login seems to be an outdated auth key pair, which is renewed immediately after the service restart:
pvestatd[2442]: auth key pair too old, rotating..
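To confirm this the next time the login fails (before restarting anything), I am thinking of checking how old the key file actually is. This is only a minimal sketch: the path /etc/pve/authkey.pub is my assumption about where the public auth key lives, and file_age_hours is just a helper name I made up.

```shell
#!/usr/bin/env bash
# Print the age of a file in whole hours, based on its modification time.
# Relies on GNU stat (-c %Y gives the mtime as seconds since the epoch).
file_age_hours() {
    local file="$1"
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 1
    echo $(( (now - mtime) / 3600 ))
}

# Assumed key location; adjust if it differs on your system.
key=/etc/pve/authkey.pub
if [ -e "$key" ]; then
    echo "$key is $(file_age_hours "$key") hours old"
fi
```

If the reported age keeps growing while the GUI login is broken, that would at least support the idea that the key rotation is stuck until pve-cluster is restarted.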
My last observation is that backup jobs also start immediately after the service restart, although they are not scheduled for this time of day. Hence, I think the issue might also be related to an interrupted or failed backup job.
pvescheduler[612014]: <root@pam> starting task UPID:pmx:000956AF:02FCE544:65B7A0A0:vzdump::root@pam:
I would be happy if you could help me understand what fails and why, so that I can find a permanent solution that keeps the GUI login working.
Thank you and best regards,
Siebo