Hi All,
We have a 5-node PVE cluster, and lately the first node becomes unreachable in the Proxmox web UI about once a day. See screenshot:
When this happens, the node shows as "unreachable", as seen in the screenshot, and cannot be reached via the web UI from any other node. Its own web UI seems to load but does not let me log in. When I attempt to log in, it spins for a bit and then comes back with the error "Connection failure. Network error or Proxmox VE services not running?".
A reboot fixes this, but then it always happens again the next day at some seemingly random point in time.
I can still ssh to it, the box is up and running, and the VMs are also running.
It just seems like whatever pve service is responsible for notifying the others is dead.
I've checked the systemd services for:
- pveproxy
- pve-manager
- pvedaemon
Anything else I can do to troubleshoot this?
Thanks in advance!
EDIT: I've narrowed it down to `pveproxy`. It doesn't seem to be up, although it's running many copies of the "binary" (a Perl script).
See `systemctl status pveproxy` in the screenshot below. This is after I tried to SIGKILL those processes and restart the systemd service a few times. None of the processes can be killed, even with `kill -9`. Not sure what's going on here. Note that the PID file /var/run/pveproxy/pveproxy.pid is NOT there at this point.
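As far as I know, processes that ignore `kill -9` are usually stuck in uninterruptible sleep (state `D`), typically waiting on I/O, so that is the next thing I'd check. Below is a rough sketch for listing such processes by reading `/proc/<pid>/stat` on Linux. The function names `parse_stat` and `find_uninterruptible` are my own, not part of any Proxmox tooling, and I haven't confirmed that this is actually what is happening on this node.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    # The process state is the first field after the closing parenthesis.
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or the line was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

If the leftover pveproxy processes show up in that list, the thing to look at would be whatever they are blocked on rather than pveproxy itself.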