Hi
I am writing this here in case it can help someone else or, worst case scenario, a future me.
I have a 5-node cluster (PVE 6) with two corosync rings to avoid losing nodes or the cluster to network issues. So it came as a total surprise when one of the nodes suddenly appeared greyed out in the web UI, with all the containers and VMs on it unnamed and greyed out too, as if the node had been ejected from the cluster or worse.
- A quick check showed that corosync was reporting quorum and all the nodes were there and voting, as expected.
- The containers and servers were actually responsive and working, and I could connect from one node to another via the web UI as if nothing had happened.
- Restarting the services as suggested here: https://forum.proxmox.com/threads/node-displays-as-offline-in-gui-only.56984/ did not work either.
- Trying to list the containers with pct list produced no output, even after waiting more than a minute.
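The checks above can be wrapped in a small helper so that a hanging command does not block the shell. This is only a sketch: run_with_timeout is a name I made up for illustration, and it relies on the coreutils timeout command, which exits with status 124 when the time limit is hit.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of Proxmox): runs COMMAND with a time limit
# and prints "ok", "hung" or "failed (<exit code>)".
run_with_timeout() {
    secs=$1
    shift
    timeout "$secs" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok"
    elif [ "$rc" -eq 124 ]; then
        # coreutils timeout returns 124 when the command ran out of time
        echo "hung"
    else
        echo "failed ($rc)"
    fi
}
```

On the affected node this could be called as run_with_timeout 10 pvecm status and run_with_timeout 10 pct list. In a situation like mine the first would be expected to print "ok" and the second "hung". Note that a command stuck in uninterruptible I/O may not go away when the timeout fires.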
Checking carefully revealed that the backup NFS server had run out of space and the backup was not progressing. It seems the backup process does some weird things with the cluster/node services (maybe a bug for the Proxmox people?).
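For a future me, a quick way to spot a full backup target is to look at the used-space percentage of the mount. A minimal sketch, assuming df and awk are available. usage_percent is a made-up helper name, and the path in the commented usage line is a hypothetical mount point, not the one from my setup.

```shell
#!/bin/sh
# usage_percent PATH
# Hypothetical helper: prints the used-space percentage (as an integer)
# of the filesystem that holds PATH, taken from POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (hypothetical mount point, adjust to your own storage):
# usage_percent /mnt/pve/backup-nfs
```

Keep in mind that df itself can hang if the NFS server is not answering at all, so it may be worth running it with a time limit.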
Adding more space to the NFS server so the backup could continue did not help, but killing the backup process (https://forum.proxmox.com/threads/proxmox-backup-wont-stop.23219/) and then restarting pvestatd worked, and all the containers/VMs and the node came back to normal status.
In summary:
Code:
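vzdump -stop                  # ask vzdump to stop the running backup job
ps aux | grep '[v]zdump'      # look for vzdump processes that are still alive
kill -9 processId             # force-kill each leftover process, using its PID from the ps output
systemctl restart pvestatd    # restart the status daemon so the node shows up again in the GUI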
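If you prefer not to go straight to kill -9, a gentler variant of the kill step is to send SIGTERM first and only fall back to SIGKILL after a grace period. This is not what I ran, just a sketch of that idea. stop_pid is a made-up helper name and the 5 second default is an arbitrary assumption.

```shell
#!/bin/sh
# stop_pid PID [GRACE_SECONDS]
# Hypothetical helper: sends SIGTERM to PID, waits up to GRACE_SECONDS
# (default 5) for it to exit, then sends SIGKILL if it is still there.
stop_pid() {
    pid=$1
    grace=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # process is already gone
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null                # last resort, same as kill -9
    return 0
}
```

It would be called with the PID found in the ps output, for example stop_pid 12345 10, where 12345 is a placeholder.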
Hope it helps!