Today I had an interesting failure. One of my nodes was misbehaving, and corosync was not able to synchronize (it kept logging "[TOTEM ] Received message has invalid digest... ignoring.").
I proceeded to move containers off of it, but partway through the process the node crashed. Here is the first problem: it would not get fenced. I rebooted it, shut it down, and eventually kicked it out of the cluster, and the fencing mechanism still will not stop trying to kick it out. I am still receiving the proxmox -FENCE and proxmox-SUCCEEDED emails every minute, and it's driving me crazy.
There were a number of containers that were "owned" by the dead node. I manually restored them from backup and changed their node assignment, but HA would not start them when I added them to HA. They would only run once removed from HA, but then they remained in the "ignored" state. I have since moved them off to another cluster, and they are STILL listed with state "ignored" when I issue ha-manager status. The CTIDs DO NOT EXIST anymore, and still they remain in the ignored state.
1. How do I get the dead node fully removed from the cluster? It no longer shows in pvecm status, nor in /etc/pve/.members, yet I'm still getting the fencing emails every minute.
2. How do I remove the containers that show in ha-manager status, even though they no longer exist on the cluster?