Problem with ha-manager

alexskysilk

Today I had an interesting failure. One of my nodes was misbehaving, and corosync could not synchronize; it kept logging "[TOTEM ] Received message has invalid digest... ignoring."

I proceeded to move containers off of it, but partway through the process the node crashed. Here is the first problem: it would not get fenced. I rebooted it, shut it down, and eventually kicked it out of the cluster, yet the fencing mechanism will not stop trying to kick it out. I am still receiving the proxmox -FENCE and proxmox-SUCCEEDED emails every minute, and it's driving me crazy.

There were a number of containers that were "owned" by the dead node. I manually restored them from backup and changed their node assignment, but HA will not start them if I add them to HA. They only run if they are removed from HA, but then they remain in the "ignored" state. I have since moved them off to another cluster, and they are STILL listed with state "ignored" when I issue ha-manager status. The CTIDs DO NOT EXIST anymore, and still they remain in the ignored state.

1. How do I really remove the dead node from the cluster? It no longer shows in pvecm status, nor in /etc/pve/.members, yet I'm still getting the fencing emails every minute.
2. How do I remove the containers that show in ha-manager status, even though they no longer exist on the cluster?
 
