I did this because of the following incident:
the master node died, and the LRM was stuck waiting for the agent lock. Since the master was down, the LRM could not get any activity from pve-ha-crm. So I decided to shut down all the nodes and reboot only node19. After node19 booted, we ran
pvecm expected 1 to make /etc/pve writable. Then I modified the corosync totem addresses from 10.10.30.0 (ring0) to 10.10.30.169 and from 10.20.30.0 (ring1) to 10.20.30.169, hoping the HA cluster would make node19 the master node.
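For reference, this is roughly the kind of edit I made in the totem section of /etc/pve/corosync.conf. This is only a sketch assuming a corosync 2.x style config with bindnetaddr; the exact layout may differ between versions:

```
totem {
  ...
  interface {
    ringnumber: 0
    bindnetaddr: 10.10.30.169   # was 10.10.30.0 (the network address)
  }
  interface {
    ringnumber: 1
    bindnetaddr: 10.20.30.169   # was 10.20.30.0
  }
}
```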
I assumed that after booting the remaining nodes they would pick up the new config from node19, but they didn't. They ended up running as separate clusters: node19 in one cluster and all the remaining nodes in a different one.
Finally, I decided to shut down node19, remove all its VM resources from HA (with the ha-manager remove sid command), and then start all the VPSs again.
None of the VPSs from node19 could start, so I had to
MOVE all their configs from /etc/pve/nodes/node19/qemu-server/*.conf to /etc/pve/nodes/node15/qemu-server/ and start them manually on node15.
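The move above was equivalent to something like the following sketch. The helper function is hypothetical (not a Proxmox command, just plain mv in a loop); inside /etc/pve this effectively reassigns each VM to the target node, since pmxcfs keeps one config file per node:

```shell
#!/bin/sh
# Hypothetical helper: move all qemu-server config files from one
# node directory to another. Under /etc/pve this reassigns the VMs
# to the target node.
move_vm_configs() {
    src="$1"
    dst="$2"
    for conf in "$src"/*.conf; do
        [ -e "$conf" ] || continue   # glob did not match: nothing to move
        mv "$conf" "$dst"/
    done
}

# What we ran was equivalent to:
# move_vm_configs /etc/pve/nodes/node19/qemu-server /etc/pve/nodes/node15/qemu-server
```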
Can we add node19 back to the cluster without reinstalling it? The VM configs are probably still listed under this directory: /etc/pve/nodes/node19/qemu-server/
Back to the question of this thread: can we find out which server held (controlled) pve-ha-crm on the date of the incident? We want to examine more logs on that server. node19 is shut down now, so maybe it was the node controlling pve-ha-crm on that date, and it may have more information for us.