I got fencing and HA migration working, but there are some tricky parts. I'd like to write an Ansible playbook to check for them. So far I've noticed a couple of things worth checking.
In one case, rgmanager was not running on a node for some reason, and I restarted it from the GUI. While it was down, HA migration to that node failed, and the error said the node does not exist. What's cool is that when I tested this with clusvcadm -r, the guest ended up on another node instead of the migration failing. I don't know if that was a bug or a feature, but I like that behavior. The more I think about it, the more it seems like a feature.
In another case, one node showed as red when I logged into the web front end from either of the other two, and the other two showed as red when I logged in from that one. Running service pve-cluster restart fixed it. How do you check for that condition in the shell or from a script?
Is there anything else to check for? How would this be done?
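To make the question concrete, here is a rough sketch of the kind of check I have in mind, written in Python so it could be called from a playbook. It only looks at exit codes. The list of commands is my own guess based on the services mentioned above (service rgmanager status, service pve-cluster status, pvecm status), and it assumes each one exits nonzero when something is wrong, which I have not verified. The names CHECKS, run_exit_code and failed_checks are made up for this example. It probably would not catch the red-node case by itself, since I don't know whether pve-cluster reported as stopped when that happened.

```python
import subprocess

# Commands to probe on each node. This list is an assumption based on the
# services named in this post, not a complete or verified health check.
CHECKS = {
    "rgmanager": ["service", "rgmanager", "status"],
    "pve-cluster": ["service", "pve-cluster", "status"],
    "pvecm": ["pvecm", "status"],
}


def run_exit_code(cmd):
    """Run cmd quietly and return its exit code (127 if the binary is missing)."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode
    except FileNotFoundError:
        return 127


def failed_checks(checks=CHECKS, runner=run_exit_code):
    """Return the sorted names of checks whose command exited nonzero.

    Assumes a nonzero exit code means the service is unhealthy.
    """
    return sorted(name for name, cmd in checks.items() if runner(cmd) != 0)
```

Calling failed_checks() on each node and exiting nonzero when the list is not empty would let an Ansible task mark that node as failed. The runner argument is only there so the logic can be tried out without a real cluster.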
Do failover domains make it more robust? When I finally got it working, it was without one. The docs I tracked down said there is already a default failover domain, which is why I haven't tried one yet.