We have nut-server successfully monitoring a UPS with nut-client running on all nodes.
When power fails, it correctly initiates 'init 0' on all nodes simultaneously, but this then causes problems.
Nodes that only provide Ceph storage shut down before the VMs are given a chance to stop (yes, the qemu agent is installed everywhere). HA then tries to migrate VMs around the cluster but puts them in an error state, because storage requests start timing out as Ceph placement groups become inactive.
We want HA, we want migrate on shutdown (for rolling upgrades) and we want fallback for when upgraded nodes reboot. Is there a technique to shut down all VMs as a once-off and then shut down the nodes, so that we don't have to clear HA error states on guests when it all starts up again?