Hi
There is already an older post about this, https://forum.proxmox.com/threads/ha-shutdown-with-ups.21877/, but the answer there is not quite satisfying.
The problem is this: I did a test run on a 3-host cluster and observed that the Proxmox nodes try to migrate their HA VMs to one another, even though the target nodes are shutting down at the same time. The process hangs and finally the watchdog resets the remaining server. (This is what I think happens, since I cannot stay connected to a host and watch the logs while it goes down.)
The result: 2 hosts shut down as they should, and one host gets a hard reset and boots up again. This is definitely not what a UPS shutdown should look like. Due to the nature of NUT (the software I'm using for UPS control), the servers do not all get the shutdown signal in the same second, but a couple of seconds apart.
There could be a solution: adjust the shutdown command in NUT to stop any HA service first, so the host will neither try to offload its HA guests nor accept another HA migration. The question is: which service can I stop to safely disable HA for this host only, without using HA groups or disabling HA for individual guests (which needs documentation and re-enabling afterwards), so that the hosts can shut down quietly and, when power is restored, start up and re-enable HA without user interaction? I want it as simple as possible.
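To make the idea concrete, here is a rough, untested sketch of the kind of wrapper I have in mind for the NUT shutdown command. The unit names `pve-ha-lrm` and `pve-ha-crm` are my assumption about which services would need to be stopped, and whether stopping them is actually safe is exactly what I am asking. The function names are made up for illustration, and the script only prints the commands unless it is called with `--run`.

```python
#!/usr/bin/env python3
"""Sketch: stop the HA services on this host, then power it off."""
import subprocess
import sys

# Assumption: these are the HA units to stop, local resource manager first.
# Whether stopping them is a safe way to disable HA on one host is unverified.
HA_UNITS = ["pve-ha-lrm", "pve-ha-crm"]


def build_shutdown_plan(ha_units=HA_UNITS):
    """Return the commands to run, in order, as argument lists."""
    plan = [["systemctl", "stop", unit] for unit in ha_units]
    # The normal system shutdown is then expected to stop the guests.
    plan.append(["systemctl", "poweroff"])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("+ " + " ".join(cmd))
        if not dry_run:
            # Keep going even if one step fails, so the host still powers off.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Dry run by default; pass --run to really execute the commands.
    run_plan(build_shutdown_plan(), dry_run="--run" not in sys.argv)
```

In NUT this would be hooked in through `SHUTDOWNCMD` in upsmon.conf, pointing at wherever the script is saved (with `--run` added). Since the services are only stopped and not disabled, they should come back by themselves on the next boot, which is the "no user interaction" part I am after.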
Or how about an integrated pve-shutdown --ups, which shuts down all VMs on the host regardless of whether they are HA-managed and then shuts down the system? (And a pve-shutdown --maintenance, which migrates all VMs off the host first and then shuts down, with output to a log, etc.?)
This is what keeps me from using HA at the moment, since I cannot be sure that everything will go well in a power outage.
Keep up the good work, Proxmox team!