Shutdown policy 'migrate' and cluster poweroff

FrankvdAa, Aug 18, 2022

Hi,

I have a two-node cluster with ZFS replication and a Raspberry Pi as a QDevice for quorum.

All VMs and CTs run on one node, and their storage is replicated to the other node with ZFS. HA is configured for all CTs/VMs.

For normal maintenance (updates/reboots) I have set the HA shutdown policy to 'migrate', so all CTs/VMs are automatically migrated to the other node when I reboot the node they are running on. Once the rebooted node is back, all CTs/VMs are migrated back.
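The shutdown policy is a cluster-wide setting stored in /etc/pve/datacenter.cfg as an `ha:` property string (for example `ha: shutdown_policy=migrate`); when it is not set, Proxmox VE uses the default, `conditional`. It can be changed in the GUI under Datacenter > Options (HA Settings), and `pvesh set /cluster/options --ha shutdown_policy=migrate` should do the same from a shell. A minimal sketch for checking the current value from a script; the function name `ha_shutdown_policy` is hypothetical:

```bash
#!/usr/bin/env bash
set -euo pipefail

# ha_shutdown_policy [FILE] (hypothetical helper)
# Print the cluster-wide HA shutdown policy from a datacenter.cfg-style file.
# Falls back to "conditional", the Proxmox VE default, when no policy is set
# or the file cannot be read.
ha_shutdown_policy() {
  local cfg=${1:-/etc/pve/datacenter.cfg} policy
  # The setting is stored as a property string, e.g. "ha: shutdown_policy=migrate".
  policy=$(sed -n 's/^ha:.*shutdown_policy=\([a-z]*\).*/\1/p' "$cfg" 2>/dev/null) || true
  echo "${policy:-conditional}"
}

ha_shutdown_policy "$@"
```

On the setup described here it should print `migrate`; a UPS script could use it to remember the value before switching to `freeze`.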

This all works as intended, but when I power down both nodes, the one running the CTs/VMs stays powered on, most likely because it cannot migrate them anywhere.

I'm planning to connect a UPS (one for both nodes). In case of a power failure both nodes will be shut down, which would most likely leave one node online because it cannot migrate its guests.

Is my assumption correct? What would be the best way to keep HA migrating the CTs/VMs automatically during maintenance, but shut down all CTs/VMs and power down the nodes in case of a power failure?
 
This may be helpful:



 
Having recently struggled with writing a network-UPS-triggered shutdown script for an HA cluster, here's the logic I ended up with:

The NUT (Network UPS Tools) server runs in a VM. This VM is HA-managed and can run on any host in the cluster.
The NUT client runs on the NUT server VM. It runs a shutdown script when the UPS signals low battery.
All scripted commands are issued through the Proxmox VE HTTP API, normally targeted at the host the NUT server VM happens to be running on.
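A minimal sketch of such an API helper, assuming an API token has been created for the script. The host name, token ID and secret are hypothetical placeholders. Proxmox VE API tokens are sent as an `Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET` header to `https://HOST:8006/api2/json`. With `DRY_RUN=1` (the default here) the function only prints the curl command it would run:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: adjust to your cluster and API token.
PVE_HOST="${PVE_HOST:-pve1.example.lan}"
PVE_TOKEN="${PVE_TOKEN:-nut@pve!shutdown=00000000-0000-0000-0000-000000000000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the curl command (note: includes the token)

# pve_api METHOD PATH [key=value ...]  (hypothetical helper)
# Sends one authenticated request to the Proxmox VE HTTP API.
pve_api() {
  local method=$1 path=$2 kv
  shift 2
  local -a args
  args=(-fsS -X "$method"
    -H "Authorization: PVEAPIToken=$PVE_TOKEN"
    "https://$PVE_HOST:8006/api2/json$path")
  for kv in "$@"; do
    args+=(--data-urlencode "$kv")   # parameters are sent as a form-encoded body
  done
  if [[ $DRY_RUN == 1 ]]; then
    echo "curl ${args[*]}"
  else
    curl "${args[@]}"
  fi
}

# Example: switch the HA shutdown policy to "freeze" before shutting down.
pve_api PUT /cluster/options "ha=shutdown_policy=freeze"
```

The token needs privileges for every call the script makes. If the nodes still use the self-signed certificate, passing the cluster CA (a copy of /etc/pve/pve-root-ca.pem) to curl with `--cacert` is safer than disabling verification.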

Order of operations is as follows:
1. Set the HA shutdown policy to "freeze" to disable migrations.
2. Set the HA state of the NUT server VM to "ignored" so it can go down with its host.
3. Shut down all guests on the other hosts.
4. Disable the HA daemons on the other hosts.
5. Shut down all other guests on the host the NUT server VM is running on.
6. Disable the HA daemons on the host the NUT server VM is running on.
7. Shut down the other hosts.
8. Shut down the host the NUT server VM is running on.
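The order of operations above can be sketched as a dry run that prints each Proxmox VE API call (method, path, parameters) instead of sending it. The node names, VMIDs and guest inventory are hypothetical. The endpoints are my reading of the API (PUT /cluster/options, PUT /cluster/ha/resources/{sid}, POST /nodes/{node}/{qemu|lxc}/{vmid}/status/shutdown, POST /nodes/{node}/services/{service}/stop, POST /nodes/{node}/status) and should be checked in the API viewer. "Disable the HA daemons" is modeled as stopping pve-ha-lrm and pve-ha-crm, not disabling them:

```bash
#!/usr/bin/env bash
# DRY RUN: print the Proxmox VE API calls of the UPS shutdown order, in order.
set -euo pipefail

NUT_HOST="pve1"        # hypothetical: node currently running the NUT server VM
NUT_VMID="100"         # hypothetical: VMID of the NUT server VM
OTHER_HOSTS=("pve2")   # hypothetical: every other cluster node
# hypothetical guest inventory, one "node type vmid" entry per guest
GUESTS=("pve1 qemu 100" "pve1 lxc 101" "pve2 qemu 102")

api() { echo "$*"; }   # dry run: print "METHOD PATH [param=value]" instead of sending it

shutdown_guests_on() { # shut down every guest on node $1 except the NUT server VM
  local node=$1 entry n type vmid
  for entry in "${GUESTS[@]}"; do
    read -r n type vmid <<<"$entry"
    [[ $n == "$node" && $vmid != "$NUT_VMID" ]] || continue
    api POST "/nodes/$node/$type/$vmid/status/shutdown"
  done
}

stop_ha_on() {         # stop (not disable) the HA daemons; LRM before CRM is an assumption
  api POST "/nodes/$1/services/pve-ha-lrm/stop"
  api POST "/nodes/$1/services/pve-ha-crm/stop"
}

main() {
  local node
  api PUT /cluster/options "ha=shutdown_policy=freeze"          # no more migrations
  api PUT "/cluster/ha/resources/vm:$NUT_VMID" "state=ignored"   # NUT VM goes down with its host
  for node in "${OTHER_HOSTS[@]}"; do shutdown_guests_on "$node"; done
  for node in "${OTHER_HOSTS[@]}"; do stop_ha_on "$node"; done
  shutdown_guests_on "$NUT_HOST"
  stop_ha_on "$NUT_HOST"
  for node in "${OTHER_HOSTS[@]}"; do api POST "/nodes/$node/status" "command=shutdown"; done
  api POST "/nodes/$NUT_HOST/status" "command=shutdown"          # last: takes the NUT VM with it
}

main
```

A real script would build the guest list from GET /cluster/resources?type=vm and wait until each guest reports stopped before moving to the next step; this sketch does neither.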

Power-on follows the same steps in reverse order.
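The reverse direction can be sketched the same way, as a dry run with hypothetical names. Booting the hosts themselves happens outside the API (for example via the BIOS power-restore setting or Wake-on-LAN). Starting the HA services is a no-op if they were only stopped, not disabled, because systemd starts them at boot anyway. The endpoints (POST .../status/start, POST /nodes/{node}/services/{service}/start, PUT /cluster/ha/resources/{sid}, PUT /cluster/options) are assumptions to verify in the API viewer:

```bash
#!/usr/bin/env bash
# DRY RUN: print the power-on counterpart of the shutdown order (reverse order).
set -euo pipefail

NUT_HOST="pve1"        # hypothetical: node that runs the NUT server VM
NUT_VMID="100"         # hypothetical: VMID of the NUT server VM
OTHER_HOSTS=("pve2")   # hypothetical: every other cluster node
# hypothetical guest inventory, one "node type vmid" entry per guest
GUESTS=("pve1 qemu 100" "pve1 lxc 101" "pve2 qemu 102")

api() { echo "$*"; }   # dry run: print "METHOD PATH [param=value]" instead of sending it

start_guests_on() {    # start every guest on node $1 except the NUT server VM
  local node=$1 entry n type vmid
  for entry in "${GUESTS[@]}"; do
    read -r n type vmid <<<"$entry"
    [[ $n == "$node" && $vmid != "$NUT_VMID" ]] || continue
    api POST "/nodes/$node/$type/$vmid/status/start"
  done
}

start_ha_on() {        # no-op if systemd already started the daemons at boot
  api POST "/nodes/$1/services/pve-ha-crm/start"
  api POST "/nodes/$1/services/pve-ha-lrm/start"
}

power_on_sequence() {
  local node
  # Booting the hosts (NUT host first) happens outside the API.
  start_ha_on "$NUT_HOST"
  start_guests_on "$NUT_HOST"
  for node in "${OTHER_HOSTS[@]}"; do start_ha_on "$node"; done
  for node in "${OTHER_HOSTS[@]}"; do start_guests_on "$node"; done
  api PUT "/cluster/ha/resources/vm:$NUT_VMID" "state=started"   # HA manages the NUT VM again
  api PUT /cluster/options "ha=shutdown_policy=migrate"          # back to migrating on reboot
}

power_on_sequence
```

The last two calls are the ones that are easy to forget when done by hand, which is also the concern raised further down in this thread.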
 
Reading both of your replies, it seems there isn't an out-of-the-box solution for this.

I'll look into the suggested solutions and see what I can come up with.

Thanks so far!
 
Hi FrankvdAa,

I would try using

Code:
systemctl stop pve-ha-crm
systemctl stop pve-ha-lrm

as @bbgeek17 mentioned.

You should not disable the services; that would prevent them from coming back up after the next reboot.
The same applies to special HA settings, such as the "ignored" state or a changed shutdown policy.

Of course you can re-enable them manually afterwards, but I'm pretty sure that I, at least, would not remember to do that down the road. :)

If your VMs do not shut down cleanly in time afterwards, you can (hard) kill their processes.

Code:
# List the PIDs of all running guests (LXC containers and KVM VMs).
# The brackets in "lxc-star[t]" and "kvm -i[d]" keep grep from matching its own process.
ps -ef | grep -E "lxc-star[t]|kvm -i[d]" | awk '{print $2}'

# Ask the processes to terminate (SIGTERM)
for pid in $(ps -ef | grep -E "lxc-star[t]|kvm -i[d]" | awk '{print $2}'); do echo "$pid"; kill "$pid"; done

# Force-kill the processes (SIGKILL), only as a last resort
for pid in $(ps -ef | grep -E "lxc-star[t]|kvm -i[d]" | awk '{print $2}'); do echo "$pid"; kill -9 "$pid"; done
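Instead of running the SIGTERM and SIGKILL loops by hand, the two can be combined into one pass with a grace period. A minimal sketch; `terminate_then_kill` is a hypothetical helper name, and the PIDs would come from the ps/grep pipeline above. For guests, `qm shutdown <vmid> --timeout <s> --forceStop 1` and `pct shutdown <vmid> --timeout <s> --forceStop 1` do something similar through Proxmox VE itself, which is probably the gentler option.

```bash
#!/usr/bin/env bash
set -uo pipefail

# terminate_then_kill TIMEOUT PID...  (hypothetical helper)
# Send SIGTERM to every PID, poll once per second for up to TIMEOUT seconds,
# then send SIGKILL to whatever is still running.
terminate_then_kill() {
  local timeout=$1 pid alive waited=0
  shift
  kill "$@" 2>/dev/null                 # polite SIGTERM to all PIDs
  while (( waited < timeout )); do
    alive=0
    for pid in "$@"; do
      kill -0 "$pid" 2>/dev/null && alive=1
    done
    (( alive )) || return 0             # everything exited within the grace period
    sleep 1
    waited=$((waited + 1))
  done
  for pid in "$@"; do                   # grace period over: force-kill the rest
    if kill -0 "$pid" 2>/dev/null; then
      echo "force-killing $pid"
      kill -9 "$pid"
    fi
  done
}

# Example (commented out): give the guests 60 seconds before force-killing them.
# terminate_then_kill 60 $(ps -ef | grep -E "lxc-star[t]|kvm -i[d]" | awk '{print $2}')
```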

BR, Lucas