Best Practice for shutting down my cluster

dixie2000

Member
May 16, 2023
Hello all,

I have a 2-node cluster and a Raspberry Pi as a QDevice, with VMs and CTs configured as HA.

In the event of a power outage, my UPS will run a script (after 10 minutes on battery) to shut down the 2 nodes and the Pi.

To prevent any issues during the shutdown, the UPS script connects to my primary node and runs the following, which disables HA for all VMs and CTs:

Code:
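# Disable every HA resource that is currently reported as "started".
# -r keeps xargs from running ha-manager when nothing matches.
ha-manager status | \
    grep started | \
    awk '{print $2}' | \
    xargs -r -n 1 ha-manager set --state disabled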

After 30 seconds, the UPS script will SSH into each node and issue a shutdown.

Is this adequate to prevent any issues or shuffling of guests during the shutdown? Should I expect any issues when starting the nodes again? I would run the same script as above but with "enable".
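One thing to watch with re-running the same pipeline after boot: it filters on "started", and by then the resources are in the disabled state, so it would match nothing. A minimal sketch of one way around that is to record the service IDs before disabling them and replay that list later with `--state started`. The state file path, the `HA_MANAGER` variable and the function names are hypothetical, and the parsing assumes the same `ha-manager status` layout the pipeline above relies on (service ID in the second column).

```shell
#!/bin/sh
# Hypothetical location for the saved list; it must survive a reboot.
STATE_FILE="${STATE_FILE:-/var/lib/ha-shutdown/started.list}"
# Command used to talk to the HA stack (overridable for dry runs).
HA_MANAGER="${HA_MANAGER:-ha-manager}"

# Print the IDs (e.g. vm:100) of all services reported as started.
list_started() {
    "$HA_MANAGER" status | awk '/started/ {print $2}'
}

# Save the started services to STATE_FILE, then disable each of them.
ha_disable_all() {
    started=$(list_started)
    # Nothing started: keep any list saved by an earlier run.
    [ -n "$started" ] || return 0
    mkdir -p "$(dirname "$STATE_FILE")"
    printf '%s\n' "$started" > "$STATE_FILE"
    while read -r sid; do
        "$HA_MANAGER" set "$sid" --state disabled </dev/null
    done < "$STATE_FILE"
}

# Request the started state again for every saved service.
ha_restore_all() {
    [ -f "$STATE_FILE" ] || return 0
    while read -r sid; do
        "$HA_MANAGER" set "$sid" --state started </dev/null
    done < "$STATE_FILE"
}
```

The UPS script would call `ha_disable_all` before powering off, and `ha_restore_all` would be run once both nodes are back up and the cluster has quorum.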

I also read about doing this before shutting down the nodes:
  1. systemctl stop pve-ha-lrm
  2. systemctl stop pve-ha-crm

Is the latter a "better" approach?

Hope this makes sense and I welcome your thoughts!
 
Does anyone have any suggestions? My basic goal is to be able to shut down my cluster (2 nodes) with HA and retain its current state, so that it can be brought back up later in the same state it was in when powered off. The shutdown needs to happen unattended: my UPS initiates it when the power goes off, and I may not be around to step in.

There must be others out there with the same situation.

Thanks!
 
Have you tested just shutting down both nodes at the same time? I would expect the VMs to shut down cleanly but stay on the node they're on - as there is no alternative node available to migrate to in this situation.

See also "Datacenter --> Options --> HA Settings" and the respective documentation.
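For reference, that GUI setting is stored as an `ha:` line in `/etc/pve/datacenter.cfg`, for example `ha: shutdown_policy=freeze`. A small sketch for checking which policy is currently configured, assuming that file layout; the function name and the `DATACENTER_CFG` variable are made up for illustration, and `conditional` is printed when no policy is set because the documentation lists it as the default.

```shell
#!/bin/sh
# Cluster-wide options file (overridable for testing).
DATACENTER_CFG="${DATACENTER_CFG:-/etc/pve/datacenter.cfg}"

# Print the configured HA shutdown policy, or "conditional" if none is set.
ha_shutdown_policy() {
    policy=$(sed -n 's/^ha:.*shutdown_policy=\([a-z]*\).*/\1/p' "$DATACENTER_CFG" 2>/dev/null)
    echo "${policy:-conditional}"
}
```

Running `ha_shutdown_policy` on either node before relying on the UPS script shows whether the cluster is still on the default policy.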
 
@UdoB

I have not attempted your suggestion. In any case, I need the process to be handled without my intervention. When power goes out, my UPS will run a script that shuts down the nodes. But it needs to somehow ensure that HA is disabled first, so that one node does not try to migrate guests to the other.

Basically:
  1. Power goes out
  2. UPS script disables HA (and whatever else is needed)
  3. UPS script then powers off the nodes
  4. Once power is restored, I can restart the nodes, hopefully in the state they were in before the outage
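Steps 2 and 3 above could be sketched roughly as follows. The host names, the `root@` logins and the helper path `/usr/local/sbin/ha-disable-all.sh` (a script on the primary node that disables the HA resources) are all placeholders, not anything from my actual setup, and the 30-second pause matches the delay mentioned in the first post.

```shell
#!/bin/sh
# Hypothetical SSH targets; replace with the real nodes and the QDevice.
PRIMARY="${PRIMARY:-root@pve1}"
NODES="${NODES:-root@pve1 root@pve2 root@qdevice-pi}"
# Pause between disabling HA and powering off.
WAIT_SECONDS="${WAIT_SECONDS:-30}"
SSH="${SSH:-ssh}"

shutdown_cluster() {
    # Step 2: disable HA on the primary node; abort if that fails so the
    # nodes are not powered off while HA is still active.
    "$SSH" -o BatchMode=yes "$PRIMARY" /usr/local/sbin/ha-disable-all.sh || return 1

    # Give the HA stack time to act on the new state.
    sleep "$WAIT_SECONDS"

    # Step 3: power off every host; keep going if one is unreachable.
    for node in $NODES; do
        "$SSH" -o BatchMode=yes "$node" 'shutdown -h now' \
            || echo "shutdown failed on $node" >&2
    done
}
```

`BatchMode=yes` makes ssh fail instead of waiting for a password prompt, which matters for an unattended run; it assumes key-based logins from the UPS host are already set up.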

Hope that makes sense.
 
