Best Practice for shutting down my cluster

dixie2000 · Apr 30, 2024

Hello all,

I have a 2-node cluster, plus a Raspberry Pi as a QDevice, with VMs and CTs configured as HA.

In the event of a power outage, my UPS will run a script (after 10 minutes on battery) to shut down the 2 nodes and the Pi.

To prevent any issues during the shutdown, the script connects to my primary node and runs this command, which disables HA for all VMs and CTs:

Code:

ha-manager status | \
    grep started | \
    awk '{print $2}' | \
    xargs -n 1 ha-manager set --state disabled

After 30 seconds the UPS script will ssh into each node and issue a shutdown.

Is this adequate to prevent any issues or shuffling of guests during the shutdown? Should I expect any issues when starting the nodes again? I would run the same script as above, but with "enable".
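
One way to make the re-enable step symmetrical is to record which resources were started before disabling them, and later restore only those. The following is a minimal sketch under stated assumptions, not a tested procedure: the state file path and the function names are hypothetical, and it assumes `ha-manager status` prints resource lines of the form `service vm:100 (node1, started)`. It sets the resources back to the `started` state rather than "enable", and uses the argument order `ha-manager set <sid> --state <state>`.

```shell
#!/bin/sh
# Sketch: remember which HA resources were started, disable them,
# and later set exactly those back to "started".
# STATE_FILE is a hypothetical location; pick a path that survives a reboot.
STATE_FILE="${STATE_FILE:-/var/tmp/ha-started.list}"

# Read `ha-manager status` text on stdin and print the SID of each
# started resource. Assumes lines like: service vm:100 (node1, started)
started_sids() {
    awk '$1 == "service" && $NF == "started)" {print $2}'
}

# Save the started SIDs, then disable each one.
disable_started() {
    ha-manager status | started_sids > "$STATE_FILE"
    while read -r sid; do
        ha-manager set "$sid" --state disabled < /dev/null
    done < "$STATE_FILE"
}

# Set every SID recorded earlier back to "started".
restore_started() {
    while read -r sid; do
        ha-manager set "$sid" --state started < /dev/null
    done < "$STATE_FILE"
}

# Only act when explicitly asked to: "disable" before shutdown,
# "restore" once the cluster is back up and quorate.
case "${1:-}" in
    disable) disable_started ;;
    restore) restore_started ;;
esac
```

Saved under a hypothetical name such as `ha-state.sh`, it would be called as `sh ha-state.sh disable` before the shutdown and `sh ha-state.sh restore` after power returns. Keeping the list in a file means guests that were already stopped or disabled before the outage are left alone.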

I also read about doing this before shutting down the nodes:

systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm

Is the latter a "better" approach?

Hope this makes sense and I welcome your thoughts!

dixie2000 · Apr 30, 2024

Does anyone have any suggestions? My basic goal is to be able to shut down my cluster (2 nodes) with HA and retain its current state, so that it can be brought back up later in the same state it was in when powered off. The shutdown needs to happen unattended: my UPS initiates it when the power goes out and I am not around to step in.

There must be others out there with the same situation.

Thanks!

UdoB · Apr 30, 2024

Have you tested simply shutting down both nodes at the same time? I would expect the VMs to shut down cleanly but stay on the node they're on, as there is no alternative node available to migrate to in this situation.

See also "Datacenter --> Options --> HA Settings" and the respective documentation.

dixie2000 · Apr 30, 2024

@UdoB

I have not attempted your suggestion. In any case, I need the process to be handled without my intervention. When the power goes out, my UPS will run a script that shuts down the nodes. But it needs to somehow ensure that HA is disabled etc., so that one node is not trying to migrate guests to the other.

Basically:

Power goes out
UPS script disables HA (and whatever else is needed)
UPS script then powers off the nodes
Once power is restored I can then restart the nodes hopefully in the state they were in before the outage

Hope that makes sense.
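
The sequence above could be sketched roughly as follows. This is only an illustration of the ordering, not a verified shutdown procedure: the node names `pve1` and `pve2`, the `root` login, and the `NODES`, `WAIT_SECONDS` and `DRY_RUN` variables are all assumptions. With `DRY_RUN=1` (the default here) it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the UPS-triggered sequence: disable HA, wait, power off nodes.
# NODES, WAIT_SECONDS and DRY_RUN are hypothetical settings.
NODES="${NODES:-pve1 pve2}"
WAIT_SECONDS="${WAIT_SECONDS:-30}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

ups_shutdown() {
    # HA state is cluster-wide, so one node is enough for this step.
    first_node=${NODES%% *}

    # Step 1: disable HA for every started resource (command from the first post).
    run ssh "root@$first_node" \
        "ha-manager status | grep started | awk '{print \$2}' | xargs -n 1 ha-manager set --state disabled"

    # Step 2: give the HA stack time to act on the new state.
    run sleep "$WAIT_SECONDS"

    # Step 3: power off each node in turn.
    for node in $NODES; do
        run ssh "root@$node" "shutdown -h now"
    done
}

# Only act when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    ups_shutdown
fi
```

Running it with `--run` and the default `DRY_RUN=1` prints the four commands in order, which makes it easy to check the sequence before letting the UPS trigger it with `DRY_RUN=0`. The Pi could be added to the list in the same way.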

tomachi · Jul 26, 2024

That is a good question and I would love to know the answer.
My current best practice is to run

Code:

pvecm expected 1

and hope for the best.

But after reading this, maybe I should make a script like yours, with my line at the end? I would also like to know how to bulk shut down all guests from the CLI.
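
For the bulk shutdown part, a minimal sketch for a single node might look like this. It assumes that `qm list` prints a header line followed by rows with the columns `VMID NAME STATUS ...`, and that `pct list` prints `VMID Status ...`; the function names are hypothetical. It does not take HA state into account, so treat it as a starting point for non-HA guests rather than a complete answer.

```shell
#!/bin/sh
# Sketch: shut down every running guest on the local node.

# Read `qm list` text on stdin and print the VMIDs that are running.
# Assumes a header line and columns: VMID NAME STATUS ...
running_vms() {
    awk 'NR > 1 && $3 == "running" {print $1}'
}

# Read `pct list` text on stdin and print the CTIDs that are running.
# Assumes a header line and columns: VMID Status ...
running_cts() {
    awk 'NR > 1 && $2 == "running" {print $1}'
}

# Ask each running VM and container to shut down cleanly.
shutdown_all_guests() {
    for vmid in $(qm list | running_vms); do
        qm shutdown "$vmid"
    done
    for ctid in $(pct list | running_cts); do
        pct shutdown "$ctid"
    done
}

# Only act when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    shutdown_all_guests
fi
```

The guests are shut down one after another, so a guest that is slow to stop delays the rest.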

esi_y · Jul 26, 2024

@tomachi Are you sure yours is the same issue? Setting the expected votes like this on e.g. an HA setup for the ... purpose of a shutdown ... only increases the risk of things going wrong. It does nothing for you in terms of the shutdown, and you should not need it at all.

dixie2000 · Jul 26, 2024

@tomachi This may help with your "bulk shutdown" question:
https://gist.github.com/davidwah/2a755fc97bfbfe141ea5f1744876e558
