How to achieve both HA and Autostart combined with proper shutdown on power outage?

Wazaari

New Member
Dec 28, 2023
Hi all,

Our setup is a three-node Proxmox cluster with Ceph storage. So far, everything works fine. We've organised the VMs into three pools (Tier2, least important; Tier1, medium importance; Tier0, critical). Autostart is configured for all of them; HA is configured for the Tier0 VMs only, to make sure they survive a host failure.

In addition, the hosts are connected to a UPS system mainly to ensure proper shutdown in case of power outage. The UPS system is monitored, and the following things happen:

- At power loss (battery event), all Tier2 VMs are shut down
- At 60% capacity, all Tier1 VMs are shut down, then the third node is shut down as well
- At 20% capacity, all Tier0 VMs are shut down, then the two remaining nodes are shut down
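
To make the three stages above concrete, here is a minimal sketch of that policy as a pure function. The function name, the action strings, and the reading of "at 60%" as "60% or below" are assumptions for illustration only; this is not the actual monitoring script and it does not call any UPS or Proxmox API.

```python
# Hypothetical sketch of the staged shutdown policy described above.
# The action strings are placeholders, not real commands.

def actions_for(on_battery: bool, charge_percent: float) -> list[str]:
    """Return every action that is due at the given UPS state (cumulative)."""
    if not on_battery:
        return []
    # Stage 1: any battery event shuts down the least important VMs.
    actions = ["shutdown Tier2 VMs"]
    # Stage 2 (assumed threshold: 60% or below).
    if charge_percent <= 60:
        actions += ["shutdown Tier1 VMs", "shutdown node 3"]
    # Stage 3 (assumed threshold: 20% or below).
    if charge_percent <= 20:
        actions += ["shutdown Tier0 VMs", "shutdown nodes 1 and 2"]
    return actions


if __name__ == "__main__":
    for charge in (100, 60, 20):
        print(charge, actions_for(True, charge))
```

Keeping the decision separate from the code that actually runs the shutdowns makes it easy to test the thresholds without touching the cluster.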

This works fine, but after restarting everything, the Tier0 VMs are not started automatically. My assumption is that this has something to do with the HA state of these VMs, as the HA state first goes into `request_stop` and then to `stopped` during the shutdown event.

So the question is: how can I properly shut down HA protected VMs during the power outage, while still making sure they get started again once power returns?

Thanks!
Daniel
 
Thanks for your answer, I'm not sure I'm understanding it though. Can you elaborate why that would make a difference and how it would affect the startup order?
 
Hi, sure. As startup is handled on a per-host basis, I'll list them per host as well:

Proxmox Node 01

| VM Name | Pool | Order | Startup Delay |
|---|---|---|---|
| firewall | Tier0 | 10 | N/A |
| opvpn-ra | Tier0 | 10 | N/A |
| k8s-node2 | Tier1 | 20 | 30s |
| packetfence | Tier1 | 30 | 30s |

Proxmox Node 02

| VM Name | Pool | Order | Startup Delay |
|---|---|---|---|
| dns | Tier0 | 10 | N/A |
| dc1 | Tier0 | 10 | N/A |
| k8s-node2 | Tier1 | 20 | 30s |
| homeassistant | Tier2 | 30 | 30s |
| smbv1_proxy | Tier2 | 40 | 30s |

Proxmox Node 03

| VM Name | Pool | Order | Startup Delay |
|---|---|---|---|
| k8s-node3 | Tier1 | 20 | 30s |
| cups | Tier2 | 30 | 30s |
| ima_winxp | Tier2 | 40 | 30s |

Node03 doesn't contain any Tier0 VMs, that's also why we shut it down in case of power outage. After restore, all VMs come back up, except for the three Tier0 VMs. Those happen to be the HA enabled ones, too, so I suspect there's some correlation as described initially.

Note that I also have a generic 30s startup delay configured on all hosts, to make sure the Ceph FS is ready before VMs are started.
 
I'm sorry, what do you mean? I start all three nodes in parallel (at the same time), I'm not doing any manual actions on the VMs.
 
From the documentation:

VMs managed by the HA stack do not follow the start on boot and boot order options currently. Those VMs will be skipped by the startup and shutdown algorithm as the HA manager itself ensures that VMs get started and stopped.

https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_bootorder

You could skip the step where you shut down the Tier0 guests and just shut down the nodes, which will automatically stop all the guests. They will restart when the node starts.
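
If the explicit Tier0 shutdown step is kept instead, the HA request state stays at `stopped` after power returns. One possible workaround, which is an assumption and not something confirmed in this thread, is to request the `started` state again with `ha-manager set` once the cluster is back up. A minimal sketch, where the VM IDs 100 and 101 and the helper names are hypothetical:

```python
import subprocess

# Hypothetical sketch: ask the HA manager to start the Tier0 VMs again
# after power has returned. The VM IDs used below are placeholders.

def ha_start_commands(vmids):
    """Build one `ha-manager set vm:<id> --state started` command per VM."""
    return [
        ["ha-manager", "set", f"vm:{vmid}", "--state", "started"]
        for vmid in vmids
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them on a Proxmox node if dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is changed on the cluster.
    run(ha_start_commands([100, 101]))
```

The dry run only prints the commands, so they can be reviewed before anything is executed on a node.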
 
Thanks @LordRatner, I somehow managed to miss this piece of documentation. I guess that's the only way then. I was worried about timing effects, as the two hosts would also be the last two Ceph members. If one of the hosts goes down before all guests on the other host have shut down, those guests wouldn't be able to write to disk anymore.
 