Install NUT directly on Proxmox VE and control guests from here

Hi @philkunz , got a few questions on your shutdown tool.
We have 2 nodes with 2 UPSes (SNMP enabled), and I can see you can add multiple UPSes to a single node.
1) How does it handle multiple UPSes? I can see each UPS device has a threshold, but there's also a group you can add UPSes to with its own thresholds. If one UPS goes on battery and the other stays on power, what happens? Similarly, if both UPSes go into battery mode, are the thresholds based on whichever is lowest, highest, or some aggregate? Does this depend on whether they are in a group or added as standalone?
2) How does it handle Proxmox clusters? In our ESXi environment, when the shutdown event occurred it would sometimes try to migrate the VMs to the other host, even though both hosts were being asked to shut down. If I set up HA on my VMs, does your tool prevent HA migration and cleanly shut the VMs down as quickly as possible?
Regards
Damien
 
Hi Damien,
As of v5.7.0, this is how it works:

1) Multiple UPSes / groups

You can attach actions either directly to individual UPS devices, or to a group.
  • UPS-level actions are evaluated per UPS, independently.
  • Group-level actions are evaluated across all members after a full poll cycle.
The important bit is: group thresholds are not aggregated. There is no lowest/highest/average battery calculation across the group.
Instead, each group action evaluates each member UPS against that action’s own thresholds:
  • redundant
    • power-change logic treats the group as “on battery” only when all member UPSes are on battery
    • threshold-based actions fire only when all member UPSes are on battery and below that action’s thresholds
  • nonRedundant
    • power-change logic treats the group as “on battery” when any member UPS is on battery
    • threshold-based actions fire when any member UPS is on battery and below that action’s thresholds
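To make that concrete, here is the decision logic sketched in Python. This is purely illustrative; the function and field names are mine, not nupst's actual code:

```python
# Illustrative sketch of group action evaluation (not nupst's actual code).
# A member only "counts" toward an action when it is on battery AND below
# that action's own battery threshold.

def member_below(ups, action):
    return ups["on_battery"] and ups["battery_percent"] < action["battery_threshold"]

def group_action_fires(members, action, mode):
    if mode == "redundant":
        # every member must be on battery and below the threshold
        return all(member_below(u, action) for u in members)
    # nonRedundant: a single affected member is enough
    return any(member_below(u, action) for u in members)

ups_a = {"on_battery": True, "battery_percent": 40}    # on battery, below 50%
ups_b = {"on_battery": False, "battery_percent": 100}  # still on mains
action = {"battery_threshold": 50}

group_action_fires([ups_a, ups_b], action, "redundant")     # False
group_action_fires([ups_a, ups_b], action, "nonRedundant")  # True
```

Note there is no averaging anywhere: each member is compared to the threshold individually, and only the all/any combination differs between the two modes.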
So for your examples:
  • If one UPS is on battery and the other is still on mains:
    • direct UPS actions on that affected UPS can still trigger based on that UPS alone
    • a redundant group action will not trigger yet
    • a nonRedundant group action can trigger once that affected UPS crosses the configured threshold
  • If both UPSes are on battery:
    • redundant waits until both are below the group action threshold
    • nonRedundant fires as soon as either one is below the group action threshold
So the behavior depends on where you attach the action:
  • on the UPS = standalone/per-UPS logic
  • on the group = group mode logic
For a dual-UPS redundant host, I would usually recommend:
  • put alerting actions on the individual UPSes if you want
  • put the actual Proxmox shutdown + host shutdown on the group in redundant mode
One extra safety behavior: destructive group actions (proxmox, shutdown) are suppressed if a required group member is unreachable, so it does not make a shutdown decision on partial data.
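Sketched the same way (again, names are mine, not the real implementation):

```python
# Illustrative sketch: destructive group actions refuse to run on partial data.
DESTRUCTIVE = {"proxmox", "shutdown"}

def may_run(action_type, members):
    if action_type in DESTRUCTIVE and any(u["unreachable"] for u in members):
        return False  # a required member is unreachable: no shutdown decision
    return True
```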

2) Proxmox clusters / HA

NUPST is still node-local, not a cluster-wide coordinator.
What it does:
  • shuts down the VMs and LXCs on the local node
  • waits for graceful shutdown
  • force-stops remaining guests if configured
  • then the host shutdown action can run after that
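Done by hand, that sequence corresponds roughly to the following on each node (VMIDs and timeouts are examples; nupst automates this, the commands are just to illustrate what happens):

```shell
# graceful shutdown of the local guests
qm shutdown 101 --timeout 120    # VM 101
pct shutdown 201 --timeout 120   # LXC 201

# force-stop anything still running after the grace period (if configured)
qm stop 101
pct stop 201

# then the host itself
shutdown -h now
```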
What changed for HA-managed guests:
  • there is now an HA-aware mode: proxmoxHaPolicy: "haStop"
  • in that mode, HA-managed guests are requested to enter the "stopped" state through the Proxmox HA layer, instead of only receiving a plain qm/pct shutdown
That is the correct path if you want HA-managed guests to stop cleanly without HA treating them as failed and trying to move/restart them elsewhere.
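In plain Proxmox CLI terms, the difference between the two paths looks like this (VMID 100 is an example):

```shell
# plain path: guest-level shutdown; for an HA-managed guest the HA manager
# can interpret this as a failure and restart the guest elsewhere
qm shutdown 100 --timeout 120

# HA-aware path: ask the HA stack itself to bring the resource to "stopped",
# so no recovery/migration is attempted
ha-manager set vm:100 --state stopped
```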
What it still does not do:
  • no cluster-wide shutdown orchestration
  • no global HA disable
  • no coordination between hosts beyond “each host handles itself”
So for a 2-node cluster, the intended setup is:
  • run NUPST on both nodes
  • use proxmoxHaPolicy: "haStop" on the Proxmox action
  • place the Proxmox action before the host shutdown action
That way each node stops its own HA/non-HA guests properly, then shuts itself down, rather than trying to evacuate workloads during a power event.
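As a rough illustration, such a config could look like this. Apart from proxmoxHaPolicy: "haStop" and the redundant/nonRedundant modes discussed above, the field names and overall shape here are my shorthand, not nupst's documented schema, so check the real docs for the exact layout:

```json
{
  "groups": [
    {
      "id": "rack-power",
      "mode": "redundant",
      "upsDevices": ["ups-a", "ups-b"],
      "actions": [
        { "type": "proxmox", "proxmoxHaPolicy": "haStop", "batteryThreshold": 50 },
        { "type": "shutdown", "batteryThreshold": 30 }
      ]
    }
  ]
}
```

The ordering matters: the proxmox action sits before the host shutdown action, so guests are stopped cleanly first.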

If you think parts of this should be solved differently, feel free to start a discussion about how things should be and why. We want to make nupst better for everyone.

Regards,
Phil
 
Hi @philkunz
Thanks for the very detailed answer, this helps a lot with how I set the thresholds up. I was on 5.6, so I upgraded to get the HaPolicy stuff. It doesn't look like there is an edit-action command on the CLI, so I edited the config file and restarted the service. So a suggestion: an action edit command? (Editing the groups & UPSes worked great.) Apart from that, the changes you have quickly added (and the tool in general) mean I can use this instead of figuring out NUT.

As for how you are handling the HA/cluster environment, I think it works fine for a small environment like ours (I can have two terminals open and repeat commands as necessary). I just need it to shut down cleanly and as quickly as possible. For large clusters I cannot provide any guidance, but I'm sure they would like something sitting in the Datacenter view to control/automate it all. Is it feasible to copy the config file to multiple nodes, i.e. do duplicate IDs cause issues? It doesn't seem like they would, since nupst doesn't do host-to-host comms afaict.

BTW, our APC Smart-UPS 3000 with the Network Management Card via SNMPv1 reports runtime in ticks, not the more common minutes your wizard suggests. I was surprised to see 1000's of minutes left on my UPS, did I just download more battery :)? This may be a one-off, I don't know, but it was easy to change the UPS setup via the CLI.
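For anyone else who hits this: SNMP TimeTicks are hundredths of a second, so if the card reports raw ticks, the conversion to minutes is just (Python, to illustrate the arithmetic):

```python
# SNMP TimeTicks = 1/100 s, so minutes = ticks / (100 * 60)
def ticks_to_minutes(ticks: int) -> float:
    return ticks / 6000

ticks_to_minutes(300000)  # 50.0 -> ~50 minutes of runtime, not 300000 "minutes"
```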

Now I just have to find downtime to test it all. >_<

Regards
Damien