Shut down a VM when a different host shuts down

mort47

New Member
Sep 12, 2024
Hi,

I have a three-node cluster and I'm basically at my memory limit. The nodes are chinstrap, magellanic, and humboldt. On humboldt there's a VM that uses up quite a lot of memory but isn't important. What I'd like is to automatically shut down that VM when either chinstrap or magellanic begins to go offline, so that there is enough memory free for the HA migrations of the important VMs. Then, once all the hosts are running and all the VMs are back where they normally live, the VM can start again.

How would I go about doing that? Is there a script that Proxmox executes as a host goes into maintenance mode or anything like that?

Thanks.
 
Hello,

For example, you could add specific commands on both chinstrap and magellanic in /etc/network/interfaces, underneath vmbr0:

pre-down ha-manager set [VMID] --state disabled

Then, when those hosts go down and bring vmbr0 down, they would first execute that command and shut down (and disable) your VM, provided it is an HA resource.

Then again, you may also want to make sure that VM is up and running when those hosts are up as well. That would be feasible with a second command:

post-up ha-manager set [VMID] --state started
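
Put together, a minimal sketch of such a vmbr0 stanza could look like the following (the address, bridge port and VMID 105 are placeholders rather than values from this thread; the HA resource ID is usually written in the vm:<VMID> form):

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.21/24
        gateway 192.0.2.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        # stop/disable the expendable HA resource before this node takes its bridge down
        pre-down ha-manager set vm:105 --state disabled
        # re-enable it once the bridge is back up
        post-up ha-manager set vm:105 --state started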


Another way to do this could be through configuring hook-scripts. For those, check /usr/share/pve-docs/examples/guest-example-hookscript.pl on any Proxmox VE installation.

Kind regards,

GD
 
For example, you could add specific commands on both chinstrap and magellanic in /etc/network/interfaces, underneath vmbr0:

pre-down ha-manager set [VMID] --state disabled

Then, when those hosts go down and bring vmbr0 down, they would first execute that command and shut down (and disable) your VM, provided it is an HA resource.
ha-manager set [VMID] --state disabled will set the HA resource state of that VM to disabled.
It will NOT shut down that running VM on the OP's node humboldt.
So this does not give the OP his desired outcome.


Another way to do this could be through configuring hook-scripts.
I don't see how hookscripts could be used while the VM is already up and running.

AFAIK hookscripts have 4 phases: pre-start, post-start, pre-stop & post-stop, which are each called ONCE at those stages.


I believe that to achieve his desired result, the OP will need his own "script shenanigans", which will be rather difficult, given that this is cluster-state-dependent.
 
A simple cron job on the VM you want killed, run every minute, would do it; or you can run it continuously as a systemd service with whatever interval you find reasonable.

Other ways would be to tie it into Prometheus or other monitoring tools. Or, as others say, pre-start hook scripts will run when the VM starts (they also run during migration; we use one to provision an NVIDIA GPU).

HOST="example.com"
COUNT=3

if ! ping -c $COUNT $HOST > /dev/null 2>&1; then
echo "$(date): $HOST is unreachable. Shutting down." >> /var/log/host_check.log
/sbin/shutdown -h now
fi
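
For instance, assuming the script above is saved as /usr/local/bin/host_check.sh and made executable (that path is just a placeholder), the once-a-minute cron variant could be an entry in /etc/cron.d:

# /etc/cron.d/host_check -- sketch only, path and filename are assumptions
* * * * * root /usr/local/bin/host_check.sh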
 
Then, once all the hosts are running and all the VMs are back where they normally live, the VM can start again.
This doesn't feel logical IMHO. If you end up starting the VM again on one of your surviving hosts, it will eventually use the same amount of memory it used before the shutdown, risking the OOM killer on the host it runs on. Maybe a simpler option would be to use memory ballooning for that VM or simply lower the memory assigned to it.
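
As a rough sketch of the ballooning idea (VMID 105 and the sizes are placeholders, not values from this thread):

# allow the VM up to 16 GiB, but let the balloon driver reclaim memory down to 4 GiB under host pressure
qm set 105 --memory 16384 --balloon 4096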

you could add specific commands on both chinstrap and magellanic in /etc/network/interfaces
That's a nice trick, but remember that if HA fences "humboldt" because it loses quorum, or if "humboldt" crashes, it will be an immediate hard reset/power-off and the pre-down will not be executed.

Adding another idea: you could create a dummy VM, add it as an HA resource, and use a start hookscript to tell PVE to stop that memory-hungry VM, or even decide to do something else depending on other parameters.
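
A rough sketch of what that dummy VM's hookscript could look like (in bash rather than the Perl example shipped with PVE; the VMID 105, humboldt as the big VM's node, and the 120 s timeout are assumptions, not details from this thread):

#!/bin/bash
# Guest hookscript sketch for the dummy HA VM; PVE calls it as: <script> <vmid> <phase>
vmid="$1"
phase="$2"

if [ "$phase" = "pre-start" ]; then
    # HA only (re)starts the dummy VM on a surviving node after a failover,
    # so ask the cluster to gracefully shut down the expendable VM first.
    pvesh create /nodes/humboldt/qemu/105/status/shutdown --timeout 120 || true
fi

exit 0

It would be attached with something along the lines of qm set <dummy-vmid> --hookscript local:snippets/stop-big-vm.sh (the snippet name is made up).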
 
This doesn't feel logical IMHO. If you end up starting the VM again on one of your surviving hosts, it will eventually use the same amount of memory it used before the shutdown, risking the OOM killer on the host it runs on. Maybe a simpler option would be to use memory ballooning for that VM or simply lower the memory assigned to it.

I'm not an expert in memory management, but I can imagine that if the hypervisor attempts to reclaim memory that the VM says is "in use", then even if it succeeds it will not end well for the processes inside the VM, and I would much rather have it simply shut down gracefully in the event of a failover. I could certainly use less memory, but this VM does need quite a lot of it, which is fine and only becomes too much if I have to shut down chinstrap or magellanic. For that reason it would be really nice to have an HA option that says "any time anything at all happens, shut down these non-urgent VMs to free memory". From the responses to this thread, it looks like that currently isn't possible.

The problem with relying on ping or ifup/down scripts is that the HA migrations happen before the network devices go down.

Does the HA manager log anything anywhere that humboldt could see when one of the other nodes goes offline (or is about to)? Maybe I could have it (or the VM) follow a log file, look for any VM migration event, and respond to that.
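
For example, maybe something along these lines, run from cron on humboldt, could work (just a sketch on my part; I'm assuming pvesh reports each node's online/offline status in the cluster node list, and I'm using 105 as the expendable VM's ID):

#!/bin/bash
# Sketch: shut the expendable VM down as soon as any cluster node is reported offline.
VMID=105

if pvesh get /nodes --output-format json | grep -q '"status"[[:space:]]*:[[:space:]]*"offline"'; then
    qm shutdown "$VMID" --timeout 120
fi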

Thanks everyone for your suggestions.
 