migrate VMs in cluster on planned reboot only

freebeerz

Aug 5, 2015
Hi,

I want to use HA safely in a Proxmox cluster with no shared storage (only ZFS replication). The behaviour I'd like for some VMs is this:

- for planned node reboots (`reboot` command on the shell or the UI reboot button): automatically live migrate the VMs to another node before the node reboots
- for unexpected node crashes or network issues: freeze the VM on the failed node (do not reschedule it on another running node, to avoid using old replicated storage)

What are the HA settings to achieve this?

Thank you!
 
Just a shot in the dark: maybe it will work the way you asked if you do NOT define any HA settings for the virtual machines. Did you try turning node maintenance mode on and off? Also, start on boot must be off, because how should PVE know whether the node crashed and rebooted or was just switched on?
 
It doesn't seem to be the case: if I reboot the node without any VMs in an HA group, they just get shut down automatically.

If I put them in an HA group and reboot the node from Proxmox, they gracefully migrate to another node (with a ZFS replication) without downtime. But if I yank the network, they always restart on another node using whatever previously replicated ZFS storage is present (or they crash if there was no replication).

Maybe it's not possible to only have automatic HA for planned maintenance but not otherwise? It would make a lot of sense for non-shared storage VMs when you care about data consistency (most of the time unless you run some stateless VMs!)

Well I can still just migrate them manually from node to node when I do node updates but it's a bit of a pain :)
 
It doesn't seem to be the case: if I reboot the node without any VMs in an HA group, they just get shut down automatically.
Did you set "maintenance mode" on the PVE host first and wait to see whether anything starts to migrate?
 
but if I yank the network they always restart on another node using whatever previous zfs replicated storage is present (or they crash if there was no replication)
PVE does not have functionality similar to VMware Fault Tolerance. When the network or node is "yanked", there is no way to migrate the memory state of the VM - it does not exist. The VM has to be started from scratch. And, since there was no graceful replication update - it has to start from last known good timepoint.


 
Yes, and that's perfectly fine with what I want to achieve: only live migrate on planned maintenance, and otherwise do not restart the VMs on another node if a node fails. Ideally, if the failed node is still running but just can't connect to the cluster, it should leave its VMs running, but I would be OK with a freeze/shutdown behaviour too.

So I changed the cluster HA setting to "migrate" so that it moves VMs automatically on planned reboots. But with this setting they also restart on another node in node failure scenarios. That would be OK if I had shared storage, but I'm using ZFS replication and I don't want to auto restart a VM on a node without synced storage (a full sync is of course only guaranteed on planned node shutdowns).

So for my update maintenance maybe instead I should disable HA for the VMs and do this with ansible:

for each node:
- migrate running VMs to another node
- run apt update/upgrade
- reboot
- migrate the VMs back
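
The per-node steps above could be sketched as a small plan generator before turning them into an Ansible playbook. This is only an illustration under assumptions: the node names, VM IDs, the step labels (`migrate`, `upgrade`, `reboot`) and the "next node in sorted order" target choice are all hypothetical, and the function only builds the ordered list of steps; it does not call any Proxmox tool.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, ...]


def rolling_update_plan(placement: Dict[str, List[int]]) -> List[Step]:
    """Return the ordered maintenance steps for every node in turn.

    placement maps a node name to the VM IDs running on it.
    Each node is drained, upgraded, rebooted, and then gets its VMs back
    before the next node is touched.
    """
    nodes = sorted(placement)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to migrate VMs away")

    plan: List[Step] = []
    for i, node in enumerate(nodes):
        # Assumption: the migration target is simply the next node in
        # sorted order; a real setup would pick a node that holds an
        # up-to-date replica of the VM's disks.
        target = nodes[(i + 1) % len(nodes)]
        vmids = sorted(placement[node])

        for vmid in vmids:
            plan.append(("migrate", str(vmid), node, target))
        plan.append(("upgrade", node))
        plan.append(("reboot", node))
        for vmid in vmids:
            plan.append(("migrate", str(vmid), target, node))
    return plan


if __name__ == "__main__":
    # Hypothetical two-node cluster.
    for step in rolling_update_plan({"pve1": [100, 101], "pve2": [200]}):
        print(" ".join(step))
```

Because every VM is moved back before the next node is handled, the placement is the same at the start of each iteration, which keeps the plan easy to check by eye.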
 
Did you set "maintenance mode" on the PVE host first and wait to see whether anything starts to migrate?
I tried that, and nothing migrates automatically unless the VMs are in an HA group. But in that case they also restart automatically on another node if a node fails, which is what I want to avoid because of the lack of shared storage.

My cluster HA setting is "migrate". It would be nice to have something like "migrate-only-on-maintenance".
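
To make the gap concrete, here is a rough sketch of the behaviour reported in this thread next to the wished-for policy. It is not Proxmox code: the function, the event names and the action labels are made up for illustration, and "migrate-only-on-maintenance" is the hypothetical setting from the post above, not an existing option.

```python
def expected_action(ha_managed: bool, event: str, policy: str = "migrate") -> str:
    """Describe what happens to a VM for a given node event.

    event is "planned_reboot" or "node_failure".
    policy is "migrate" (the setting used in this thread) or the
    hypothetical "migrate-only-on-maintenance".
    """
    if event not in ("planned_reboot", "node_failure"):
        raise ValueError(f"unknown event: {event}")
    if policy not in ("migrate", "migrate-only-on-maintenance"):
        raise ValueError(f"unknown policy: {policy}")

    planned = event == "planned_reboot"

    if not ha_managed:
        # Behaviour seen without HA: the VM is shut down on reboot and
        # is never recovered elsewhere after a failure.
        return "shutdown" if planned else "stay-on-failed-node"

    if planned:
        # Both policies live migrate on a planned reboot.
        return "live-migrate"

    if policy == "migrate":
        # Behaviour seen with HA: restart elsewhere from the last replica.
        return "restart-from-last-replica"

    # Hypothetical policy: never recover onto possibly stale storage.
    return "stay-on-failed-node"


if __name__ == "__main__":
    for ha in (False, True):
        for ev in ("planned_reboot", "node_failure"):
            print(ha, ev, expected_action(ha, ev))
```

The only case where the two policies differ is an HA-managed VM on a failed node, which is exactly the case that matters with ZFS replication instead of shared storage.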