Node restart and automatic migration of VMs (HA)

Gerhard W. Recher

Hi Folks,

If, for any reason, a node performs a (controlled) restart, none of the running VMs are migrated to a free node in the cluster on a best-effort basis, even if the VM is in an HA group!

I don't think this is the behavior HA is intended to have.

Am I wrong, or have I missed some vital information?

Regards,

Gerhard
 
Hi,

this is by design.

On a controlled reboot, we do not touch the HA resource until the host comes back up.
We plan to make this configurable in the future.

For now, if you want to move the HA VMs before a reboot, use the bulk migrate option, or temporarily change the node priorities in the HA group.
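
A minimal sketch of the "move the HA VMs first" workaround, assuming the `ha-manager migrate <sid> <node>` command form. The helper name is hypothetical, and the VMID and node name are only examples borrowed from later in this thread. The function just builds the command lines, so nothing is migrated until they are actually run on a cluster node:

```python
import shlex


def ha_migrate_commands(vmids, target_node):
    """Build one `ha-manager migrate` command line per HA-managed VM.

    Assumption: each VM is an HA resource with the service ID `vm:<vmid>`.
    """
    return [
        shlex.join(["ha-manager", "migrate", f"vm:{vmid}", target_node])
        for vmid in vmids
    ]


if __name__ == "__main__":
    # Example values only: VM 101 and node pve02 are taken from this thread.
    for command in ha_migrate_commands([101], "pve02"):
        print(command)
```

Once the HA manager has finished moving the resources (this is not instantaneous), the node can be rebooted.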
 
Dominik,

But this behavior is not really safe against PEBCAK: if the operator does not follow these manual steps, VMs might be corrupted; at the very least, there is downtime (not really expected for an HA cluster).

To be honest, I think "by design" should mean controlled behavior: migrate the active resources, then initiate the host reboot.

I just tested this, setting my "all" group to a different priority for each cluster node ... and ... the node rebooted without any migration; the guest had downtime!
Code:
nodes: <node>[:<pri>]{,<node>[:<pri>]}*

List of cluster node members, where a priority can be given to each node. A resource bound to a group will run on the available nodes with the highest priority. If there are more nodes in the highest priority class, the services will get distributed to those nodes. The priorities have a relative meaning only.
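
To make the priority rule above concrete, here is a small Python sketch (not Proxmox code) that parses a `nodes` value and picks the online members in the highest priority class. The function names are hypothetical, and treating a missing priority as 0 is an assumption made for illustration:

```python
def parse_group_nodes(spec):
    """Parse 'node[:pri],node[:pri],...' into a {node: priority} dict."""
    priorities = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        node, sep, pri = entry.partition(":")
        # Assumption: a node listed without a priority gets priority 0.
        priorities[node] = int(pri) if sep else 0
    return priorities


def preferred_nodes(priorities, online):
    """Return the online group members that share the highest priority."""
    candidates = {n: p for n, p in priorities.items() if n in online}
    if not candidates:
        return []
    top = max(candidates.values())
    return sorted(n for n, p in candidates.items() if p == top)


if __name__ == "__main__":
    group = parse_group_nodes("pve01:3,pve02:2,pve03:2,pve04")
    # With pve01 offline, the next priority class (pve02, pve03) is chosen.
    print(preferred_nodes(group, {"pve02", "pve03", "pve04"}))
```

Only the ordering of the priorities matters here, which matches the statement that they have a relative meaning only.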
 
On a controlled reboot, the VMs are shut down gracefully, so no corruption should occur.

Regarding your test: did you wait until the ha-manager had migrated all VMs away? This is not instantaneous.
 
Dominik,

I suppose you misread me :)

Scenario: an operator on Node4 installed some upgrades that require a reboot, and he either runs a reboot from the CLI console or restarts the node from the GUI.

This guy totally forgot that some vital VMs are still running on this node.

This results not in a graceful shutdown of the VM but in a kill of the resource. So a reboot sequence should automatically migrate HA resources away,
not shut them down (gracefully or not!). What happens in this case is literally a power-off for this VM instance!

Restart of cluster node 4 with an HA VM, 2008r2 (vmid: 101), with all drivers installed for ballooning, virtio, ...:
Code:
 cat /etc/pve/.clusterlog
{
"data": [
{"uid": 393, "time": 1500897863, "pri": 6, "tag": "pvedaemon", "pid": 54046, "node": "pve02", "user": "root@pam", "msg": "starting task UPID:pve02:00007759:03AE4188:5975E247:vncproxy:101:root@pam:"},
{"uid": 5, "time": 1500897840, "pri": 6, "tag": "pve-ha-lrm", "pid": 5649, "node": "pve04", "user": "root@pam", "msg": "end task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam: OK"},
{"uid": 4, "time": 1500897839, "pri": 6, "tag": "pve-ha-lrm", "pid": 5649, "node": "pve04", "user": "root@pam", "msg": "starting task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam:"},
{"uid": 3, "time": 1500897829, "pri": 6, "tag": "pve-manager", "pid": 5603, "node": "pve04", "user": "root@pam", "msg": "end task UPID:pve04:000015E6:00003E9E:5975E225:startall::root@pam: OK"},
{"uid": 2, "time": 1500897829, "pri": 6, "tag": "pve-manager", "pid": 5603, "node": "pve04", "user": "root@pam", "msg": "starting task UPID:pve04:000015E6:00003E9E:5975E225:startall::root@pam:"},
{"uid": 392, "time": 1500897822, "pri": 6, "tag": "pvedaemon", "pid": 54048, "node": "pve02", "user": "root@pam", "msg": "successful auth for user 'root@pam'"},
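
One way to read an excerpt like the one above is to filter it by node and sort it chronologically, which makes it easier to see which tasks (here only `startall` and `qmstart`, no migration) were logged on the rebooted node. This is a sketch, not a Proxmox tool: the function name is hypothetical, and the sample closes the JSON array by hand because the excerpt above is truncated.

```python
import json
from datetime import datetime, timezone

# Three entries copied from the excerpt above; the closing "]}" is added here.
SAMPLE = """
{"data": [
{"uid": 5, "time": 1500897840, "pri": 6, "tag": "pve-ha-lrm", "pid": 5649, "node": "pve04", "user": "root@pam", "msg": "end task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam: OK"},
{"uid": 2, "time": 1500897829, "pri": 6, "tag": "pve-manager", "pid": 5603, "node": "pve04", "user": "root@pam", "msg": "starting task UPID:pve04:000015E6:00003E9E:5975E225:startall::root@pam:"},
{"uid": 392, "time": 1500897822, "pri": 6, "tag": "pvedaemon", "pid": 54048, "node": "pve02", "user": "root@pam", "msg": "successful auth for user 'root@pam'"}
]}
"""


def node_events(clusterlog_text, node):
    """Return (UTC timestamp, tag, msg) tuples for one node, oldest first."""
    entries = json.loads(clusterlog_text)["data"]
    selected = [e for e in entries if e["node"] == node]
    # Entries with the same second are ordered by their uid.
    selected.sort(key=lambda e: (e["time"], e["uid"]))
    return [
        (
            datetime.fromtimestamp(e["time"], tz=timezone.utc).isoformat(),
            e["tag"],
            e["msg"],
        )
        for e in selected
    ]


if __name__ == "__main__":
    for when, tag, msg in node_events(SAMPLE, "pve04"):
        print(when, tag, msg)
```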

Attachment: not_gracefull.PNG
 
This works here without problems;
you should see a shutdown task for each configured HA resource.

You can check whether the normal shutdown procedure works for this VM (with the shutdown button, for example).

Also note that a shutdown initiated from HA has a timeout of 120 seconds; maybe we should make this configurable.
 
The community has asked for this feature many times without much luck. It would be great if the devs would listen to the end users, as there are a lot of us requesting this functionality. There is still an open feature request I put in on this that has had no attention that I know of.
 
Adam, thanks for jumping in :)

My personal opinion is: this is a must, otherwise HA is totally meaningless and has no real functionality.
Hopefully other users will also vote heavily for this!
 
I agree 100%. This is how HA behaved in Proxmox 2 and Proxmox 3. In Proxmox 4/5 they decided the freeze route is better. I don't agree with their logic at all, but we are at their mercy.
 
I want to buy a commercial subscription for my cluster... but I will wait until the open questions are answered and solved.

Hopefully the staff is passing these requests into the decision pipeline :)
 
We do listen to our community. You also submitted the feature request for this, and we are working on such an improvement.

https://bugzilla.proxmox.com/show_bug.cgi?id=1378
 
As I stated, I submitted a request for this, but so far there has been no traction at all in over 2 months on an extremely important issue to us and a number of other users. IMO this is something that the community should have weighed in on before the change was made. That is the best way to listen to your community.
 
Also note that a shutdown initiated from HA has a timeout of 120 seconds; maybe we should make this configurable.

How will this interact with systemd, which has some default timeouts of 90+ seconds? I am asking with regard to both the VMs and the Proxmox host.
 
+1 for this option; it would be ideal to automatically migrate VMs when initiating a manual node reboot.
 
I agree, but at this point my hopes are slim. I will be forced to come up with my own solution, as the devs just make changes to the logic without much input from the community. This thread should be more than enough for them to know this option should be available so that users can make their own decision.
 
Here's the solution for this, according to a page that was linked by @adamb.
I just confirmed that it works, and the change didn't require a reboot (well, I rebooted to test HA, but the test VM migrated successfully, with no reboot needed after the config change).

Thomas Lamprecht 2019-01-07 14:02:28 CET
A fix for this was packaged with pve-ha-manager in version 2.0-6, additionally pve-cluster in version 5.0-33 is required, the packages should be available in public repositories soon. With those two packages you can add a settings line like:

ha: shutdown_policy=failover

to /etc/pve/datacenter.cfg
see man datacenter.cfg for more details.
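
To double-check that the setting really ended up in the file, a short Python sketch like the following can read the `ha:` line back. The function name is hypothetical, and the parsing assumes the simple `key: name=value[,name=value]` layout shown in the quoted comment; it is not Proxmox's own parser:

```python
def ha_settings(datacenter_cfg_text):
    """Return the name=value pairs from the 'ha:' line of datacenter.cfg text."""
    for line in datacenter_cfg_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "ha":
            return dict(
                item.strip().split("=", 1)
                for item in value.split(",")
                if "=" in item
            )
    # No 'ha:' line found.
    return {}


if __name__ == "__main__":
    # On a cluster node this would read the real file instead:
    #   text = open("/etc/pve/datacenter.cfg").read()
    text = "keyboard: en-us\nha: shutdown_policy=failover\n"
    print(ha_settings(text).get("shutdown_policy"))
```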
 
Thanks for making this at least an option... but it still doesn't work. I've added it to /etc/pve/datacenter.cfg and waited for the config to fully sync, but reboots still just shut down the VMs instead of migrating them.

Still, why is this not the default?? What is the scenario where someone would prefer to have all VMs on a host shut down rather than migrated to another host? And are those scenarios really substantially MORE common than the ones where you want all VMs to stay running?? I quite doubt that is the case...
 
You can change that in the web GUI nowadays (the thread is from 2019).

Datacenter > Options > HA Settings > Shutdown Policy: I have set it to migrate, and all my HA VMs migrate to the other node when I shut down or restart the node for maintenance.
 
