Node restart and automatic migration of VMs (HA)

Gerhard W. Recher · Jul 24, 2017

Hi Folks,

If for any reason a node performs a (controlled) restart, none of the running VMs are migrated to the best available free node in the cluster, even if the VM is in an HA group!

I don't think this is the behavior intended for HA.

Am I wrong, or have I missed some vital information?

regards

Gerhard

dcsapak · Jul 24, 2017

Hi,

This is by design.

On a controlled reboot, we do not touch the HA resources until the host comes back up.
We plan to make this configurable in the future.

For now, if you want to move the HA VMs before a reboot, use the bulk migrate option or temporarily change the node priorities in the HA group.
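For the bulk-migrate route, a minimal Python sketch that only prints (does not run) one ha-manager migrate <sid> <node> command per HA service on the node about to be rebooted. The placement mapping, node names and round-robin target choice are assumptions made for illustration; ha-manager itself may place services differently.

```python
from itertools import cycle


def drain_commands(placement: dict[str, str], node: str, targets: list[str]) -> list[str]:
    """Build 'ha-manager migrate' command lines that move every HA service off `node`.

    placement maps an HA service ID (e.g. "vm:101") to the node it currently runs on.
    Services are spread round-robin over the remaining targets; this naive policy is
    chosen for illustration only.
    """
    candidates = [t for t in targets if t != node]
    if not candidates:
        raise ValueError("need at least one target node besides the one being drained")
    next_target = cycle(candidates)
    # Only services on the node being drained consume a target from the cycle.
    return [
        f"ha-manager migrate {sid} {next(next_target)}"
        for sid, current in sorted(placement.items())
        if current == node
    ]


if __name__ == "__main__":
    # Hypothetical placement; on a real cluster you would take it from 'ha-manager status'.
    placement = {"vm:101": "pve04", "vm:102": "pve02", "vm:103": "pve04"}
    for cmd in drain_commands(placement, "pve04", ["pve01", "pve02", "pve03"]):
        print(cmd)  # dry run: review the list, then run it on a cluster node
```

Here this prints a migrate command for vm:101 (to pve01) and vm:103 (to pve02) and leaves vm:102 alone. Printing instead of executing keeps a human in the loop before anything moves.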

Gerhard W. Recher · Jul 24, 2017

Dominik,

But this behavior is not really fail-safe against PEBCAK... If the operator does not follow these manual steps, VMs might be corrupted; at the very least there is downtime (not really expected for an HA cluster).

To be honest, I think "by design" should mean a controlled behavior: migrate the active resources, then initiate the host reboot.

I just tested this by setting my "all" group to a different priority for each cluster node... and... (whisper) reboot without migration, the guest had downtime!

Code:

nodes: <node>[:<pri>]{,<node>[:<pri>]}*

List of cluster node members, where a priority can be given to each node. A resource bound to a group will run on the available nodes with the highest priority. If there are more nodes in the highest priority class, the services will get distributed to those nodes. The priorities have a relative meaning only.
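The selection rule quoted above can be sketched in Python. This is a simplified illustration, not ha-manager's code: it assumes a node listed without :<pri> gets priority 0, and it ignores other group options such as restricted and nofailback.

```python
def parse_nodes(spec: str) -> dict[str, int]:
    """Parse '<node>[:<pri>]{,<node>[:<pri>]}*' into {node: priority}.

    Assumption: a node given without ':<pri>' has priority 0.
    """
    result: dict[str, int] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, pri = entry.partition(":")
        result[name] = int(pri) if pri else 0
    return result


def preferred_nodes(spec: str, online: set[str]) -> list[str]:
    """Return the online members of the highest priority class, sorted by name.

    Services of the group would be distributed across these nodes.
    """
    prios = {n: p for n, p in parse_nodes(spec).items() if n in online}
    if not prios:
        return []
    top = max(prios.values())
    return sorted(n for n, p in prios.items() if p == top)


if __name__ == "__main__":
    spec = "pve01:2,pve02:2,pve03:1,pve04"
    print(preferred_nodes(spec, {"pve01", "pve02", "pve03", "pve04"}))  # ['pve01', 'pve02']
    print(preferred_nodes(spec, {"pve02", "pve03", "pve04"}))           # ['pve02']
```

Because priorities are only compared with each other, 2 versus 1 behaves the same as 200 versus 100; with a distinct priority per node, as in the test described above, the top class holds a single node.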

dcsapak · Jul 24, 2017

Gerhard W. Recher said:
But this behavior is not really fail-safe against PEBCAK... If the operator does not follow these manual steps, VMs might be corrupted; at the very least there is downtime (not really expected for an HA cluster).

To be honest, I think "by design" should mean a controlled behavior: migrate the active resources, then initiate the host reboot.

I just tested this by setting my "all" group to a different priority for each cluster node... and... (whisper) reboot without migration, the guest had downtime!

On a controlled reboot, the VMs are shut down gracefully, so no corruption should occur.

About your test: did you wait until the HA manager had migrated all VMs away? This is not instantaneous.

Gerhard W. Recher · Jul 24, 2017

Dominik,

I think you misunderstood me.

Scenario: an operator on node 4 installed some upgrades that require a reboot... and he either types reboot in the CLI console or clicks Restart for the node in the GUI.

This guy totally forgot that some vital VMs are still running on this node.

This does not result in a graceful shutdown of the VM but in a kill of the resource, so a reboot sequence should automatically migrate HA resources away!
It's not shutting them down (gracefully or not!); what happens in this case is literally a power-off of the VM instance!

Restart of cluster node 4 with an HA VM running Windows Server 2008 R2 (vmid: 101; all drivers for ballooning, virtio... installed):

Code:

 cat /etc/pve/.clusterlog
{
"data": [
{"uid": 393, "time": 1500897863, "pri": 6, "tag": "pvedaemon", "pid": 54046, "node": "pve02", "user": "root@pam", "msg": "starting task UPID:pve02:00007759:03AE4188:5975E247:vncproxy:101:root@pam:"},
{"uid": 5, "time": 1500897840, "pri": 6, "tag": "pve-ha-lrm", "pid": 5649, "node": "pve04", "user": "root@pam", "msg": "end task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam: OK"},
{"uid": 4, "time": 1500897839, "pri": 6, "tag": "pve-ha-lrm", "pid": 5649, "node": "pve04", "user": "root@pam", "msg": "starting task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam:"},
{"uid": 3, "time": 1500897829, "pri": 6, "tag": "pve-manager", "pid": 5603, "node": "pve04", "user": "root@pam", "msg": "end task UPID:pve04:000015E6:00003E9E:5975E225:startall::root@pam: OK"},
{"uid": 2, "time": 1500897829, "pri": 6, "tag": "pve-manager", "pid": 5603, "node": "pve04", "user": "root@pam", "msg": "starting task UPID:pve04:000015E6:00003E9E:5975E225:startall::root@pam:"},
{"uid": 392, "time": 1500897822, "pri": 6, "tag": "pvedaemon", "pid": 54048, "node": "pve02", "user": "root@pam", "msg": "successful auth for user 'root@pam'"},
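A shutdown task should appear in this log for each HA resource. To check that for one guest, a minimal Python sketch that pulls the task entries for a VMID out of the clusterlog JSON. It assumes the UPID layout UPID:node:pid:pstart:starttime:type:id:user: with hex-encoded numbers (starttime 5975E22F decodes to 1500897839, matching the entry's time field). The posted log is cut off, so the sketch runs on a trimmed copy of its entries.

```python
import json
import re

# Matches "starting task UPID:..." and "end task UPID:...: OK" messages.
TASK_RE = re.compile(r"^(starting|end) task (UPID:\S+)")


def parse_upid(upid: str) -> dict:
    """Split a UPID (UPID:node:pid:pstart:starttime:type:id:user:) into fields.

    pid, pstart and starttime are hex-encoded; starttime is a Unix timestamp.
    """
    parts = upid.split(":")
    if len(parts) < 8 or parts[0] != "UPID":
        raise ValueError(f"not a UPID: {upid!r}")
    return {
        "node": parts[1],
        "pid": int(parts[2], 16),
        "starttime": int(parts[4], 16),
        "type": parts[5],
        "id": parts[6],
        "user": parts[7],
    }


def tasks_for_guest(clusterlog: dict, vmid: str) -> list[tuple[int, str, str, str]]:
    """Return (time, node, starting/end, task type) for every task on `vmid`, oldest first."""
    events = []
    for entry in clusterlog.get("data", []):
        match = TASK_RE.match(entry.get("msg", ""))
        if not match:
            continue  # auth messages and other non-task entries
        task = parse_upid(match.group(2))
        if task["id"] == vmid:
            events.append((entry["time"], task["node"], match.group(1), task["type"]))
    return sorted(events)


# Entries copied from the log above, trimmed to the fields the sketch reads.
SAMPLE_LOG = """{"data": [
{"time": 1500897863, "msg": "starting task UPID:pve02:00007759:03AE4188:5975E247:vncproxy:101:root@pam:"},
{"time": 1500897840, "msg": "end task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam: OK"},
{"time": 1500897839, "msg": "starting task UPID:pve04:00001612:00004279:5975E22F:qmstart:101:root@pam:"},
{"time": 1500897829, "msg": "end task UPID:pve04:000015E6:00003E9E:5975E225:startall::root@pam: OK"},
{"time": 1500897822, "msg": "successful auth for user 'root@pam'"}
]}"""

if __name__ == "__main__":
    for event in tasks_for_guest(json.loads(SAMPLE_LOG), "101"):
        print(event)
```

On the posted excerpt this yields only the qmstart on pve04 and a vncproxy on pve02 for VM 101; no shutdown task appears in the lines shown.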

dcsapak · Jul 24, 2017

This works here without problems;
you should see a shutdown task for each configured HA resource.

You can check whether the normal shutdown procedure works for this VM (with the Shutdown button, for example).

Also note that a shutdown initiated by HA has a timeout of 120 seconds; maybe we should make this configurable.
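A generic Python sketch of a graceful shutdown with a deadline, to make the 120-second timeout concrete. It is not ha-manager's implementation: request_shutdown and is_running are hypothetical callbacks, and what HA does once the deadline passes is not covered here, so the sketch only reports whether the guest stopped in time. The clock and sleep parameters are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable


def shutdown_with_timeout(
    request_shutdown: Callable[[], None],
    is_running: Callable[[], bool],
    timeout: float = 120.0,
    poll: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Request a clean guest shutdown, then wait up to `timeout` seconds.

    Returns True if the guest stopped before the deadline, False otherwise.
    What to do on False (e.g. a hard stop) is left to the caller.
    """
    request_shutdown()
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll)
    return True
```

A guest that needs longer than the deadline to finish shutting down (for example while installing updates) would make this return False, which is one reason a configurable timeout helps.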

adamb · Jul 24, 2017

The community has asked for this feature many times without much luck. It would be great if the devs would listen to the end users, as a lot of us are requesting this functionality. There is still an open feature request I put in on this that, as far as I know, has had no attention.

Gerhard W. Recher · Jul 24, 2017

adamb said:
The community has asked for this feature many times without much luck. It would be great if the devs would listen to the end users, as a lot of us are requesting this functionality. There is still an open feature request I put in on this that, as far as I know, has had no attention.

Adam, thanks for jumping in.

My personal opinion is: this is a must; otherwise HA is totally meaningless and has no real functionality.
Hopefully other users will also vote heavily for this!

adamb · Jul 24, 2017

Gerhard W. Recher said:
Adam, thanks for jumping in.

My personal opinion is: this is a must; otherwise HA is totally meaningless and has no real functionality.
Hopefully other users will also vote heavily for this!

I agree 100%. This is how HA behaved in Proxmox 2 and Proxmox 3. In Proxmox 4/5 they decided the freeze route is better. I don't agree with their logic at all, but we are at their mercy.

Gerhard W. Recher · Jul 24, 2017

adamb said:
I agree 100%. This is how HA behaved in Proxmox 2 and Proxmox 3. In Proxmox 4/5 they decided the freeze route is better. I don't agree with their logic at all, but we are at their mercy.

I want to buy a commercial subscription for my cluster... but I'll wait until the open questions are answered and resolved.

Hopefully the staff is passing these requests into the decision pipeline.

tom · Jul 25, 2017

adamb said:
The community has asked for this feature many times without much luck. It would be great if the devs would listen to the end users, as a lot of us are requesting this functionality. There is still an open feature request I put in on this that, as far as I know, has had no attention.

We do listen to our community. You submitted the feature request for this, and we are working on such an improvement.

https://bugzilla.proxmox.com/show_bug.cgi?id=1378

adamb · Jul 25, 2017

tom said:
We do listen to our community. You submitted the feature request for this, and we are working on such an improvement.

https://bugzilla.proxmox.com/show_bug.cgi?id=1378

As I stated, I submitted a request for this, but so far there has been no traction at all in over 2 months on an issue that is extremely important to us and a number of other users. IMO this is something the community should have weighed in on before the change was made. That is the best way to listen to your community.

czechsys · Jul 25, 2017

dcsapak said:
Also note that a shutdown initiated by HA has a timeout of 120 seconds; maybe we should make this configurable.

How will this interact with systemd, which has default timers of 90+ seconds? I mean both inside the VMs and on the Proxmox host.

ben90818532 · Sep 22, 2017

+1 for this option; it would be ideal to automatically migrate VMs when initiating a manual node reboot.

adamb · Oct 3, 2017

ben90818532 said:
+1 for this option; it would be ideal to automatically migrate VMs when initiating a manual node reboot.

I agree, but at this point my hopes are slim. I will be forced to come up with my own solution, as the devs just change the logic without much input from the community. This thread should be more than enough for them to see that this option should be available, so users can make their own decision.

Xabi · Jan 9, 2018

ben90818532 said:
+1 for this option; it would be ideal to automatically migrate VMs when initiating a manual node reboot.

+1

rahul1985joshi · Dec 5, 2018

Hello Proxmoxians

+1 from my side as well; surely there is no point in HA if the VMs get rebooted.

joshbgosh10592 · Oct 17, 2019

Here's the solution for this, according to a page that was linked by @adamb.
I just confirmed it works, and the change didn't require a reboot (well, I rebooted to test HA, but the test VM migrated successfully, and no reboot was needed after the config change).

Thomas Lamprecht 2019-01-07 14:02:28 CET
A fix for this was packaged with pve-ha-manager version 2.0-6; additionally, pve-cluster version 5.0-33 is required. The packages should be available in the public repositories soon. With those two packages you can add a settings line like:

ha: shutdown_policy=failover

to /etc/pve/datacenter.cfg.
See man datacenter.cfg for more details.
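A minimal Python sketch for checking which policy the cluster config actually contains, e.g. by feeding it the contents of /etc/pve/datacenter.cfg. The parsing rules are a simplified assumption ("key: value" lines, "#" comments, comma-separated k=v properties), not the real pve-cluster parser. As far as I understand the current Proxmox VE admin guide (not stated in this 2019 comment), an unset policy means conditional, the accepted values are conditional, freeze, failover and migrate, and only migrate live-migrates services; failover stops them and lets them be recovered on another node.

```python
def ha_shutdown_policy(cfg_text: str) -> str:
    """Return the HA shutdown_policy found in datacenter.cfg text.

    Simplified parsing (an assumption, not the real pve-cluster parser):
    'key: value' lines, '#' comments, comma-separated 'k=v' properties.
    Falls back to 'conditional' when no policy is set.
    """
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "ha":
            continue
        for item in value.split(","):
            prop, _, setting = item.strip().partition("=")
            if prop == "shutdown_policy":
                return setting.strip()
    return "conditional"


if __name__ == "__main__":
    sample = "keyboard: en-us\nha: shutdown_policy=failover\n"
    print(ha_shutdown_policy(sample))  # failover
```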

reukiodo · Apr 12, 2024

Thanks for making this at least an option... but it still doesn't work. I've added it to /etc/pve/datacenter.cfg and waited for the config to fully sync, but reboots still just shut down the VMs instead of migrating them.

Still, why is this not the default?? What is the scenario where someone would prefer to have all VMs on a host shut down rather than migrated to another host? And are those scenarios substantially MORE common than the ones where you want all VMs to stay running?? I am quite doubtful that is the case...

thearona · Apr 12, 2024

You can change that in the web GUI nowadays (the thread is from 2019).

Datacenter > Options > HA Settings > Shutdown Policy. I have set it to migrate, and all my HA VMs migrate to the other node when I shut down / restart the node for maintenance.
