Migrate HA resources before rebooting a node

gui76

Oct 3, 2019
Hi,

I've just added a third node to my Proxmox 6 cluster to enable HA for my critical VMs, and therefore be able to do maintenance on nodes without having to worry about downtime. Or at least, that's what I thought I would get....

The current HA implementation is not HA: it's quick failover. There is a downtime, and I think it's a huge problem.

I don't think I'm the only one. I've seen posts from 2017 already requesting the exact same thing.
https://forum.proxmox.com/threads/node-restart-and-automatically-migrate-vms-ha.35882/

I've seen the feature request
https://bugzilla.proxmox.com/show_bug.cgi?id=1378

which led to the implementation of the shutdown_policy switch.
But this is still not HA at all.

I chose shutdown_policy = failover

I intentionally reboot a node for maintenance, to upgrade packages on it.
All VMs and containers on that host are switched off, so they are all OFFLINE. At this point it's already game over.
Some seconds later, the ones I declared as HA resources are started on another node.
This is NOT what someone would set up a cluster for, IMO.

I (of course!) want all the HA resources on that node to be migrated to any other node BEFORE all the other (non-HA) resources on that node are switched off.
But don't switch off all of them first!
If I take the time to declare some resources as HA, it's because I mean it: I don't want them to go down (as far as that can be avoided; if the node crashes, of course, it's another story).
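The requested ordering can be sketched as a toy model. This is not Proxmox code: `Guest` and `maintenance_plan` are hypothetical names, and for simplicity every HA guest is sent to a single target node, whereas a real implementation would choose a target per guest.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    vmid: int
    ha: bool  # True if the guest is declared as an HA resource


def maintenance_plan(guests, target):
    """Return the ordered actions before a planned node reboot:
    HA guests are live-migrated first, and only afterwards are the
    remaining (non-HA) guests shut down."""
    migrate = [("migrate", g.vmid, target) for g in guests if g.ha]
    stop = [("stop", g.vmid, None) for g in guests if not g.ha]
    return migrate + stop


plan = maintenance_plan([Guest(100, True), Guest(101, False), Guest(102, True)], "node2")
```

The point is only the order: nothing is stopped until every HA guest has been moved off the node.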

I read in those old posts that the user should bulk migrate all the resources on the node if he wants to achieve that result.
What's the point of HA then, if the user has to do some manual actions?
And I don't want to migrate ALL resources. The other nodes might not have the capacity to take them all.
But the resources I explicitly declared as HA, the ones which are the most important to me, those I of course want online all the time.

The ability to do maintenance on nodes is one of the top reasons why one would deploy a cluster of hypervisor nodes.

Can we have an option to "live migrate HA resources before node reboot/shutdown" please?

Thanks in advance.
 
The current HA implementation is not HA: it's quick failover. There is a downtime, and I think it's a huge problem.

It is HA. The current behavior assumes that if you actively shut down or reboot a node, you have already moved all relevant VMs (HA or not) to other nodes if you want to keep them running without interruption; there's even a bulk migrate action to make this quite easy as a workaround. A node failure of any kind is currently just not treated the same way as a human-triggered power-off or reboot.

Can we have an option to "live migrate HA resources before node reboot/shutdown" please?
Yes, there are plans to do that.
 
Thanks for the feedback, and VERY glad to know there are plans to offer such an option.
Is there a ticket I can subscribe to in order to track progress?

Thanks
 
Thanks for the feedback, and VERY glad to know there are plans to offer such an option.

This has also been discussed on the forums, and unfortunately there cannot be a general one-size-fits-all option. This option (hopefully) has to be configured and is off by default. One big problem is, e.g., if the receiving server (or servers) does not have enough RAM and will crash if you move your VMs there.
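The RAM concern can be illustrated with a small sketch. It is not part of Proxmox; `plan_ram_aware` and the MiB figures are hypothetical. Each guest is only placed on a node that still has enough free RAM for it, largest guests first, and anything that fits nowhere is reported instead of being moved. This greedy largest-first heuristic is just one assumed approach and does not guarantee an optimal packing.

```python
def plan_ram_aware(guests_mb, free_mb):
    """guests_mb: {vmid: RAM needed in MiB}; free_mb: {node: free RAM in MiB}.
    Returns (placement, unplaced): placement maps vmid -> node, and
    unplaced lists guests that would overcommit every candidate node."""
    free = dict(free_mb)  # copy so the caller's dict is not modified
    placement = {}
    unplaced = []
    # Place the largest guests first; they are the hardest to fit.
    for vmid, need in sorted(guests_mb.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [node for node, avail in free.items() if avail >= need]
        if not candidates:
            unplaced.append(vmid)  # no node can take it without running out of RAM
            continue
        target = max(candidates, key=lambda node: free[node])  # most free RAM wins
        placement[vmid] = target
        free[target] -= need
    return placement, unplaced
```

For example, `plan_ram_aware({100: 8192, 101: 4096, 102: 16384}, {"node2": 12288, "node3": 20480})` puts VM 102 on node3 and VMs 100 and 101 on node2, while a single 32768 MiB guest would end up in the unplaced list.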
 
This has also been discussed on the forums, and unfortunately there cannot be a general one-size-fits-all option. This option (hopefully) has to be configured and is off by default. One big problem is, e.g., if the receiving server (or servers) does not have enough RAM and will crash if you move your VMs there.

The VMs and LXCs need to be migrated to the other nodes in the cluster, not only to one node. I would like to have a feature like DRS in VMware to balance the cluster. I need to have enough resources to take over the workload of one faulty node. In our setup we use less than 50% of our cluster's capacity so that we can fail over to the other datacenter.
 
This has also been discussed on the forums, and unfortunately there cannot be a general one-size-fits-all option. This option (hopefully) has to be configured and is off by default. One big problem is, e.g., if the receiving server (or servers) does not have enough RAM and will crash if you move your VMs there.

Yes, configuration and opt-in/opt-out are definitely planned.
 
The VMs and LXCs need to be migrated to the other nodes in the cluster, not only to one node. I would like to have a feature like DRS in VMware to balance the cluster. I need to have enough resources to take over the workload of one faulty node. In our setup we use less than 50% of our cluster's capacity so that we can fail over to the other datacenter.

We already balance VMs and CTs across the cluster; there are just no fancy metrics, only the running service count gets balanced. That works well if the hosts and services are homogeneous, not so well if they vary a lot. But that's all known. There are already bug reports for this:
https://bugzilla.proxmox.com/show_bug.cgi?id=2115
https://bugzilla.proxmox.com/show_bug.cgi?id=2181
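The count-based balancing described above can be sketched roughly as follows. This is a simplified model assumed from that description, not the actual ha-manager code; `pick_target` and `distribute` are hypothetical names, and the `vm:100` style IDs just mirror HA service IDs.

```python
def pick_target(service_count, source):
    """Choose the node with the fewest running services, excluding the
    node being drained. Ties are broken by node name for determinism.
    Assumes at least one other node exists."""
    candidates = {node: count for node, count in service_count.items() if node != source}
    return min(candidates, key=lambda node: (candidates[node], node))


def distribute(services, service_count, source):
    """Assign each service to the currently least-loaded node,
    updating the counts after every placement."""
    counts = dict(service_count)  # copy so the caller's counts stay untouched
    plan = {}
    for sid in services:
        target = pick_target(counts, source)
        plan[sid] = target
        counts[target] += 1
    return plan


plan = distribute(["vm:100", "vm:101", "vm:102"], {"node1": 3, "node2": 1, "node3": 2}, "node1")
```

Because only counts are compared, a 64 GiB VM and a 1 GiB container weigh the same here, which is why this approach works well with homogeneous hosts and services but less so when they vary a lot.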
 
Thanks a lot for the links, Thomas.
Hopefully those tickets will get some traction.
IMHO this is one of the few fundamentals, if not the only one, still somewhat lacking at the moment for Proxmox to be a truly viable enterprise-grade solution.
 
