To get a feeling, how many VMs are we talking about here?
About 500 or so. The cluster contains seven nodes, but most of the VMs are kept on five. We can go down to four in an emergency without disruption of service, if things are kept balanced. Keeping them balanced is something I've spent quite a few coding hours trying to automate, and it mostly works fine, but a few things would let me make my code even better, such as being able to unset the migration state. Hooks for pre- and post-migration would be nice, too.
There is no internal queue in the HA manager for migrations. The HA manager uses a state machine for each service, and once the state is set to migrate, it can only change after the migration has either finished or failed, because the migration task is already running.
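To make the point about the per-service state machine concrete, here is a rough sketch in Python. It is not the HA manager's actual code: the class, the state names, and the choice to return to "started" after a failed migration are all assumptions made for illustration. The only behaviour taken from the description above is that nothing can move a service out of the migrate state until the migration reports that it finished or failed.

```python
from enum import Enum, auto


class State(Enum):
    STARTED = auto()
    MIGRATE = auto()


class Service:
    """Hypothetical per-service state machine (illustration only)."""

    def __init__(self, sid, node):
        self.sid = sid
        self.node = node
        self.target = None
        self.state = State.STARTED

    def request_migration(self, target):
        # A request is only accepted while the service is idle.
        # There is no queue: a second request during a migration is refused.
        if self.state is not State.STARTED:
            return False
        self.state = State.MIGRATE
        self.target = target
        return True

    def migration_done(self, ok):
        # The only way out of MIGRATE: the running task reports its result.
        if self.state is not State.MIGRATE:
            raise RuntimeError("no migration in progress")
        if ok:
            self.node = self.target
        # Assumption: a failed migration leaves the service on its old node.
        self.target = None
        self.state = State.STARTED


svc = Service("vm:100", "node1")
accepted = svc.request_migration("node2")
refused = svc.request_migration("node3")  # ignored, still migrating
svc.migration_done(ok=False)
```

Note that the sketch has no cancel method at all, which is exactly the missing arrow being asked about.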
And there is no way to make it fail before the migration starts? That's a shame. Is there a way to make migrations auto-fail, then? Some hooks would have been a welcome feature here – adding /bin/false to a pre-migration hook would have sidestepped the problem.
To play devil's advocate for a bit, what if the services had been relocated not by a bulk migration, but because of a hardware failure?
That's a very different scenario. I'm assuming HA is clever enough not to send all the VMs to one and the same node in the event of a hardware failure, but to use the priority numbers in the HA groups and spread evenly across nodes with the same priority number. With a bulk migration, however, someone has explicitly told HA to migrate to one specific node. This obviously shouldn't happen, but when we're training new crew, things happen. It would be nice to be able to add an arrow to that state machine of yours.
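What I have in mind for the hardware-failure case is roughly the following. This is only a sketch of my assumption, not of what the HA manager really does: the function name is made up, I'm assuming a higher number means a higher priority, and the plain round-robin is a stand-in for whatever placement logic HA actually uses.

```python
def pick_targets(vms, priorities, online):
    """Spread VMs over the online nodes that share the highest priority.

    vms        -- list of VM ids to place
    priorities -- dict of node name -> priority number from the HA group
    online     -- set of node names that are currently up
    """
    candidates = {n: p for n, p in priorities.items() if n in online}
    if not candidates:
        raise ValueError("no online node in this group")
    top = max(candidates.values())
    best = sorted(n for n, p in candidates.items() if p == top)
    # Round-robin so that no single node receives everything.
    return {vm: best[i % len(best)] for i, vm in enumerate(vms)}


group = {"node1": 2, "node2": 2, "node3": 1}
plan = pick_targets([101, 102, 103, 104], group, {"node1", "node2", "node3"})
fallback = pick_targets([101, 102], group, {"node3"})
```

With all three nodes up, the four VMs alternate between node1 and node2; with only node3 left, everything lands there because it is the best node still available.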
The potential for a node crashing is just one of the issues with a bulk migration, however. We have a few Cisco ASAv VMs whose network interfaces straight up die when they are live migrated, so those we can't move at all – but if some newbie has managed to initiate a bulk migration, we'd have to babysit Proxmox and press cancel when a migration of such a VM comes up. Again, some hooks would have come in handy here.
(One would think that HA groups would come in handy here, but HA apparently has no objections at all to migrating a VM outside its HA group, although it will migrate it back a few moments later.)