ha-manager and VMs with host passthrough

Denary

New Member
Sep 21, 2024
It has been a year and eight months since my previous post on this topic.
When you pass through a PCIe or USB device, regardless of whether you use device mapping, the current ha-manager only works correctly in the node-failure case. More complicated features such as maintenance mode or the brand-new DRS mode cause Proxmox to continuously spawn failing migration jobs. I want to be able to use all of the available features.

Basic Migrations
Works perfectly. Proxmox identifies that the VM has a host device mapped to it and informs the user that online migration is not available for this VM. Migration attempts are refused until the VM has been shut down.
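For anyone who wants to script a similar check outside of Proxmox, here is a minimal sketch in Python. It is not how Proxmox implements the check internally. It assumes you feed it the text of a VM config file (on a PVE node these live under /etc/pve/qemu-server/<vmid>.conf), and the function name has_host_passthrough is my own. It treats any hostpciN entry as passthrough, and any usbN entry that names a host device or a mapping (but not spice redirection).

```python
import re

# Config keys that can bind a VM to a physical host device.
_PCI_KEY = re.compile(r"^hostpci\d+$")
_USB_KEY = re.compile(r"^usb\d+$")


def has_host_passthrough(config_text: str) -> bool:
    """Return True if the current VM config passes through a host PCI(e) or USB device."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Snapshot and pending sections follow the current config; stop there.
            break
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if _PCI_KEY.match(key):
            return True
        if _USB_KEY.match(key):
            for opt in value.split(","):
                name, sep, val = opt.partition("=")
                if name == "mapping":
                    return True
                if not sep:
                    device = name  # bare value, "host=" is the default key
                elif name == "host":
                    device = val
                else:
                    continue  # other options such as usb3=1
                # Assumption: "spice" is USB redirection, not host passthrough.
                if device and device != "spice":
                    return True
    return False
```

Reading the file and calling has_host_passthrough(text) before requesting a migration tells you whether to expect the "online migration not available" refusal.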

HA - Node Failure
Again, works perfectly for this situation. After the timeout period, the VM boots on another host according to the device mapping configuration, which keeps the VM available.

HA - Maintenance Mode
Falls flat on its face. If you forget to disable HA on the VM, HA attempts to live migrate it even though that cannot work. The ha-manager process gets stuck in a loop, repeatedly attempting and failing to move the VM, until you disable HA and manually force an offline migration.

HA - Dynamic Scheduling
This new feature suffers from the same issue as maintenance mode. If Dynamic Scheduling picks a VM that cannot be live migrated as the one to move, it gets stuck in a loop, failing to move the VM until you disable and re-enable HA.

Ideally, HA should perform some basic checks on the VM before taking migration actions in maintenance mode or dynamic scheduling.
I'm not asking for live migration of VMs with host passthrough devices, as I know that would take significant work. I'm just asking that:

1. Maintenance mode should shut down a VM and migrate it away.
2. Dynamic Scheduling should move host passthrough VMs only as a last resort.

I'd be fine if it were a simple flag on the VM that I had to tick myself...
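To make the two requests concrete, here is a rough Python sketch of the decision logic I have in mind. It is purely illustrative and not how ha-manager works today. The HAVM class, its passthrough flag, and the load score are hypothetical names of mine; the only real pieces are the existing ha-manager migrate (online) and ha-manager relocate (stop, move, start) commands, which the sketch only builds as strings and never runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HAVM:
    vmid: int
    passthrough: bool  # hypothetical flag: True if the VM cannot be live migrated
    load: float = 0.0  # hypothetical load score a scheduler might use


def evacuation_command(vm: HAVM, target: str) -> str:
    """Request 1: choose offline relocation for passthrough VMs, online migration otherwise."""
    action = "relocate" if vm.passthrough else "migrate"
    return f"ha-manager {action} vm:{vm.vmid} {target}"


def rebalance_candidates(vms: list[HAVM]) -> list[HAVM]:
    """Request 2: live-migratable VMs first (heaviest first), passthrough VMs last."""
    return sorted(vms, key=lambda vm: (vm.passthrough, -vm.load))
```

With a per-VM flag like this, maintenance mode would issue a relocate for flagged VMs instead of retrying a live migration, and the scheduler would only reach flagged VMs once it has run out of other candidates.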
 

Attachments

  • Screenshot_20260503_224117_Chrome.jpg
    32.2 KB · Views: 2
Hi!

Thanks for the feedback!

Ideally, HA should perform some basic checks on the VM before taking migration actions in maintenance mode or dynamic scheduling.
I'm not asking for live migration of VMs with host passthrough devices, as I know that would take significant work. I'm just asking that:

1. Maintenance mode should shut down a VM and migrate it away.
2. Dynamic Scheduling should move host passthrough VMs only as a last resort.

I'd be fine if it were a simple flag on the VM that I had to tick myself...
Generally, an HA resource must be able to run on any node and must also be migratable to any node. Otherwise, these constraints need to be encoded in HA rules. I've sent a patch [0] that adds notes about this to the documentation to make it more obvious, since it was only an implicit requirement before. External feedback on the documentation patch is very welcome, so that it is understandable to you and other users who run into this!

Apart from that, there is a Bugzilla entry for this [1]. Ideally, as also stated in that entry, it should be controllable whether an HA resource is live migrated or relocated offline to another node. Feel free to add yourself to the CC list there to get updates on its status.

[0] https://lore.proxmox.com/pve-devel/20260504111159.183163-3-d.kral@proxmox.com/
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=6253