It's been a year and eight months since my previous post on this topic.
When you pass through a PCIe or USB device, whether or not you use device mapping, the current ha-manager only behaves correctly when the situation is a node failure. More complicated features like maintenance mode or the brand-new DRS mode cause Proxmox to continuously spawn failing migration jobs. I want to be able to use all the features available.
Basic Migrations
Works perfectly. It detects that a host device is mapped to the VM and tells the user that online migration is not available for this VM. Migration is refused unless the VM has been shut down.
HA - Node Failure
Again, works perfectly for this situation. After the timeout period, the VM boots again on another host according to the device mapping configuration, ensuring that the VM is always up.
HA - Maintenance Mode
Falls flat on its face. If you don't remember to disable HA on the VM, ha-manager attempts to live migrate it even though it can't. The ha-manager process then gets stuck in a loop, repeatedly attempting and failing to move the VM until you disable HA and manually force the offline migration.
HA - Dynamic Scheduling
This new feature suffers from the same issue as maintenance mode. If Dynamic Scheduling picks a VM that cannot be live migrated as the one to move, it gets stuck in a loop failing to move it until you disable and re-enable HA.
Ideally, HA should perform some basic checks on the VM before taking migration actions in maintenance mode or Dynamic Scheduling.
I'm not asking for live migration of VMs with host passthrough devices, as I know that would take significant work. I'm just asking that:
1. Maintenance mode should shut down the VM and migrate it away.
2. Dynamic Scheduling should move host passthrough VMs only as a last resort.
I'd be fine if this were a simple flag on the VM that I had to tick myself...
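To make the request concrete, here is a minimal sketch of the decision logic being asked for, written in Python purely for illustration; it is not how ha-manager is actually implemented. The `VM` class, the `allow_offline_ha_move` flag (standing in for the opt-in checkbox mentioned above) and the function names are all hypothetical. The only detail taken from Proxmox itself is that passthrough devices show up as `hostpciN` and `usbN` keys in a VM's config; treating every non-SPICE `usbN` entry as a host device is a simplifying assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VM:
    vmid: int
    # Parsed VM config, e.g. {"hostpci0": "mapping=gpu"}.
    config: Dict[str, str] = field(default_factory=dict)
    # Hypothetical per-VM opt-in flag: "HA may shut this VM down and move it offline".
    allow_offline_ha_move: bool = False


def has_passthrough(vm: VM) -> bool:
    """True if the VM has a PCIe (hostpciN) or host USB (usbN) device attached."""
    for key, value in vm.config.items():
        if re.fullmatch(r"hostpci\d+", key):
            return True
        # Assumption: any usbN entry other than SPICE redirection is a host device.
        if re.fullmatch(r"usb\d+", key) and not value.startswith("spice"):
            return True
    return False


def maintenance_action(vm: VM) -> str:
    """Decide what maintenance mode does with a VM instead of retrying a live migration forever."""
    if not has_passthrough(vm):
        return "live-migrate"
    if vm.allow_offline_ha_move:
        return "shutdown-then-migrate"
    # No opt-in: skip the VM rather than loop on failing live migrations.
    return "leave-in-place"


def pick_vm_to_rebalance(candidates: List[VM]) -> Optional[VM]:
    """Prefer live-migratable VMs; use opted-in passthrough VMs only as a last resort."""
    for vm in candidates:
        if not has_passthrough(vm):
            return vm
    for vm in candidates:
        if vm.allow_offline_ha_move:
            return vm
    return None
```

With this shape, a passthrough VM without the flag is simply skipped by both maintenance mode and the scheduler, which is already better than the current endless loop; ticking the flag upgrades it to a shutdown-and-move.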