I run a Frigate CCTV VM on my two-node cluster with high availability enabled. The VM has a Coral TPU passed through as a PCIe device, so obviously it cannot be live migrated; that in itself is not the main issue I have.
Node Failure
In the event of a node failure, everything works perfectly. The VM moves from the first node to the second and starts there using the mapping for the Coral TPU device on the second node. Perfect!
Node Maintenance
If I set the node maintenance flag with 'ha-manager crm-command node-maintenance enable {node}', or if I have the reboot action (the HA shutdown policy) set to 'migrate', the VM tries to live migrate. Obviously, with a PCIe device attached the migration fails, but ha-manager then just retries and gets stuck in a loop. I have to disable HA for the VM manually and migrate it myself.
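The manual workaround can be sketched roughly as below. This is only a sketch under assumptions: the VM ID 100 and the target node name node2 are placeholders, the helper names run and relocate_offline are made up for illustration, and it assumes that setting the HA state to 'ignored' is an acceptable way to take the VM out of HA control while it is moved. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: move an HA-managed VM with a passed-through PCIe device offline.
# VMID and TARGET are placeholders; adjust them for your cluster.
VMID="${VMID:-100}"
TARGET="${TARGET:-node2}"
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: shut down, migrate offline, hand back to HA.
relocate_offline() {
    # Assumption: 'ignored' makes the HA stack leave the VM alone for now.
    run ha-manager set "vm:$VMID" --state ignored
    # Cleanly shut down the guest so no live migration is needed.
    run qm shutdown "$VMID"
    # Without --online this is an offline migration.
    run qm migrate "$VMID" "$TARGET"
    # Hand the VM back to HA, which should then start it on the new node.
    run ha-manager set "vm:$VMID" --state started
}
```

Calling relocate_offline with DRY_RUN=1 shows the four commands in order; setting DRY_RUN=0 on a cluster node would execute them for real, so check the printed commands first.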
I also found another odd bug: removing the VM from the HA manager left the node in a strange state where it had stopped most of its PVE/Linux processes, including SSH, but left the VM running.
I feel like the migration action getting stuck is a bug. Is there a way to flag a virtual machine within HA to say "this virtual machine cannot live migrate; shut it down, migrate it, and spin it back up"? If not, is there a suggestion box somewhere?