HA Migration of VMs using local devices

n1nj4888

Hi There,

I've seen this raised a couple of years back (https://forum.proxmox.com/threads/migrate-vm-with-local-devices.17369/#post-88296) and I'm wondering what is the best / recommended method for migrating a VM (with Intel IGP passthrough enabled) from one node to another?

Currently, my VM has the Intel IGP passed-through to the VM (hostpci0: 00:02.0) but I can't migrate either online or offline when using HA. I have a 2-node cluster with a 3rd qdevice for quorum voting... The two hosts are not identical but both have a similar Intel IGP (with the same local Device ID of 00:02.0) and the VM itself is able to run on both nodes with the Intel IGP passed-through and working successfully...

The error I get whenever I try to migrate manually / via HA is always:

Code:
task started by HA resource agent
2019-06-19 12:52:12 ERROR: migration aborted (duration 00:00:00): can't migrate VM which uses local devices: hostpci0
TASK ERROR: migration aborted

I get the error above if:
(A) VM is running on Node 2 and I change the HA profile to "Prefer Node 1"
(B) VM is stopped on Node 2 and I click "Migrate"

The only way I seem to be able to migrate the VM between Node 1 and Node 2 is to change the HA state to "Ignored", stop the VM, remove the PCI-passthrough for the Intel IGP from the VM configuration, and manually migrate it to the new Node and then re-enable the passthrough but this seems overly laborious.
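For reference, the manual workaround described above can be written down as a command sequence. This is only a sketch: the VM ID 100, the node name node1 and the helper name workaround_commands are made up for illustration, and the function only builds the command lines without running them. The last step (handing the VM back to HA) is an assumption that is not spelled out above. Note that the qm commands have to be run on whichever node currently owns the VM.

```python
def workaround_commands(vmid, target_node, pci_addr, slot="hostpci0"):
    """Build (but do not run) the commands for the remove/migrate/re-add dance."""
    sid = f"vm:{vmid}"
    return [
        # Take the VM out of HA control so HA does not interfere.
        ["ha-manager", "set", sid, "--state", "ignored"],
        # Stop the guest cleanly.
        ["qm", "shutdown", str(vmid)],
        # Remove the passed-through device from the VM configuration.
        ["qm", "set", str(vmid), "--delete", slot],
        # Offline migration now passes the local-device check.
        ["qm", "migrate", str(vmid), target_node],
        # Re-add the device (run this on the target node).
        ["qm", "set", str(vmid), f"--{slot}", pci_addr],
        # Assumption: hand the VM back to HA, which also starts it again.
        ["ha-manager", "set", sid, "--state", "started"],
    ]


for cmd in workaround_commands(100, "node1", "00:02.0"):
    print(" ".join(cmd))
```

Six commands for one move is what makes this feel laborious, which is the point of the request below.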

I understand that the running VM may not be able to be migrated between Nodes (given the state of the underlying physical Intel IGP?) but, when the VM is stopped, I should be able to migrate it offline without having to remove/re-add the passed-through device?

Would it be possible to add a "Forced" option to the GUI that allows the VM to be either (1) migrated online (if at all possible, with minimal downtime), or (2) stopped, migrated to the second Node and started again on the new Node?
 
The only way I seem to be able to migrate the VM between Node 1 and Node 2 is to change the HA state to "Ignored", stop the VM, remove the PCI-passthrough for the Intel IGP from the VM configuration, and manually migrate it to the new Node and then re-enable the passthrough but this seems overly laborious.

In other words, HA has nothing to do with this: to make it work you always need to stop the VM and remove the device, so this is the normal migration code path, independent of HA (which indirectly uses the very same one).
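To illustrate why HA and manual migration hit the same error, here is a simplified sketch of that kind of pre-migration check. It is not the actual Proxmox code: the function names are invented, the VM configuration is assumed to be plain "key: value" lines, and only hostpciN entries are treated as local devices here.

```python
import re

# Only hostpciN keys count as "local" in this sketch (an assumption).
LOCAL_KEY = re.compile(r"^hostpci\d+$")


def local_devices(config_text):
    """Return the config keys that refer to host-local PCI devices."""
    found = []
    for line in config_text.splitlines():
        key, sep, _value = line.partition(":")
        if sep and LOCAL_KEY.match(key.strip()):
            found.append(key.strip())
    return found


def check_migratable(config_text):
    """Raise if the VM config references any local device."""
    devices = local_devices(config_text)
    if devices:
        raise RuntimeError(
            "can't migrate VM which uses local devices: " + ", ".join(devices)
        )


example = "cores: 4\nhostpci0: 00:02.0\nmemory: 4096\n"
print(local_devices(example))
```

In this simplified picture the check never looks at whether the VM is running or at who started the task, so every caller gets the same abort.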

So, as of now, live migration is simply not possible. The whole internal state of your passed-through device would need to be transferred, and currently that can't be done: PCI(e) devices have internal registers, memory, and so on, all of which would need to be migrated exactly as is. There are some ideas and projects that may make this happen at some point in the future, but I'd guess that even once it arrives it will initially be quite brittle and unreliable, or work only for a limited number of specific models, or maybe even only with para-virtualization.

Offline migration of a stopped VM should really work, though...
 
Offline migration of a stopped VM should really work, though...

Thanks for this - Are you saying that it should work (i.e. it's a bug or a new feature that you would consider fixing / implementing) to migrate the VM offline? I think it would be very useful and save a lot of manual steps if I could simply click "Migrate" and then an "Offline" option which would warn, then shut the VM down (if already running), migrate it to the newly selected node and start it again if that was the previous state...

It would be good if this could be automated via HA also in that, if the running node is down, the VM could simply be started on the selected HA nodes (i.e. those which are selected as failover nodes with support for the device to be passthrough'd)...

Thanks!
 
Thanks for this - Are you saying that it should work (i.e. it's a bug or a new feature that you would consider fixing / implementing) to migrate the VM offline?

It should work offline, it's a bug if not.

It would be good if this could be automated via HA also in that, if the running node is down, the VM could simply be started on the selected HA nodes (i.e. those which are selected as failover nodes with support for the device to be passthrough'd)...

That's already done. If a node is down and the other one is still quorate the VM under HA should always be moved (if HA group settings allow it) and started on the other node.
The start can then still naturally fail if a device is not available in the other node, but that wouldn't be the case in your setup.
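As a rough illustration of that last point, the following sketch checks which nodes could start a VM, given the PCI addresses it needs. The inventory dictionary and the node names are hypothetical (in practice the addresses would have to be collected on each node, for example from lspci output), and matching on the address alone is a simplification.

```python
def nodes_able_to_start(required_pci, node_inventory):
    """Return the nodes whose PCI inventory contains every required address."""
    needed = set(required_pci)
    return sorted(
        node for node, present in node_inventory.items() if needed <= set(present)
    )


# Hypothetical inventory: PCI addresses present on each node.
inventory = {
    "node1": ["00:02.0", "00:1f.3"],
    "node2": ["00:02.0"],
    "node3": ["00:1f.3"],
}

print(nodes_able_to_start(["00:02.0"], inventory))
```

With both nodes exposing the IGP at 00:02.0, as in the setup described here, the recovery start would have a device to attach on either node.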
 
It should work offline, it's a bug if not.

Thanks for the response - Given that migrating the stopped VM fails with the hostpci error, is there anything I need to do to log this as a bug so that it can be fixed?

In terms of the HA failover, I've just tested that and, if Node 1 (Preferred) goes down, it does successfully start the VM on Node 2. The issue then is that, if Node 1 (Preferred) comes back online, the HA Migrate fails with the hostpci error and I have to stop the VM, remove the device, migrate (manually or automatically via the existing HA profile) and re-add the device. It would be much better if the migration (HA or manual) could be done offline without having to remove the hostpci device and re-add later - even if this means I manually have to shut the VM down on Node 2 before the migration (HA or otherwise) handles the failback to Node 1 successfully...

Thanks!
 
The issue then is that, if Node 1 (Preferred) comes back online, the HA Migrate fails with the hostpci error and I have to stop the VM

You may want to add the "nofailback" option to this HA group's configuration; the VM then stays on the node it was recovered to for as long as possible (i.e., until it is manually moved to another node or that node fails).
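A minimal sketch of setting that option from the command line, assuming an HA group named prefer_node1 (the group name and the helper function are made up; the helper only builds the command and does not run it). The same setting can also be changed in the HA group dialog in the GUI.

```python
def nofailback_command(group, enabled=True):
    """Build the ha-manager call that toggles nofailback on an HA group."""
    return ["ha-manager", "groupset", group, "--nofailback", "1" if enabled else "0"]


print(" ".join(nofailback_command("prefer_node1")))
```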

It would be much better if the migration (HA or manual) could be done offline without having to remove the hostpci device and re-add later - even if this means I manually have to shut the VM down on Node 2 before the migration (HA or otherwise) handles the failback to Node 1 successfully...
Yes, true; that should be resolved once the mentioned bug is fixed.

Maybe you could open a bug report at https://bugzilla.proxmox.com/? Linking this thread in there would be a good idea too. That'd be great, thanks!
 
Hello,

Sorry to resurrect this thread, but it appears that Proxmox still doesn't support migrating a VM with hostpci (SR-IOV) network devices.
However, there have been changes to libvirt: there is an option called teaming which sounds like a way to get this working. There is a nice, detailed explanation of it here:
https://blog.flipkart.tech/live-migration-of-a-vm-with-sr-iov-vf-passthrough-device-a6ef1f702fbf

Basically, it "teams"/bonds the paravirtualized interface and the SR-IOV interface together for the VM (only one is active at a time). During migration the SR-IOV interface is removed and the paravirtualized one takes over; after migration the SR-IOV interface is brought up again.
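To make the idea concrete, here is a toy model of which interface carries traffic in each phase. It does not drive libvirt or QEMU; the phase names and interface labels are invented purely for illustration.

```python
def active_interface(vf_present):
    """The SR-IOV VF is used when present, otherwise the standby virtio NIC."""
    return "sriov-vf" if vf_present else "virtio"


def migration_timeline():
    """Return (phase, active interface) pairs for one teamed migration."""
    phases = [
        ("before migration", True),
        ("VF unplugged on source", False),
        ("live migration running", False),
        ("VF plugged in on target", True),
    ]
    return [(phase, active_interface(vf)) for phase, vf in phases]


for phase, nic in migration_timeline():
    print(f"{phase}: {nic}")
```

The guest keeps connectivity through the virtio NIC during the window in which no passed-through device is attached, so there is no device state that has to be transferred.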

I'd really like to see something like this in Proxmox (assuming that all nodes have the same hardware and expose the same network interface VFs, which can be taken as given in a real HA cluster). SR-IOV gives a very notable performance boost (about 50% in my tests) compared to paravirtual networking and should be the standard when supported.
 