Slow live migration performance since 5.0

the package will transition to pve-no-subscription soon (and then to pve-enterprise, as usual).

we will see how we can further improve the downtime for the local storage variant, but it will always have more downtime than the plain memory migration.
 
Fabian, thank you very much for rewriting the migration process.

Although I still have a couple of concerns.
Can you explain why you do not resume the VM on the target before the migration has finished? As I understand it, it is safe to resume the VM during or before migration and let QEMU finish or abort the process. More than that, that is exactly how QEMU live migration is intended to operate, AFAIU.
Have you or your colleagues discussed this case with upstream (the QEMU project)?

Concerning live migration with local disks, I do not completely understand how it works, but IMO this issue has to be raised upstream too. I did not find any documentation or discussions about the most effective and safe way to organize simultaneous storage and memory migration, or about how soon the target VM can be started. 12 seconds is simply too much for a process that they call live migration.
 
Fabian, thank you very much for rewriting the migration process.

Although I still have a couple of concerns.
Can you explain why you do not resume the VM on the target before the migration has finished? As I understand it, it is safe to resume the VM during migration and let QEMU finish or abort the process. More than that, that is exactly how QEMU live migration is intended to operate, AFAIU.

PVE has a very strict concept of a node "owning" a VM as long as the config file is on the respective node. this allows us to use local locking on the owning node for most operations. we don't want to have a running VM on the target node but still have the config file on the source node, because that would violate this basic concept. instead, we move the config file during the downtime and then resume on the target node. this is a tradeoff between consistency and downtime.
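
to make the ordering concrete, here is a minimal sketch of that final phase. this is hypothetical illustrative Python, not the actual PVE code (which is Perl); the class and function names are made up:

```python
"""Hypothetical sketch of the final phase of a PVE live migration as
described above. All names are illustrative; the real code is Perl."""

class Node:
    """Stand-in for a cluster node; the methods are assumptions."""
    def __init__(self, name):
        self.name = name

    def wait_for_migration_completed(self, vmid):
        print(f"[{self.name}] memory migration of VM {vmid} converged, guest paused")

    def resume(self, vmid):
        print(f"[{self.name}] VM {vmid} resumed")


def move_config(vmid, source, target):
    # Ownership of a VM follows its config file, so moving it is the
    # handover point: before this step only the source node may operate
    # on the VM, afterwards only the target node may.
    print(f"config of VM {vmid}: {source.name} -> {target.name}")


def finish_migration(vmid, source, target):
    # 1. Memory migration converges: QEMU pauses the guest on the source
    #    and transfers the last dirty pages. Guest downtime starts here.
    source.wait_for_migration_completed(vmid)
    # 2. Move the config file while the guest is paused, transferring
    #    ownership to the target node.
    move_config(vmid, source, target)
    # 3. Resume the guest on the target node. Downtime ends here.
    target.resume(vmid)


finish_migration(100, Node("pve1"), Node("pve2"))
```

resuming on the target before the config file has moved would mean a running VM on a node that does not own it, which is exactly the situation the locking concept rules out.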

Have you or your colleagues discussed this case with upstream (the QEMU project)?

I don't see what there would be to discuss?

Concerning live migration with local disks, I do not completely understand how it works, but IMO this issue has to be raised upstream too. I did not find any documentation or discussions about the most effective and safe way to organize simultaneous storage and memory migration, or about how soon the target VM can be started.

I think the only upstream-intended way to do this is COLO, which is a lot more than just live migration. but like I said, figuring out how to further improve downtime for migration with local storage, and also for the insecure migration variant (which only partially benefits from the recent improvements), is on our agenda.
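
for reference, the building blocks QEMU itself offers for combined storage-and-memory migration are an NBD export on the target plus a drive-mirror block job on the source, followed by the normal RAM migration. below is a rough sketch of that QMP sequence; the socket paths, addresses and the drive name are made up, QMP event handling and error checking are omitted, and exact arguments can differ between QEMU versions:

```python
import json, socket

def qmp_connect(path):
    """Open a QMP monitor socket and negotiate capabilities."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    f = sock.makefile("rw")
    json.loads(f.readline())  # greeting banner
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    f.flush()
    json.loads(f.readline())  # {"return": {}}
    return f

def qmp(f, cmd, **args):
    # Naive request/response helper; a real client must also consume
    # asynchronous events interleaved with command returns.
    f.write(json.dumps({"execute": cmd, "arguments": args}) + "\n")
    f.flush()
    return json.loads(f.readline())

# Target side (VM started paused with `-incoming tcp:0:60000` and a
# pre-allocated local disk): export the disk over NBD for the mirror.
tgt = qmp_connect("/run/qemu-target.qmp")
qmp(tgt, "nbd-server-start",
    addr={"type": "inet", "data": {"host": "10.0.0.2", "port": "10809"}})
qmp(tgt, "nbd-server-add", device="drive-scsi0", writable=True)

# Source side: mirror the local disk into the NBD export, then start
# the memory migration. Guest writes keep being mirrored until the
# guest is paused for the final switch-over, so the target may only
# resume once both the block job and the RAM migration have finished.
src = qmp_connect("/run/qemu-source.qmp")
qmp(src, "drive-mirror", device="drive-scsi0", mode="existing",
    target="nbd:10.0.0.2:10809:exportname=drive-scsi0", sync="full")
qmp(src, "migrate", uri="tcp:10.0.0.2:60000")
```

the final flush of the disk mirror happens during the same pause as the last memory pages, which is why the local-storage variant inherently has more downtime than a plain memory migration.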
 
