Live migration for guests on hosts of different CPU classes?

surfrock66

Feb 10, 2020
I'm modernizing my environment and replacing some pretty old R710s (Intel(R) Xeon(R) X5687 CPUs) with R6525s (AMD EPYC 7252 CPUs). My VMs have all been set to the CPU type x86-64-v2-AES (v2 being the highest level my old Westmere CPUs support).

When I live migrate between the old and new hosts, my VMs lock up and need to be reset. I had done a bunch of reading and thought I understood that choosing kvm64 or x86-64-v2-AES would prevent that, but that is not the case.

I'm kind of stumped on my reading; I see a lot of stuff saying this should work, and a few places saying kvm64 is a better choice than x86-64-v2-AES. I'm having trouble gleaning an answer from the QEMU docs and forum/reddit posts, and would like to get some opinions on how best to support live migration during this transition.
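
For anyone wanting to rule out the basics first, here is a minimal sketch that checks whether a host advertises the flags behind the x86-64-v2 level plus AES. The flag names assume Linux's /proc/cpuinfo spelling (SSE3 shows up as "pni"), and the function names are made up for illustration. Passing this check only means the flags are present; it says nothing about whether a cross-vendor live migration will survive.

```python
# Flags behind the x86-64-v2 feature level, plus "aes".
# Assumption: names follow Linux's /proc/cpuinfo spelling (SSE3 is "pni").
V2_AES_FLAGS = {
    "cx16", "lahf_lm", "popcnt", "pni",
    "sse4_1", "sse4_2", "ssse3", "aes",
}


def host_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_v2_aes_flags(cpuinfo_text):
    """Flags needed by the v2-AES level that this host does not advertise."""
    return V2_AES_FLAGS - host_flags(cpuinfo_text)
```

On each node this could be called as `missing_v2_aes_flags(open("/proc/cpuinfo").read())`; an empty set means the host covers the level.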
 
I have the same behavior here with 4 nodes. I can successfully live-migrate between different CPU types within Intel or within AMD. But when I move a VM from, e.g., Intel to AMD or the reverse, it usually does not work: the VM crashes internally.
Sometimes the VM even continues to run normally on the target host for 2-3 minutes, and then it crashes.
Guest type does not matter; it happens with both Windows and Linux VMs.

I can still remember that this worked with QEMU version 6. That was some time ago... :p
 
When you used QEMU 6, that was also with an older kernel, right? Pin an older kernel in the cluster and try again!
 
All 3 nodes are "Linux 6.8.12-2-pve (2024-09-05T10:03Z)"

Can you expand a bit, are you suggesting it could be a kernel issue with an up-to-date kernel and I need to potentially downgrade the kernel? I don't have a target kernel version that is known good since I installed node3 with the new architecture directly on this version.
 
VMware world makes this easy.
Enhanced vMotion Compatibility (EVC) won't let you break two rules.
  • You migrate to equal or lower hosts.
  • You do not jump cpu vendors.
If you try, it just doesn't work at all. Wizard stops there and tells you No.

Proxmox plays fast and loose with CPU.
It should not.
Or at least we should be able to tell it not to just assume a live migration is going to work out, and to do some checking first like ESXi does.
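
To make the idea concrete, here is a rough sketch of the kind of pre-migration check being asked for. It is not a Proxmox feature; the function names and the `guest_flags` parameter are hypothetical. It assumes you have the /proc/cpuinfo text from both hosts and know which CPU flags the guest was started with, and it refuses when the vendors differ or the target lacks a flag the guest needs.

```python
def parse_cpuinfo(cpuinfo_text):
    """Extract (vendor_id, flags) from the first CPU entry in /proc/cpuinfo text."""
    vendor, flags = None, set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def migration_blockers(source_cpuinfo, target_cpuinfo, guest_flags):
    """Return reasons to refuse a live migration; an empty list means no objection."""
    src_vendor, _ = parse_cpuinfo(source_cpuinfo)
    dst_vendor, dst_flags = parse_cpuinfo(target_cpuinfo)
    blockers = []
    # Rule 1: do not jump CPU vendors.
    if src_vendor != dst_vendor:
        blockers.append(f"CPU vendor differs: {src_vendor} -> {dst_vendor}")
    # Rule 2: the target must advertise every flag the guest relies on.
    missing = sorted(set(guest_flags) - dst_flags)
    if missing:
        blockers.append("target lacks flags: " + " ".join(missing))
    return blockers
```

A wrapper script could collect /proc/cpuinfo from both nodes and stop before migrating if the returned list is not empty.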
 
All 3 nodes are "Linux 6.8.12-2-pve (2024-09-05T10:03Z)"

Can you expand a bit, are you suggesting it could be a kernel issue with an up-to-date kernel and I need to potentially downgrade the kernel? I don't have a target kernel version that is known good since I installed node3 with the new architecture directly on this version.
That's no problem. You can install an older kernel from the repo. Search for "kernel".
 
The hosts have CPUs from the same vendor with similar capabilities. Different vendor might work depending on the actual models and VMs CPU type configured, but it cannot be guaranteed - so please test before deploying such a setup in production.
This is from the PVE manual. As you can see, it sounds similar to the VMware documentation.
 
