Live migration freezes between Intel and AMD

jimmy1987

I suspect this simply isn't supposed to work, but I'll ask anyway.

When doing a live migration including storage between an E5-2640 v3 and an AMD EPYC 9534, it seems to work at first, but then the VM crashes and needs a reset before it works again. My thought was to use the generic x86-64-v2-AES CPU type on the VMs so I could migrate between them.

Does it just not work, or should I do something up front? I'm using Proxmox 8.4 with kernel 6.8.
 
If you use "host" as the CPU type, this is expected. For migration to work between different CPU models and vendors, you have to use a generic CPU type like x86-64-v3 (it has to be supported by all physical CPUs you want to migrate from or to).

Here is a list from QEMU.
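To check the "supported by all physical CPUs" condition on each node, here is a minimal Python sketch. It compares the flags from `/proc/cpuinfo` against the x86-64 psABI feature levels. The mapping from psABI feature names to Linux flag names (for example `pni` for SSE3 and `abm` for LZCNT) is my own and worth verifying, and the Proxmox CPU type of the same name may not match the psABI list exactly. `parse_flags` and `missing_for_level` are hypothetical helper names.

```python
# x86-64 psABI levels, written with the flag names Linux prints in
# /proc/cpuinfo. The name mapping is an assumption to verify on your hosts.
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
V3_FLAGS = V2_FLAGS | {
    "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave",
}


def parse_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_for_level(host_flags, level_flags):
    """Return, sorted, the flags the level needs that the host lacks."""
    return sorted(level_flags - host_flags)
```

On each node you could run `missing_for_level(parse_flags(open("/proc/cpuinfo").read()), V3_FLAGS)`. An empty list on every node suggests the level is a candidate, though it does not by itself guarantee a clean live migration.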
 
Then it should work. Maybe some other VM setting is preventing the migration from succeeding ...
 
Hmm, not sure which one, as it doesn't have many options enabled. I only know that I migrated 4 VMs and 3 of them froze completely, 2 of them about 4 minutes after the migration.
Not sure if this has anything to do with it:
kvm: warning: TSC frequency mismatch between VM (2449998 kHz) and host (2596991 kHz), and TSC scaling unavailable
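To put that warning in numbers, here is a small Python sketch that pulls the two frequencies out of the line and computes how far apart they are. The regular expression is written against the exact message quoted above, and `tsc_mismatch` is a hypothetical helper name. Whether this mismatch is what causes the freezes is not established here; the sketch only quantifies it.

```python
import re

# Pattern written against the warning text quoted above.
TSC_WARNING = re.compile(
    r"TSC frequency mismatch between VM \((\d+) kHz\) and host \((\d+) kHz\)"
)


def tsc_mismatch(log_line):
    """Return (vm_khz, host_khz, relative difference), or None if no match."""
    match = TSC_WARNING.search(log_line)
    if match is None:
        return None
    vm_khz, host_khz = (int(group) for group in match.groups())
    # Relative difference as seen from the frequency the VM expects.
    return vm_khz, host_khz, (host_khz - vm_khz) / vm_khz


line = ("kvm: warning: TSC frequency mismatch between VM (2449998 kHz) "
        "and host (2596991 kHz), and TSC scaling unavailable")
vm_khz, host_khz, rel = tsc_mismatch(line)
print(f"VM expects {vm_khz} kHz, host runs at {host_khz} kHz ({rel:+.1%})")
# prints: VM expects 2449998 kHz, host runs at 2596991 kHz (+6.0%)
```

So the destination host's TSC ticks about 6% faster than the rate the VM was started with, and per the message the host cannot scale it for the guest.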
 
The cluster currently has only one AMD and one Intel machine; additional machines with exactly the same hardware config as the Intel one will be added. I will try again with those hosts as well. It might just be something between Intel and AMD.
 
You can’t migrate (live) between AMD and Intel unless you have a common CPU set.

The only common CPU set according to QEMU documentation is qemu64 or qemu32.

The x86-64-v2-AES CPU type you are pointing at uses more modern CPU instructions than a baseline x86 CPU where possible. However, a lot has changed in the past decade. Almost every modern CPU microcode and BIOS now carries mitigations for security issues, and these are implemented slightly differently on the two vendors' CPUs, because the issues crop up on different instructions for each. So you start your VM on one node with a certain set of “common” instructions disabled, migrate it to another node with a different set of “common” instructions disabled, and eventually the VM calls one of those instructions.

You have to look specifically at the hardware, distill a set of common instructions and make changes to the profile accordingly.
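As a starting point for distilling that common set, here is a minimal Python sketch. It takes the contents of the `flags` line from `/proc/cpuinfo` on each node and reports what all nodes share and what is specific to each. `compare_hosts` is a hypothetical helper, and the flag strings in the demo are shortened, made-up examples, not the real flags of these CPUs. Note that the host flags alone do not capture everything microcode or mitigations may change.

```python
def compare_hosts(flags_by_host):
    """Take a map of host name -> flags string.

    Returns (sorted common flags, map of host name -> sorted host-only flags).
    """
    sets = {host: set(flags.split()) for host, flags in flags_by_host.items()}
    common = set.intersection(*sets.values()) if sets else set()
    extras = {host: sorted(s - common) for host, s in sets.items()}
    return sorted(common), extras


# Shortened, made-up flag strings for illustration only.
demo = {
    "intel-node": "aes avx avx2 sse4_2 vmx",
    "amd-node": "aes avx avx2 sse4_2 svm",
}
common, extras = compare_hosts(demo)
print("common:", " ".join(common))
for host, only_here in extras.items():
    print(f"{host} only:", " ".join(only_here))
```

Anything listed as host-only is a flag the VM's CPU profile should not expose if the VM has to move between those nodes.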
 
That is what I thought as well: when the VM calls that instruction, it crashes. I even had one crash about 5 hours after the migration. I'm guessing you can migrate between slightly different Intel CPUs then? The other servers that will join this cluster have exactly the same hardware as the Intel one, down to the disks used, so that should not be much of an issue.
 
You can migrate between quite different Intel CPUs, provided your kernels and microcode are up to date. Just set your CPU type to the oldest common architecture (e.g. Cascade Lake if that is your oldest CPU). I migrate between Cascade Lake Xeon Silver and Sapphire Rapids Xeon Gold without a problem.
 
Out of curiosity what happens when you install the Intel and AMD microcode packages respectively? Does it change the behavior at all?