So this topic comes up every now and then, but there seems to be no definitive "yes" or "no" on whether live migration between Intel and AMD hosts works, or what the options are. I hope our experience here will help others and spark some discussion.
Due to the unfortunate timing of Intel CPU vulnerabilities becoming an issue for hosting providers, our main service cluster's performance has been steadily getting worse (more load, fewer reserves available than predicted two years ago). Therefore we had to buy new hardware sooner than expected. After taking a long and hard look at the current offers on the market, the choice was more or less clear. I am not looking to spark a flame war regarding the hardware choice; suffice it to say that Intel was pretty much out of the competition from the start.
We could of course have built a new cluster, but that would have been more hassle than it was worth at this time. Therefore we decided to add some new AMD servers alongside the old Intel ones and get live migration of Linux VMs (Debian and Ubuntu) working smoothly for the situations when we need to move VMs across the hardware boundary. The old and new hardware is as follows:
Intel
- 2x Xeon E5-2630 v4 (Broadwell, 10-core)
- 768GB DDR4 2400MHz
AMD
- 2x EPYC 7502 (Epyc2, 32-core)
- 1TB DDR4 3200MHz
All hosts run proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve). Our goal was to optimize the virtual CPUs as much as possible so that we could get the most out of the hardware, not just to find the smallest set of flags that makes live migration work. Our use case involves (among other things) heavy use of SSL and other crypto, meaning we need to expose as many of the CPU feature flags as possible inside the VMs (crypto performance suffers greatly with the default flags). The following list is the intersection of the CPU flags available on both the AMD and the Intel hosts:
3dnowprefetch abm adx aes aperfmperf apic arat avx avx2 bmi1 bmi2 cat_l3 cdp_l3 clflush cmov constant_tsc cpuid cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cx16 cx8 de f16c fma fpu fsgsbase fxsr ht lahf_lm lm mca mce mmx movbe msr mtrr nonstop_tsc nopl nx pae pat pclmulqdq pdpe1gb pge pni popcnt pse pse36 rdrand rdseed rdt_a rdtscp rep_good sep smap smep sse sse2 sse4_1 sse4_2 ssse3 syscall tsc vme xsave xsaveopt
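For reference, one way to compute such an intersection is to diff the sorted flag lists from /proc/cpuinfo on one host of each type; the hostnames below are just placeholders for one Intel and one AMD node:

ssh intel-node "grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 | xargs -n1 | sort -u" > intel.flags
ssh amd-node "grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 | xargs -n1 | sort -u" > amd.flags
comm -12 intel.flags amd.flags

comm -12 prints only the lines present in both files, i.e. the flags available on both CPU generations.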
kvm64 (default CPU selection in Proxmox) enables the following flags in a VM:
apic clflush cmov constant_tsc cpuid cpuid_fault cx16 cx8 de fpu fxsr ht hypervisor lm mca mce mmx msr mtrr nopl nx pae pat pge pni pse pse36 sse sse2 syscall tsc tsc_known_freq vme x2apic xtopology
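If you want to double-check what a given configuration actually exposes, two checks that should work on any Proxmox node (nothing specific to our setup, <vmid> is a placeholder):

qm showcmd <vmid>
grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 | xargs -n1 | sort

The first, run on the host, prints the full kvm command line that Proxmox will generate (including the -cpu argument); the second, run inside the guest, lists the flags the VM actually sees.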
After some trial and error, the following combination was arrived at, which allowed for live migration of Linux VMs between Intel and AMD, and also avoided some bugs and crashes in the VMs:
args: -cpu kvm64,+3dnowprefetch,+abm,+adx,+aes,+arat,+avx,+avx2,+bmi1,+bmi2,+f16c,+fma,+lahf_lm,+movbe,+pclmulqdq,+popcnt,+rdrand,+rdseed,+rdtscp,+sep,+smap,+smep,+sse4.1,+sse4.2,+ssse3,+xsave,+xsaveopt,+kvm_pv_eoi
This line will need to be added to the VM configuration text file at /etc/pve/nodes/<nodename>/qemu-server/<vmid>.conf. Also, the CPU selection will need to be left empty in the GUI (or the line beginning with "cpu: " removed from the VM configuration). It would be neat to have the option to modify the "args:" parameter right in the GUI, though...

Please note that according to our tests, simply leaving the CPU type empty in the GUI (leading to the qemu command line argument of
-cpu kvm64,+sep,+lahf_lm,+kvm_pv_unhalt,+kvm_pv_eoi,enforce
), while seemingly working at first, will after some load and idle time in the VM result in a crash involving the kvm_kick_cpu function somewhere inside the paravirtualized halt/unhalt code. The Linux kernels tested ranged from Debian's 4.9.210-1 to Ubuntu's 5.3.0-46 (and some in between). Therefore the Proxmox default seems to be unsafe, and the very minimum working command line would probably be args: -cpu kvm64,+sep,+lahf_lm,+kvm_pv_eoi.
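To make the placement concrete, here is roughly what a VM configuration file might look like with our full args: line; the cores, memory, name, network and disk entries are made-up example values, and the point is simply that there is an args: line and no cpu: line:

args: -cpu kvm64,+3dnowprefetch,+abm,+adx,+aes,+arat,+avx,+avx2,+bmi1,+bmi2,+f16c,+fma,+lahf_lm,+movbe,+pclmulqdq,+popcnt,+rdrand,+rdseed,+rdtscp,+sep,+smap,+smep,+sse4.1,+sse4.2,+ssse3,+xsave,+xsaveopt,+kvm_pv_eoi
cores: 4
memory: 8192
name: example-vm
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
scsi0: local-lvm:vm-100-disk-0,size=32G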
Another consideration is that Intel CPU vulnerability mitigations will be enabled by default in a VM if it is booted with the default kernel command line options on Intel hardware, and DISABLED on AMD hardware. To preserve the mitigations in either case after live migration (at the cost of some performance on AMD hardware), the guest kernel command line will need to include at least the following:
pti=on spectre_v2=retpoline,generic spec_store_bypass_disable=seccomp
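On Debian and Ubuntu guests the usual place for these options is GRUB_CMDLINE_LINUX in /etc/default/grub, roughly like this (standard Debian/Ubuntu procedure, adjust to your own boot setup):

GRUB_CMDLINE_LINUX="pti=on spectre_v2=retpoline,generic spec_store_bypass_disable=seccomp"

followed by running update-grub and rebooting the guest.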
Of course one could always define HA groups that do not cross CPU vendor boundaries and live with the additional management overhead. It could well be that we will go for this choice in the end, but for now this configuration seems to do what we want it to do. YMMV of course.

Additional notes about some CPU flags:
- Proxmox default: sep lahf_lm kvm_pv_unhalt kvm_pv_eoi enforce
- Accelerated maths: abm adx bmi1 bmi2 f16c fma movbe
- Crypto code: aes pclmulqdq popcnt
- SSE/AVX: 3dnowprefetch avx avx2 sse4.1 sse4.2 ssse3
- Hardware random numbers: rdrand rdseed
- Timers: arat rdtscp
- Supervisor Mode (will lead to crashes in some VMs if not present): smap smep
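Finally, a quick sanity check (plain Linux tooling, not specific to our setup): after migrating a guest back and forth, you can confirm inside the VM that the extra flags and the mitigations are still in effect:

grep -m1 '^flags' /proc/cpuinfo | grep -o -E 'aes|avx2|pclmulqdq|rdseed'
grep . /sys/devices/system/cpu/vulnerabilities/*

The first command should print the crypto/vector flags if they are still exposed to the guest; the second lists the mitigation status the kernel reports for each known vulnerability.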