Virtual Machines freeze just after a live migration

May 19, 2022
Dear All,

We have a cluster of 4 Proxmox VE hosts using Ceph as shared storage. The hosts differ in processor, RAM, performance, and storage.

Everything seems to work properly except live migration. The behavior is not constant, but roughly one time in three the migrated VM freezes on the target host once the migration process has finished.

We observe this with every guest OS (various Linux versions and Windows) and when migrating from/to any host.

Thank you for your help.
 
Hello,

Thank you for your help.

Code:
fr-val-proxmox01-mgt
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              12
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
Stepping:                        4
CPU MHz:                         3200.000
CPU max MHz:                     3200.0000
CPU min MHz:                     1000.0000
BogoMIPS:                        4600.00
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        12 MiB
L3 cache:                        16.5 MiB
NUMA node0 CPU(s):               0-23
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
Warning: untrusted X11 forwarding setup failed: xauth key data not generated

fr-val-proxmox02-mgt
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           62
Model name:                      Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Stepping:                        4
CPU MHz:                         2600.000
CPU max MHz:                     2600.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4200.42
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        30 MiB
NUMA node0 CPU(s):               0-5,12-17
NUMA node1 CPU(s):               6-11,18-23
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
Warning: untrusted X11 forwarding setup failed: xauth key data not generated

fr-val-proxmox03-mgt
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:                        2
CPU MHz:                         3200.000
CPU max MHz:                     3200.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4789.04
Virtualization:                  VT-x
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        30 MiB
NUMA node0 CPU(s):               0-5,12-17
NUMA node1 CPU(s):               6-11,18-23
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

fr-val-proxmox04-mgt
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Stepping:                        7
CPU MHz:                         2300.000
BogoMIPS:                        4600.00
Virtualization:                  VT-x
L1d cache:                       1 MiB
L1i cache:                       1 MiB
L2 cache:                        32 MiB
L3 cache:                        44 MiB
NUMA node0 CPU(s):               0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s):               1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; TSX disabled
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
 
Ok, thank you.

So, if the cause is the old CPUs, the freeze should not happen when migrating between my hosts named fr-val-proxmox01-mgt and fr-val-proxmox04-mgt?

Here is a screenshot of an example CPU configuration. The only thing that varies between the VMs is the number of cores. No extra CPU flags are enabled; the Advanced box is only ticked to show the options. Thank you!

Screenshot 2022-05-19 at 15.03.56.png
 
I checked the migration and everything seems to work correctly between the hosts with the same CPU.
This is good news!
Are there any CPU options to use in guests to overcome this migration issue between hosts that have no CPU model in common?
 
Proxmox Virtual Environment 7.2-11, kernel 5.15
Live migration between Xeon(R) Bronze 3106 <<--->> Xeon(R) CPU E5-2650 v2

There was a problem:
When migrating Xeon(R) CPU E5-2650 v2 -->> Xeon(R) Bronze 3106: everything is OK, the VM keeps working.
When migrating Xeon(R) Bronze 3106 -->> Xeon(R) CPU E5-2650 v2: the VM freezes and requires a full VM shutdown and power-on.

I installed the pve-kernel-5.19 package from the repository:
# apt install pve-kernel-5.19
After restarting both PVE servers, they are running pve-kernel-5.19.7-1-pve, and now:
When migrating Xeon(R) CPU E5-2650 v2 -->> Xeon(R) Bronze 3106: everything is OK, the VM works and does not freeze.
When migrating Xeon(R) Bronze 3106 -->> Xeon(R) CPU E5-2650 v2: everything is OK, the VM keeps running and does not freeze.
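
To confirm that a node really booted the newer kernel before retrying a migration, something like the following could be used. This is only a sketch: kernel_at_least is a hypothetical helper name, the 5.19 threshold is simply the version reported as working in this thread, and it assumes GNU sort with the -V option (as shipped on Debian-based systems).

```shell
# Sketch: succeed when kernel release $1 is at least version $2.
# The function body runs in a subshell so its variables do not leak.
kernel_at_least() (
    # Drop everything from the first "-" on: 5.19.7-1-pve -> 5.19.7
    running=${1%%-*}
    wanted=$2
    # sort -V orders version strings; if the wanted version sorts first
    # (or both are equal), the running kernel is new enough.
    lowest=$(printf '%s\n%s\n' "$wanted" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
)
```

On each node this could be called as: kernel_at_least "$(uname -r)" 5.19 && echo "kernel is 5.19 or newer"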
 
Same issue here with two different CPUs.

Kernel on both servers:
Linux 5.15.102-1-pve #1 SMP PVE 5.15.102-1 (2023-03-14T13:48Z)

CPUs:
Intel(R) Xeon(R) Silver 4309Y CPU @ 2.80GHz
Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

From E5-2670 to Silver: no issues
From Silver to E5-2670: VMs freeze instantly

Since both servers are running the same kernel, this can't be the problem here...
 
It certainly can be - 5.15 is known for causing issues with live-migration between Intel CPUs of different architectures. Please try one of our opt-in kernels and see if the problem persists.

Additionally, if you are using a specific/custom CPU model, make sure that its instruction set corresponds to the 'lowest common denominator' of all the processors you are using.
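
One way to find that lowest common denominator is to intersect the "Flags:" lines from lscpu on every host. The following is a minimal sketch, not an official tool: common_flags is a hypothetical helper, and each argument is assumed to be one host's flag list as a single space-separated string.

```shell
# Sketch: print the CPU flags that appear in every argument.
# Each argument is one host's space-separated flag list.
# The function body runs in a subshell so its variables do not leak.
common_flags() (
    hosts=$#
    for flags in "$@"; do
        # One flag per line, de-duplicated within this host.
        printf '%s\n' "$flags" | tr -s ' ' '\n' | sed '/^$/d' | sort -u
    done |
        sort | uniq -c |
        # Keep only the flags counted once per host, i.e. present on all hosts.
        awk -v n="$hosts" '$1 == n { print $2 }' |
        paste -s -d ' ' -
)
```

For example, common_flags "avx avx2 aes" "aes avx" prints "aes avx". The real input could be gathered on each host with lscpu | sed -n 's/^Flags: *//p'. The resulting list can then be compared against the flags required by the CPU model chosen for the guests.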
 
Yeah, we switched to kernel 5.19 and the problem is solved. Live migration works.
Thanks!
Great. It might be worthwhile to switch to 6.1 (subscription) / 6.2 (no-subscription) instead, since 5.19 is already outdated and will quite possibly not get any further updates from us.
 
Regarding "6.1 (subscription) / 6.2 (no-subscription)": brave as I am, I have had 6.2 running on my production cluster from the enterprise repo since this morning:
Code:
~# apt policy pve-kernel-6.2
pve-kernel-6.2:
  Installed: 7.3-8
  Candidate: 7.3-8
  Version table:
 *** 7.3-8 500
        500 https://enterprise.proxmox.com/debian/pve bullseye/pve-enterprise amd64 Packages

~# pveversion
pve-manager/7.4-3/9002ab8a (running kernel: 6.2.6-1-pve)

:)
 
Sorry for reviving this old thread, but I'm currently facing the same issue. Is this recognized as a bug that's planned to be addressed, or is the recommended solution to install the optional 6.2 kernel?
 
That specific problem has been solved.

Meanwhile the current default kernel is Linux pvea 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64 GNU/Linux, so you might skip 6.2 completely.

Best regards
 
Does this apply to Proxmox 7 as well? I'm currently using the latest patched version, 7.4-17, which includes kernel 5.15.131-2-pv.
 
I had the same problem with Proxmox v7.4-17 and kernel 5.15:
serv1: Intel(R) Xeon(R) CPU E5-4620
serv2: Intel(R) Xeon(R) Silver 4310

Migration from serv1 to serv2 works, but migration from serv2 to serv1 results in a VM freeze (Debian guests, actually, but not CentOS ones).

After upgrading to kernel 6.2.16-20~bpo11+1, everything works well in both directions.

The very good point is that I could do it without stopping any VM, because the freeze does not appear if the destination kernel is 6.2. So I moved all my VMs from serv1 to serv2 (the direction that was working), upgraded serv1 to kernel 6.2, moved all the VMs back to serv1 (this time with no freeze), and then upgraded serv2 to the new 6.2 kernel too!
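
The evacuation step of that procedure could be sketched as below. This is an illustration under assumptions, not the exact commands I ran: plan_evacuation is a hypothetical helper that only prints the qm migrate commands so they can be reviewed first, and the VM IDs in the usage line are made up.

```shell
# Sketch: print (do not run) one live-migration command per VM ID.
# Usage: plan_evacuation <target-node> <vmid>...
# The function body runs in a subshell so its variables do not leak.
plan_evacuation() (
    target=$1
    shift
    for vmid in "$@"; do
        # --online requests a live migration of the running VM.
        echo "qm migrate $vmid $target --online"
    done
)
```

For example, plan_evacuation serv2 101 102 prints two commands that move VMs 101 and 102 to serv2. Once serv1 is empty it can be upgraded and rebooted, and the same helper run on serv2 with serv1 as target moves the VMs back. The list of running VM IDs on a node can probably be taken from the output of qm list.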
 
If I'm not mistaken, there was indeed an issue with kernel 5.15 regarding live migrations.
 
