Live migration problems from higher- to lower-frequency CPUs

Matteo Calorio

Good evening,

we are experiencing problems when live-migrating VMs from a host with CPU:

Code:
64 x Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (2 Sockets)

to a host with CPU:

Code:
40 x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (2 Sockets)

No problem in the opposite direction.

On both hosts we have the same PVE version:

Code:
pve-manager/7.3-3/c3928077 (running kernel: 5.15.74-1-pve) (Community subscription)

Following other posts, we tried other kernels (on the server with the Xeon Gold 6326) without success:

Code:
pve-kernel-5.15.39-3-pve-guest-fpu_5.15.39-3_amd64.deb
pve-kernel-5.13.19-6-pve
pve-kernel-5.19.17-1-pve

An error we noticed is:

Code:
QEMU[2724]: kvm: warning: TSC frequency mismatch between VM (2900006 kHz) and host (2299997 kHz), and TSC scaling unavailable
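
To put numbers on that warning, here is a minimal Python sketch that pulls the two frequencies out of the log line. The regular expression is written against the exact wording quoted above, so treat the message format as an assumption; `parse_tsc_mismatch` is a hypothetical helper name, not part of QEMU or PVE.

```python
import re

# Pattern matching the wording of the QEMU warning quoted in this post.
# Assumption: other QEMU versions may phrase the message differently.
_TSC_WARNING = re.compile(
    r"TSC frequency mismatch between VM \((\d+) kHz\) and host \((\d+) kHz\)"
)


def parse_tsc_mismatch(log_line):
    """Return (vm_khz, host_khz) from a TSC mismatch warning, or None."""
    match = _TSC_WARNING.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "QEMU[2724]: kvm: warning: TSC frequency mismatch between VM "
    "(2900006 kHz) and host (2299997 kHz), and TSC scaling unavailable"
)
vm_khz, host_khz = parse_tsc_mismatch(line)
# The guest was started at the source host's TSC rate; the target runs slower.
print(f"guest expects {vm_khz} kHz, host provides {host_khz} kHz, "
      f"ratio {vm_khz / host_khz:.3f}")
```

The two values match the nominal clocks of the source (2.90GHz) and target (2.30GHz) hosts, which is consistent with the problem only showing up in one direction.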

With Debian Linux VMs, the migration job completes correctly, but afterwards the VM sits at 100% CPU and is unavailable.

With Windows VMs, the migration job completes correctly, but Windows detects a problem and restarts the guest OS.

On the older host, cat /proc/cpuinfo shows that the tsc_scaling flag is not present, but on the newer one it is.
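
For comparing several nodes, the same check can be scripted. This is only a sketch: `has_cpu_flag` and `check_local_host` are hypothetical helper names. It assumes the flag can show up on either the `flags` or the `vmx flags` line of /proc/cpuinfo, and it also accepts `tsc_scale`, which as far as I know is the name used on AMD hosts (please treat that as an assumption).

```python
def has_cpu_flag(cpuinfo_text, names=("tsc_scaling", "tsc_scale")):
    """Return True if any of `names` appears as a CPU flag in cpuinfo text.

    Assumption: relevant flags are listed on lines whose key is either
    "flags" or "vmx flags", separated by whitespace.
    """
    for raw in cpuinfo_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep or key.strip() not in ("flags", "vmx flags"):
            continue
        present = value.split()
        if any(name in present for name in names):
            return True
    return False


def check_local_host(path="/proc/cpuinfo"):
    """Read cpuinfo from this machine (Linux only) and run the check."""
    with open(path) as handle:
        return has_cpu_flag(handle.read())
```

Running `check_local_host()` on each node should return False on the host without the flag and True on the one that has it.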

Any ideas?

Best regards,
Matteo
 
Hi,

we didn't test it because we would rather not stop the new node, with all the VMs on it, just to update it.

Now the plan is to see if we can use the other nodes to complete the cluster upgrade to kernel 5.19 without restarting any VMs.

It will take a couple of weeks, but I'll keep this post updated.

Thanks and have a good weekend,
Matteo
 
Hi,

I can finally confirm that, in the tests done so far, with all nodes on kernel 5.19.17-1 and PVE 7.3-3, VM migration works perfectly!

Thanks and have a good day,
Matteo
 
... until a few weeks ago:

VMs with CPU at 100%

While kernel 5.19 seems to have fixed the original problem, it also seems to have introduced another one that we didn't have with previous versions of PVE on the same hardware.

Do you know whether the change that fixed live migration in kernel 5.19 has been backported to 5.15? If so, we could go back to the "official" kernel. If not, should we stay on 5.19, or is there another version you would recommend, considering that ours is a production system?

Nodes are servers with CPUs:
40 x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (2 Sockets)
48 x Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (2 Sockets)
40 x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 Sockets)
48 x AMD EPYC 7272 12-Core Processor (2 Sockets)
64 x Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (2 Sockets)
 
Unfortunately, I don't think this specific fix made it back into 5.15 (IIRC it's an FPU-related issue where the fix could not be backported cleanly, but I'm not fully sure). You could try the current 6.2 opt-in kernel. The 5.19 kernel won't receive updates from us anymore.
 
