QEMU 9.0 available as of now

@twhidden is the failure always for a SATA type drive or others too? Does it help if you downgrade with apt install pve-qemu-kvm=8.2.2-1 or further with apt install pve-qemu-kvm=8.1.5-6?
 
This sounds like you got some issues with the source or target storage, it could still be a regression from QEMU 9, but IMO that's a bit less likely.
Can you check the kernel/system logs for possibly related log messages around the time the live migration fails? Please also post the VM config (qm config VMID) and the type of the underlying source and target storage.
Code:
qm config 105
boot: order=sata0;ide2;net0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 4096
name: fr-dev-qa-gen5-balanced
net0: virtio=0A:1E:76:93:E3:58,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm-storage-01:vm-105-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=42f30e80-d619-4133-b1c9-e40a6dcdfddc
sockets: 1
vmgenid: 0f8eb6a8-4904-4047-8200-b4ab077816a1

Here is what happens in the system logs at the time of that error - most notably, the "kvm: Failed to put registers after init: Invalid argument" which was in red.


Code:
Nov 05 09:47:38 pve50 QEMU[298688]: kvm: Failed to put registers after init: Invalid argument
Nov 05 09:47:38 pve50 kernel: tap105i0: left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 2(tap105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: vmbr1: port 3(fwpr105p0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwln105i0 (unregistering): left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwln105i0 (unregistering): left promiscuous mode
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwpr105p0 (unregistering): left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwpr105p0 (unregistering): left promiscuous mode
Nov 05 09:47:38 pve50 kernel: vmbr1: port 3(fwpr105p0) entered disabled state
Nov 05 09:47:38 pve50 kernel:  zd96: p1 p2 p3
Nov 05 09:47:38 pve50 lvm[299538]: /dev/zd96p3 excluded: device is rejected by filter config.
Nov 05 09:47:39 pve50 systemd[1]: 105.scope: Deactivated successfully.
Nov 05 09:47:39 pve50 systemd[1]: 105.scope: Consumed 1min 37.896s CPU time.
Nov 05 09:47:39 pve50 sshd[299547]: Accepted publickey for root from 10.10.20.40 port 33550 ssh2: RSA ....
Nov 05 09:47:39 pve50 sshd[299547]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 05 09:47:39 pve50 systemd-logind[2170]: New session 74 of user root.
Nov 05 09:47:39 pve50 systemd[1]: Started session-74.scope - Session 74 of User root.
Nov 05 09:47:39 pve50 sshd[299547]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 05 09:47:40 pve50 pvestatd[2741]: no such logical volume pve/data
Nov 05 09:47:40 pve50 qm[299553]: <root@pam> starting task UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam:
Nov 05 09:47:40 pve50 qm[299579]: stop VM 105: UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam:
Nov 05 09:47:40 pve50 qm[299553]: <root@pam> end task UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam: OK
Nov 05 09:47:40 pve50 sshd[299547]: Received disconnect from 10.10.20.40 port 33550:11: disconnected by user
Nov 05 09:47:40 pve50 sshd[299547]: Disconnected from user root 10.10.20.40 port 33550
Nov 05 09:47:40 pve50 sshd[299547]: pam_unix(sshd:session): session closed for user root

Source Storage: LVM-Thin
Destination Storage: ZFS (thin also)

Next, I'll try rolling it back to 8.x, as one of the comments suggests.
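
For anyone trying to collect the same information, here is a minimal sketch of how the QEMU lines could be picked out of a journal excerpt like the one above. The function name `qemu_kvm_lines` is made up for illustration, the two sample lines are copied from the log in this post, and the time window in the comment is only an assumption based on the timestamps shown.

```shell
#!/bin/sh
# Hypothetical helper: keep only messages logged by the QEMU process itself
# (lines tagged "QEMU[<pid>]: kvm:"), dropping bridge, sshd and systemd noise.
qemu_kvm_lines() {
    grep -E 'QEMU\[[0-9]+\]: kvm:'
}

# Demo on two lines copied from the excerpt above.
printf '%s\n' \
    'Nov 05 09:47:38 pve50 QEMU[298688]: kvm: Failed to put registers after init: Invalid argument' \
    'Nov 05 09:47:38 pve50 kernel: tap105i0: left allmulticast mode' |
    qemu_kvm_lines

# On the migration target, the same filter could be applied to a short
# window around the failure (times without a date refer to the current day):
#   journalctl --since "09:47:00" --until "09:48:00" | qemu_kvm_lines
```

Filtering like this only narrows the output; the surrounding kernel lines can still matter, so it is worth keeping the unfiltered excerpt as well.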
 
Hi @fiona

Downgraded to 8.2.2-1 as suggested and re-tried the live migration

It failed with the following errors in journalctl (shown in red; the TSC frequency warning didn't show up on version 9):

Code:
Nov 05 09:59:38 pve51 QEMU[262939]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 09:59:38 pve51 QEMU[262939]: kvm: Failed to put registers after init: Invalid argument

Then downgraded to 8.1.5-6, and the migration worked.

journalctl did not list the "Invalid argument" error (but the TSC frequency warning was shown several times).
Here is the output:
Code:
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265490 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265491 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265492 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265493 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265494 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265495 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265496 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
 
A workaround I found is to temporarily migrate to a temporary LVM-thin volume. That migrates without error. Once it's migrated, move the disks to the correct new ZFS storage. It's an extra step, but it works.

Hope this info helped.
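
A rough sketch of the two-step workaround described above, written as a dry run so nothing is executed by default. The helper names `run` and `migrate_via_temp_storage` and the storage names `tmp-lvmthin` and `zfs-final` are placeholders, not names from this thread; the VMID, node name and disk key are taken from the config and logs posted earlier. It assumes `qm migrate` and `qm disk move` are available as on current Proxmox VE 8 releases, so please check `man qm` on your version.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper. Arguments: VMID, target node, temporary storage,
# final storage, disk key as shown by `qm config` (e.g. sata0).
migrate_via_temp_storage() {
    vmid=$1 target=$2 tmp=$3 final=$4 disk=$5
    # Step 1: live-migrate, placing the local disk on the temporary storage.
    run qm migrate "$vmid" "$target" --online --with-local-disks \
        --targetstorage "$tmp" || return 1
    # Step 2: on the target node, move the disk to its final storage.
    run ssh "root@$target" qm disk move "$vmid" "$disk" "$final"
}

# Dry run with VM 105 and node pve50; both storage names are placeholders.
migrate_via_temp_storage 105 pve50 tmp-lvmthin zfs-final sata0
```

With the default dry run this only prints the two commands, which makes it easy to review them before setting `DRY_RUN=0`.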
 
Here is what happens in the system logs at the time of that error - most notably, the "kvm: Failed to put registers after init: Invalid argument" which was in red.
I'd guess this is the actual cause of the issue and the failing drive-mirror just being a later consequence.

What is the CPU model of the hosts, i.e. migration source node and migration target node? What kernels are they running?
 
not following on "failing drive-mirror"... but here is the info on the hosts:

pve1 (source)
Code:
CPU(s) 40 x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz (2 Sockets)
Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Boot Mode EFI
Manager Version pve-manager/8.2.7

pve50 or pve51 (destination)
Code:
CPU(s) 56 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 Sockets)
Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Boot Mode Legacy BIOS
Manager Version pve-manager/8.2.7
 
not following on "failing drive-mirror"... but here is the info on the hosts:
That was the original error message you posted.
Code:
cpu: host
You should not use host CPU type when you don't have the same CPU model on source and target. Live migration cannot be guaranteed to work then, see: https://pve.proxmox.com/pve-docs/chapter-qm.html#_cpu_type
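
To illustrate why differing CPU models matter with the host CPU type, here is a small sketch that lists CPU flags present on the migration source but missing on the target. The helper name `missing_on_target` is hypothetical, and the two flag lists in the demo are abridged, illustrative examples, not output captured from the hosts in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print every flag from the source list ($1) that
# does not occur in the target list ($2). Both are space-separated.
missing_on_target() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                  # also present on the target
            *) printf '%s\n' "$flag" ;;      # only on the source
        esac
    done
}

# Abridged, illustrative flag lists (assumed, not taken from the real hosts).
src_flags="sse4_2 avx avx2 avx512f pku"
dst_flags="sse4_2 avx avx2"
missing_on_target "$src_flags" "$dst_flags"

# On real nodes the lists could be read from /proc/cpuinfo, for example:
#   src_flags=$(ssh root@pve1 "grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2")
#   dst_flags=$(ssh root@pve50 "grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2")
```

Any flag printed here is something a guest started with the host CPU type on the source may rely on but the target cannot provide, which is the situation the linked documentation warns about.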
 
Gotcha -- but that isn't the reason for the migration failing, is it? The "Invalid Argument"

Side note, I had to use Host for some dev software we were testing, as it required certain CPU flags that were not available under the default. I believe it was related to AVX2. But that is good to know. The new hardware is identical, so it should be better.

Thanks for the hint downgrading to 8.1.5-6 - that got back to working. Hope you can reproduce the 9.x issue with what we learned here.
 
Gotcha -- but that isn't the reason for the migration failing, is it? The "Invalid Argument"
It most likely is. "Failed to put registers" refers to CPU registers: https://gitlab.com/qemu-project/qemu/-/blob/master/accel/kvm/kvm-all.c?ref_type=heads#L2902
I don't have exactly those CPU models and there is no need to reproduce, see my previous reply.
 
Sorry if it is off-topic, but @twhidden, for the AVX flags, you can use x86-64-v3, which supports AVX2 and corresponds to an Intel Haswell (2013, >= Xeon v3) or an AMD Excavator (2015).

And once both servers have the same CPU, e.g., the Silver 4210, you can use x86-64-v4, which corresponds to CPUs that support AVX-512: Intel server CPUs from Skylake (2017) on, or AMD CPUs from Zen 4 on.
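
As a quick sanity check before switching the CPU type, something like the following could be run on each node. It is only a sketch: `has_v3_flags` is a made-up helper, and `V3_FLAGS` is a subset of the x86-64-v3 feature set (using the flag names from /proc/cpuinfo), not the complete list.

```shell
#!/bin/sh
# Subset of the x86-64-v3 feature set, using /proc/cpuinfo flag names.
# Not the complete list; it is only meant to catch obvious mismatches.
V3_FLAGS="avx avx2 bmi1 bmi2 f16c fma movbe"

# Hypothetical helper: succeeds if every flag in V3_FLAGS appears in the
# space-separated flag list passed as $1; prints each missing flag otherwise.
has_v3_flags() {
    status=0
    for flag in $V3_FLAGS; do
        case " $1 " in
            *" $flag "*) ;;
            *) printf 'missing: %s\n' "$flag"; status=1 ;;
        esac
    done
    return "$status"
}

# Demo with an illustrative flag list that lacks movbe.
has_v3_flags "sse4_2 avx avx2 bmi1 bmi2 f16c fma" || echo "not all checked v3 flags present"

# On a real node the flag list could come from:
#   has_v3_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

If the check passes on every node in the cluster, the type could then be set with `qm set 105 --cpu x86-64-v3` (VMID 105 as in this thread); as far as I know the change only takes effect once the VM has been fully stopped and started again.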
 