> shutdown + start VM required to use new version.

I did do a shut/start of the VMs after reading the first part of this thread, but still ran into the same issue after.
This sounds like there are some issues with the source or target storage. It could still be a regression from QEMU 9, but IMO that's a bit less likely.
Can you check the kernel/system logs for possibly related messages happening around the time the live migration fails? Please also post the VM config (qm config VMID) and the type of the underlying source and target storage.
qm config 105
boot: order=sata0;ide2;net0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 4096
name: fr-dev-qa-gen5-balanced
net0: virtio=0A:1E:76:93:E3:58,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm-storage-01:vm-105-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=42f30e80-d619-4133-b1c9-e40a6dcdfddc
sockets: 1
vmgenid: 0f8eb6a8-4904-4047-8200-b4ab077816a1
Nov 05 09:47:38 pve50 QEMU[298688]: kvm: Failed to put registers after init: Invalid argument
Nov 05 09:47:38 pve50 kernel: tap105i0: left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 2(tap105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: vmbr1: port 3(fwpr105p0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwln105i0 (unregistering): left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwln105i0 (unregistering): left promiscuous mode
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwpr105p0 (unregistering): left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwpr105p0 (unregistering): left promiscuous mode
Nov 05 09:47:38 pve50 kernel: vmbr1: port 3(fwpr105p0) entered disabled state
Nov 05 09:47:38 pve50 kernel: zd96: p1 p2 p3
Nov 05 09:47:38 pve50 lvm[299538]: /dev/zd96p3 excluded: device is rejected by filter config.
Nov 05 09:47:39 pve50 systemd[1]: 105.scope: Deactivated successfully.
Nov 05 09:47:39 pve50 systemd[1]: 105.scope: Consumed 1min 37.896s CPU time.
Nov 05 09:47:39 pve50 sshd[299547]: Accepted publickey for root from 10.10.20.40 port 33550 ssh2: RSA ....
Nov 05 09:47:39 pve50 sshd[299547]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 05 09:47:39 pve50 systemd-logind[2170]: New session 74 of user root.
Nov 05 09:47:39 pve50 systemd[1]: Started session-74.scope - Session 74 of User root.
Nov 05 09:47:39 pve50 sshd[299547]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 05 09:47:40 pve50 pvestatd[2741]: no such logical volume pve/data
Nov 05 09:47:40 pve50 qm[299553]: <root@pam> starting task UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam:
Nov 05 09:47:40 pve50 qm[299579]: stop VM 105: UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam:
Nov 05 09:47:40 pve50 qm[299553]: <root@pam> end task UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam: OK
Nov 05 09:47:40 pve50 sshd[299547]: Received disconnect from 10.10.20.40 port 33550:11: disconnected by user
Nov 05 09:47:40 pve50 sshd[299547]: Disconnected from user root 10.10.20.40 port 33550
Nov 05 09:47:40 pve50 sshd[299547]: pam_unix(sshd:session): session closed for user root
@twhidden is the failure always for a SATA type drive, or for others too? Does it help if you downgrade with apt install pve-qemu-kvm=8.2.2-1, or further with apt install pve-qemu-kvm=8.1.5-6?

Hi @fiona
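For reference, the suggested downgrade might look like this on the affected node (a sketch only; the package versions are the ones named above, and the VM ID is the one from this thread - a running VM keeps its old QEMU process, so it needs a full stop/start afterwards to actually run the older binary):

```shell
# Pin pve-qemu-kvm back to an older build (versions from this thread)
apt install pve-qemu-kvm=8.2.2-1
# ...or, if the problem persists, one step further back:
# apt install pve-qemu-kvm=8.1.5-6

# Full stop/start so the VM switches to the newly installed binary
# (a reboot from inside the guest is not enough)
qm stop 105 && qm start 105
```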
Nov 05 09:59:38 pve51 QEMU[262939]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 09:59:38 pve51 QEMU[262939]: kvm: Failed to put registers after init: Invalid argument
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265490 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265491 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265492 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265493 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265494 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265495 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265496 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
> Here is what happens in the system logs at the time of that error - most notably, the "kvm: Failed to put registers after init: Invalid argument" which was in red.

I'd guess this is the actual cause of the issue, and the failing drive-mirror is just a later consequence. What is the CPU model of the hosts, i.e. the migration source node and the migration target node? What kernels are they running?

> I'd guess this is the actual cause of the issue, and the failing drive-mirror is just a later consequence.

Not following on "failing drive-mirror"... but here is the info on the hosts:
pve1 (source)
CPU(s) 40 x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz (2 Sockets)
Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Boot Mode EFI
Manager Version pve-manager/8.2.7

pve50 or pve51 (destination)
CPU(s) 56 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 Sockets)
Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Boot Mode Legacy BIOS
Manager Version pve-manager/8.2.7
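As an aside, a quick way to see which guest-visible CPU flags differ between the two CPU generations is to diff the flag lists of both nodes (a hypothetical sketch, using the node names from this thread and assuming SSH access between them):

```shell
# Compare the CPU flag sets of the source and target node; any flag
# present on only one side is a feature a 'host'-type guest could rely
# on and then lose when live-migrating between the two machines.
diff <(ssh pve1  "grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort") \
     <(ssh pve50 "grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort")
```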
> not following on "failing drive-mirror"... but here is the info on the hosts:
> pve1 (source): CPU(s) 40 x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz (2 Sockets), Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z), Boot Mode EFI, Manager Version pve-manager/8.2.7
> pve50 or pve51 (destination): CPU(s) 56 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 Sockets), Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z), Boot Mode Legacy BIOS, Manager Version pve-manager/8.2.7

That was the original error message you posted.

> cpu: host

You should not use the host CPU type when you don't have the same CPU model on source and target. Live migration cannot be guaranteed to work then, see: https://pve.proxmox.com/pve-docs/chapter-qm.html#_cpu_type
> You should not use the host CPU type when you don't have the same CPU model on source and target. Live migration cannot be guaranteed to work then, see: https://pve.proxmox.com/pve-docs/chapter-qm.html#_cpu_type

Gotcha -- but that isn't the reason for the migration failing, is it? The "Invalid Argument"?

Side note: I had to use host for some dev software we were testing, as it required certain CPU flags that were not available under the default type. I believe it was related to AVX2. But that is good to know. The new hardware is identical, so it should be better.

> Gotcha -- but that isn't the reason for the migration failing, is it? The "Invalid Argument"?

It most likely is. "Failed to put registers" refers to CPU registers: https://gitlab.com/qemu-project/qemu/-/blob/master/accel/kvm/kvm-all.c?ref_type=heads#L2902

Thanks for the hint about downgrading to 8.1.5-6 - that got things working again. Hope you can reproduce the 9.x issue with what we learned here.

> Hope you can reproduce the 9.x issue with what we learned here.

I don't have exactly those CPU models, and there is no need to reproduce - see my previous reply.

> Side note: I had to use host for some dev software we were testing, as it required certain CPU flags that were not available under the default type. I believe it was related to AVX2.

Sorry if it is off-topic, but @twhidden, for the AVX flags you can use the x86-64-v3 CPU type, which supports AVX2 and corresponds to an Intel Haswell (2013, Xeon v3 or newer) or an AMD Excavator (2015).
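A sketch of how that switch might look on the CLI (VM ID taken from this thread; the new CPU type only takes effect after a full stop/start of the VM, not a guest-internal reboot):

```shell
# Hypothetical sketch: switch VM 105 from 'host' to the portable
# x86-64-v3 CPU type, which still exposes AVX2 but stays identical
# across the nodes, so live migration is no longer model-dependent.
qm set 105 --cpu x86-64-v3

# Full stop/start so the guest actually boots with the new CPU model
qm stop 105 && qm start 105
```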