Replying to the earlier note that a shutdown + start of the VM is required to use the new version: I did do a shut/start of the VMs after reading the first part of this thread, but still ran into the same issue afterwards.
This sounds like there are some issues with the source or target storage; it could still be a regression from QEMU 9, but IMO that's a bit less likely.
Can you check the kernel/system logs for possibly related messages happening around the time the live migration fails? Please also post the VM config (qm config VMID) and the type of the underlying source and target storage.
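A minimal way of collecting that information, assuming VM ID 105 and a rough time window around the failed migration:
Code:
qm config 105
journalctl --since "2024-11-05 09:40" --until "2024-11-05 09:50"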
qm config 105
boot: order=sata0;ide2;net0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 4096
name: fr-dev-qa-gen5-balanced
net0: virtio=0A:1E:76:93:E3:58,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
sata0: vm-storage-01:vm-105-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=42f30e80-d619-4133-b1c9-e40a6dcdfddc
sockets: 1
vmgenid: 0f8eb6a8-4904-4047-8200-b4ab077816a1
Nov 05 09:47:38 pve50 QEMU[298688]: kvm: Failed to put registers after init: Invalid argument
Nov 05 09:47:38 pve50 kernel: tap105i0: left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 2(tap105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: vmbr1: port 3(fwpr105p0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwln105i0 (unregistering): left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwln105i0 (unregistering): left promiscuous mode
Nov 05 09:47:38 pve50 kernel: fwbr105i0: port 1(fwln105i0) entered disabled state
Nov 05 09:47:38 pve50 kernel: fwpr105p0 (unregistering): left allmulticast mode
Nov 05 09:47:38 pve50 kernel: fwpr105p0 (unregistering): left promiscuous mode
Nov 05 09:47:38 pve50 kernel: vmbr1: port 3(fwpr105p0) entered disabled state
Nov 05 09:47:38 pve50 kernel: zd96: p1 p2 p3
Nov 05 09:47:38 pve50 lvm[299538]: /dev/zd96p3 excluded: device is rejected by filter config.
Nov 05 09:47:39 pve50 systemd[1]: 105.scope: Deactivated successfully.
Nov 05 09:47:39 pve50 systemd[1]: 105.scope: Consumed 1min 37.896s CPU time.
Nov 05 09:47:39 pve50 sshd[299547]: Accepted publickey for root from 10.10.20.40 port 33550 ssh2: RSA ....
Nov 05 09:47:39 pve50 sshd[299547]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Nov 05 09:47:39 pve50 systemd-logind[2170]: New session 74 of user root.
Nov 05 09:47:39 pve50 systemd[1]: Started session-74.scope - Session 74 of User root.
Nov 05 09:47:39 pve50 sshd[299547]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 05 09:47:40 pve50 pvestatd[2741]: no such logical volume pve/data
Nov 05 09:47:40 pve50 qm[299553]: <root@pam> starting task UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam:
Nov 05 09:47:40 pve50 qm[299579]: stop VM 105: UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam:
Nov 05 09:47:40 pve50 qm[299553]: <root@pam> end task UPID:pve50:0004923B:00487BF4:672A5A3C:qmstop:105:root@pam: OK
Nov 05 09:47:40 pve50 sshd[299547]: Received disconnect from 10.10.20.40 port 33550:11: disconnected by user
Nov 05 09:47:40 pve50 sshd[299547]: Disconnected from user root 10.10.20.40 port 33550
Nov 05 09:47:40 pve50 sshd[299547]: pam_unix(sshd:session): session closed for user root
@twhidden is the failure always for a SATA type drive, or others too? Does it help if you downgrade with apt install pve-qemu-kvm=8.2.2-1, or further with apt install pve-qemu-kvm=8.1.5-6?
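For reference, a sketch of checking the currently installed version, downgrading, and pinning the package so it is not upgraded again by accident (the hold has to be removed again before later upgrades):
Code:
pveversion -v | grep pve-qemu-kvm
apt install pve-qemu-kvm=8.1.5-6
apt-mark hold pve-qemu-kvm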
Hi @fiona, here is what happens in the system logs at the time of that error - most notably the "kvm: Failed to put registers after init: Invalid argument", which was shown in red:
Nov 05 09:59:38 pve51 QEMU[262939]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 09:59:38 pve51 QEMU[262939]: kvm: Failed to put registers after init: Invalid argument
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265490 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265491 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265492 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265493 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265494 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265495 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 kernel: clearing PKRU xfeature bit as vCPU from PID 265496 reports no PKRU support - migration from fpu-leaky kernel?
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
Nov 05 10:06:16 pve51 QEMU[265298]: kvm: warning: TSC frequency mismatch between VM (2194842 kHz) and host (2599997 kHz), and TSC scaling unavailable
I'd guess this is the actual cause of the issue, and the failing drive-mirror is just a later consequence.
What is the CPU model of the hosts, i.e. migration source node and migration target node? What kernels are they running?
Not following on "failing drive-mirror"... but here is the info on the hosts:
pve1 (source)
CPU(s) 40 x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz (2 Sockets)
Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Boot Mode EFI
Manager Version pve-manager/8.2.7
pve50 or pve51 (destination)
CPU(s) 56 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 Sockets)
Kernel Version Linux 6.8.12-3-pve (2024-10-23T11:41Z)
Boot Mode Legacy BIOS
Manager Version pve-manager/8.2.7
The failing drive-mirror was the original error message you posted.
Code:
cpu: host
You should not use the host CPU type when you don't have the same CPU model on source and target. Live migration cannot be guaranteed to work then, see: https://pve.proxmox.com/pve-docs/chapter-qm.html#_cpu_type
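As a quick check of how far apart the two hosts really are, you can diff their CPU flag sets. A sketch, run from the destination node and assuming root SSH access to the source node (pve1, as named above):
Code:
diff <(ssh root@pve1 "grep -m1 '^flags' /proc/cpuinfo") <(grep -m1 '^flags' /proc/cpuinfo)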
Gotcha -- but that isn't the reason for the migration failing, is it? The "Invalid Argument"?
It most likely is; "Failed to put registers" refers to CPU registers: https://gitlab.com/qemu-project/qemu/-/blob/master/accel/kvm/kvm-all.c?ref_type=heads#L2902
Side note, I had to use Host for some dev software we were testing, as it required certain CPU flags that were not available under the default type. I believe it was related to AVX2. But that is good to know. The new hardware is identical, so it should be better.
Thanks for the hint downgrading to 8.1.5-6 - that got back to working. Hope you can reproduce the 9.x issue with what we learned here.
I don't have exactly those CPU models, and there is no need to reproduce, see my previous reply.
Sorry if it is off-topic, but @twhidden, for the AVX flags you can use x86-64-v3, which supports AVX2 and corresponds to an Intel Haswell (2013, Xeon v3 or newer) or an AMD Excavator (2015).
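A minimal sketch of switching the VM from this thread away from host to that model (VM ID 105 assumed; the new CPU type only takes effect after a full shutdown and start of the guest):
Code:
qm set 105 --cpu x86-64-v3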
I've updated to the latest Proxmox 8.3 but again had to downgrade pve-qemu-kvm to 8.1.5-6, as I get lots of "mem address conflicts" during boot of VMs when I assign more than 2048M of memory.
I can reproduce the "mem address conflict" messages. They appear after changes in SeaBIOS that also caused issues for 32-bit guests: https://mail.coreboot.org/hyperkitt...org/message/R7FOQMMYWVX577QNIA2AKUAGOZKNJIAP/
The question is whether that is the same root cause as the passthrough breakage, or whether the conflict messages are just a red herring.
A workaround is using less memory, e.g. with 2048 I do not get the messages. Could you check if that works for you too? If yes, and if it also fixes passthrough, that would be a good hint that it's the same root cause.
Another workaround would be not using SeaBIOS but OVMF/UEFI.
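Both workarounds can be applied from the CLI. A sketch, reusing VM ID 105 from earlier in the thread and assuming a storage named local-lvm for the EFI disk; note that a guest installed for legacy BIOS boot may need further changes before it boots under OVMF:
Code:
qm set 105 --memory 2048
qm set 105 --bios ovmf --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=1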
Is there any progress regarding this issue?
The SeaBIOS developers are discussing potential steps forward. See the recent messages in: https://mail.coreboot.org/hyperkitt...MSBPBDXOOAQ/#OSE5WX3S3TLQKVIVJAFEVFKQETNUPT5C
I'm following this discussion. So let's see what comes around.
@fiona or other members of the Proxmox staff - do you have any information about multiqueue for IO, which is implemented in the latest version of QEMU but not usable on Proxmox? In your repo, the path is available.
For VirtIO-SCSI you can configure it via CLI/API using the queues property of the scsi<N> option.
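A hypothetical example of what that could look like on the CLI, assuming VM 105 with a scsi0 disk on a storage named local-lvm and four queues (names are placeholders; per the reply above, this applies to VirtIO-SCSI):
Code:
qm set 105 --scsi0 local-lvm:vm-105-disk-0,queues=4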
QEMU 9.2 is out. I'm guessing this will be a Proxmox 9 thing? Lots of new features. Here's the summary from Phoronix: https://www.phoronix.com/news/QEMU-9.2-Released
A few of these sound really great for Proxmox in the long term:
- QEMU adds a new "Nitro-Enclave" machine type on x86 that can emulate an AWS Nitro Enclave environment and is able to boot Enclave Image Format "EIF" files.
- QEMU 9.2 adds support for enabling AVX10 and specifying the desired version of AVX10 such as AVX10-128, AVX10-256, AVX10-512, and other AVX10 version properties.
- VirtIO GPU now supports Venus encapsulation for Vulkan when using recent Virglrenderer code on the host and newer Mesa code within the guest.
- The VirtIO memory driver now supports suspend and resume on x86_64.
I think your guess about Proxmox VE 9 is good, we shall see. I'll look at QEMU 9.2 for Proxmox VE next year; QEMU 9.1 is currently applied in git and going through internal testing.