[SOLVED] Live migration between Proxmox VE 8 and 9 nodes fails when the VM has an EFI disk

K-P4ul

Hi,

I have a 2+1 node cluster (2 nodes + 1 qdevice) with LINSTOR/DRBD as shared storage. One node is running version 8.4.14, the other version 9.0.15 (freshly updated).
When I now live-migrate all VMs to the newer node (in order to upgrade the node still on version 8), the migration fails for VMs with an EFI disk.

The error messages I get are:
Job: VM 104 - Migrate
Code:
task started by HA resource agent
2025-11-17 16:29:25 use dedicated network address for sending migration traffic (10.255.240.59)
2025-11-17 16:29:25 starting migration of VM 104 to node 'regis' (10.255.240.59)
2025-11-17 16:29:26 starting VM 104 on remote node 'regis'
2025-11-17 16:29:27 [regis] Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older storage API, an upgrade is recommended
2025-11-17 16:29:30 [regis] close (rename) atomic file '/etc/pve/nodes/regis/qemu-server/104.conf' failed: File exists
2025-11-17 16:29:30 ERROR: online migrate failure - remote command failed with exit code 255
2025-11-17 16:29:30 aborting phase 2 - cleanup resources
2025-11-17 16:29:30 migrate_cancel
2025-11-17 16:29:32 ERROR: migration finished with problems (duration 00:00:07)
TASK ERROR: migration problems

Job: VM 104 - Start
Code:
efidisk0: enrolling Microsoft UEFI CA 2023
INFO: reading raw edk2 varstore from /var/run/qemu-server/qsd-104-efidisk0-enroll.fuse
INFO: var store range: 0x64 -> 0x40000
INFO: add db cert /usr/lib/python3/dist-packages/virt/firmware/certs/MicrosoftCorporationUEFICA2011.pem
INFO: certificate already present, skipping
INFO: add db cert /usr/lib/python3/dist-packages/virt/firmware/certs/MicrosoftUEFICA2023.pem
INFO: certificate already present, skipping
INFO: writing raw edk2 varstore to /var/run/qemu-server/qsd-104-efidisk0-enroll.fuse

TASK ERROR: close (rename) atomic file '/etc/pve/nodes/regis/qemu-server/104.conf' failed: File exists

When I remove the EFI disk from the VM, live migration works flawlessly. The problem is that I have clusters with more than 50 VMs running, so shutting down all the VMs to remove the EFI disk is not an option.
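For reference, this is roughly how I checked which VMs are affected and how I removed the EFI disk from a single test VM (VMID 104 is just my example; removing the disk only takes effect once the VM is powered off):
Code:
# list VM configs on this node that contain an EFI disk
grep -l '^efidisk0:' /etc/pve/qemu-server/*.conf

# remove the EFI disk from one test VM - needs a shutdown first,
# which is exactly why this is not practical for 50+ VMs
qm shutdown 104
qm set 104 --delete efidisk0
qm start 104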
 
Hi,

Nearly the same error here - PVE 9 to 9 - no migration possible anymore:

task started by HA resource agent
2025-11-18 09:32:01 conntrack state migration not supported or disabled, active connections might get dropped
2025-11-18 09:32:02 use dedicated network address for sending migration traffic
2025-11-18 09:32:02 starting migration of VM 103 to node 'pveAMD02'
2025-11-18 09:32:02 starting VM 103 on remote node 'pveAMD02'
2025-11-18 09:32:02 [pveAMD02] Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older storage API, an upgrade is recommended
2025-11-18 09:32:02 [pveAMD02] stat for '/dev/drbd/by-res/pm-1be83a14/0' failed - No such file or directory
2025-11-18 09:32:02 ERROR: online migrate failure - remote command failed with exit code 255
2025-11-18 09:32:02 aborting phase 2 - cleanup resources
2025-11-18 09:32:02 migrate_cancel
2025-11-18 09:32:03 ERROR: migration finished with problems (duration 00:00:02)
TASK ERROR: migration problems

I've just installed the latest updates...
 
I can confirm, PVE 9.0.15 on both nodes.
TASK ERROR: efidisk0: enrolling Microsoft UEFI CA 2023 failed - command 'virt-fw-vars --inplace /var/run/qemu-server/qsd-7607-efidisk0-enroll.fuse --distro-keys ms-uefi' failed: exit code 1
 
Sorry - but I'm still having the same issue as previously posted.

I've checked - qemu-server is 9.0.29.

Still can't live-migrate the VM - offline migration works fine.
 
Did you double-check that the *target* node has the new version?

Could you post the complete VM config?
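For example, something like this (adjust the VMID to yours; the grep pattern is just to pick out the relevant packages):
Code:
# on the *target* node - check the installed tooling versions
pveversion -v | grep -E 'pve-manager|qemu-server'

# on the node currently running the VM - dump its full config
qm config 103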
 
installed on both nodes:
qemu-server/stable,now 9.0.29 amd64 [installed]

The reboot into the new kernel is still pending due to the migration issue

VM config:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 2
cpu: host
efidisk0: linstor_storage:pm-1be83a14_103,efitype=4m,size=528K
localtime: 1
memory: 4096
meta: creation-qemu=9.0.2,ctime=1734190017
name: haos14.0
net1: virtio=BC:24:11:85:36:29,bridge=vmbr0,firewall=1,tag=10
onboot: 1
ostype: l26
scsi0: linstor_storage:pm-4667fed5_103,cache=writethrough,discard=on,size=33555416K,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f95a1329-8ef1-4aee-a8f7-49687eab0fd7
tablet: 0
 
Okay, this is strange... does it also happen for storages other than LINSTOR?
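One way to check, if a non-LINSTOR storage is available, would be a small throwaway test VM with its EFI vars on that storage ('local-lvm', VMID 9999 and the target node name are just placeholders here):
Code:
# throwaway VM with the EFI disk on a local (non-LINSTOR) storage
qm create 9999 --name efitest --memory 1024 --bios ovmf --net0 virtio,bridge=vmbr0
qm set 9999 --efidisk0 local-lvm:1,efitype=4m
qm start 9999

# --with-local-disks is needed because the EFI disk is not on shared storage
qm migrate 9999 pveAMD02 --online --with-local-disks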
 
I'm using only DRBD as storage for the cluster.

I've just gotten feedback from the LINSTOR guys:

The DRBD kmod 9.2.15 does not build for the 6.17 kernel.

You may wish to load the latest compatible kernel, which should be those in the 6.14 series.

Otherwise, the next DRBD release, 9.2.16, will be compatible with 6.17 and is expected to be released in one week, assuming everything goes as expected with testing of the release candidates.


So I highly recommend postponing the current kernel 6.17 update if you are using DRBD.
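A rough sketch of how to check whether a node is affected and how to stay on a 6.14 kernel until DRBD 9.2.16 is released (the exact kernel version string will differ on your systems):
Code:
# running kernel and whether the DRBD module was built for it
uname -r
dkms status | grep -i drbd
modinfo drbd | grep -E '^(version|vermagic)'

# list installed Proxmox kernels and pin a 6.14 one as the boot default
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.14.11-4-pve   # example version, pick one from the list above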
 
Ah, so the volume was not activated, but that didn't already throw an error? It sounds like that could maybe be improved in the LINSTOR plugin then ;)
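If someone wants to verify this on their own cluster before migrating, checking on the target node whether the DRBD device for the VM actually exists there should show it (resource name taken from the error above; adjust to your VM):
Code:
# run on the migration target node
ls -l /dev/drbd/by-res/pm-1be83a14/0
drbdadm status pm-1be83a14
linstor resource list | grep pm-1be83a14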