[SOLVED] Live migration between Proxmox VE 8 and 9 nodes fails when the VM has an EFI disk

K-P4ul

Hi,

I have a 2+1 node cluster (2 nodes + 1 qdevice) with LINSTOR/DRBD as shared storage. One node is running version 8.4.14, the other version 9.0.15 (freshly updated).
When I now live-migrate all VMs to the newer node (in order to upgrade the node still on version 8), the migration fails for VMs with an EFI disk.

The error messages I get are:
Job: VM 104 - Migrate
Code:
task started by HA resource agent
2025-11-17 16:29:25 use dedicated network address for sending migration traffic (10.255.240.59)
2025-11-17 16:29:25 starting migration of VM 104 to node 'regis' (10.255.240.59)
2025-11-17 16:29:26 starting VM 104 on remote node 'regis'
2025-11-17 16:29:27 [regis] Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older storage API, an upgrade is recommended
2025-11-17 16:29:30 [regis] close (rename) atomic file '/etc/pve/nodes/regis/qemu-server/104.conf' failed: File exists
2025-11-17 16:29:30 ERROR: online migrate failure - remote command failed with exit code 255
2025-11-17 16:29:30 aborting phase 2 - cleanup resources
2025-11-17 16:29:30 migrate_cancel
2025-11-17 16:29:32 ERROR: migration finished with problems (duration 00:00:07)
TASK ERROR: migration problems

Job: VM 104 - Start
Code:
efidisk0: enrolling Microsoft UEFI CA 2023
INFO: reading raw edk2 varstore from /var/run/qemu-server/qsd-104-efidisk0-enroll.fuse
INFO: var store range: 0x64 -> 0x40000
INFO: add db cert /usr/lib/python3/dist-packages/virt/firmware/certs/MicrosoftCorporationUEFICA2011.pem
INFO: certificate already present, skipping
INFO: add db cert /usr/lib/python3/dist-packages/virt/firmware/certs/MicrosoftUEFICA2023.pem
INFO: certificate already present, skipping
INFO: writing raw edk2 varstore to /var/run/qemu-server/qsd-104-efidisk0-enroll.fuse

TASK ERROR: close (rename) atomic file '/etc/pve/nodes/regis/qemu-server/104.conf' failed: File exists

When I remove the EFI disk from the VM, live migration works flawlessly. The problem is that I have clusters with more than 50 VMs running, so shutting down all the VMs to remove the EFI disk is not an option.
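For reference, this is roughly how I checked which VMs are affected and how I removed the EFI disk from a single test VM (VMID 104 is just my example; removing the disk only takes effect once the VM is powered off):
Code:
# list VM configs on this node that contain an EFI disk
grep -l '^efidisk0:' /etc/pve/qemu-server/*.conf

# remove the EFI disk from one test VM - needs a shutdown first,
# which is exactly why this is not practical for 50+ VMs
qm shutdown 104
qm set 104 --delete efidisk0
qm start 104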
 
Hi,

Nearly the same error here - PVE 9 to 9 - no migration possible anymore:

task started by HA resource agent
2025-11-18 09:32:01 conntrack state migration not supported or disabled, active connections might get dropped
2025-11-18 09:32:02 use dedicated network address for sending migration traffic
2025-11-18 09:32:02 starting migration of VM 103 to node 'pveAMD02'
2025-11-18 09:32:02 starting VM 103 on remote node 'pveAMD02'
2025-11-18 09:32:02 [pveAMD02] Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older storage API, an upgrade is recommended
2025-11-18 09:32:02 [pveAMD02] stat for '/dev/drbd/by-res/pm-1be83a14/0' failed - No such file or directory
2025-11-18 09:32:02 ERROR: online migrate failure - remote command failed with exit code 255
2025-11-18 09:32:02 aborting phase 2 - cleanup resources
2025-11-18 09:32:02 migrate_cancel
2025-11-18 09:32:03 ERROR: migration finished with problems (duration 00:00:02)
TASK ERROR: migration problems

I've just installed the latest updates...
 
I can confirm, PVE 9.0.15 on both nodes.
TASK ERROR: efidisk0: enrolling Microsoft UEFI CA 2023 failed - command 'virt-fw-vars --inplace /var/run/qemu-server/qsd-7607-efidisk0-enroll.fuse --distro-keys ms-uefi' failed: exit code 1
 
Sorry - but I'm still having the same issue as previously posted.

I've checked - qemu-server is 9.0.29.

Still can't live-migrate the VM - offline migration works fine.
 
Did you double-check that the *target* node has the new version?

Could you post the complete VM config?
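For example, something like this (adjust the VMID to yours; the grep pattern is just to pick out the relevant packages):
Code:
# on the *target* node - check the installed tooling versions
pveversion -v | grep -E 'pve-manager|qemu-server'

# on the node currently running the VM - dump its full config
qm config 103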
 
installed on both nodes:
qemu-server/stable,now 9.0.29 amd64 [installed]

The reboot into the new kernel is still pending due to the migration issue

VM config:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 2
cpu: host
efidisk0: linstor_storage:pm-1be83a14_103,efitype=4m,size=528K
localtime: 1
memory: 4096
meta: creation-qemu=9.0.2,ctime=1734190017
name: haos14.0
net1: virtio=BC:24:11:85:36:29,bridge=vmbr0,firewall=1,tag=10
onboot: 1
ostype: l26
scsi0: linstor_storage:pm-4667fed5_103,cache=writethrough,discard=on,size=33555416K,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f95a1329-8ef1-4aee-a8f7-49687eab0fd7
tablet: 0
 
Okay, this is strange... does it also happen for storages other than LINSTOR?
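One way to check, if a non-LINSTOR storage is available, would be a small throwaway test VM with its EFI vars on that storage ('local-lvm', VMID 9999 and the target node name are just placeholders here):
Code:
# throwaway VM with the EFI disk on a local (non-LINSTOR) storage
qm create 9999 --name efitest --memory 1024 --bios ovmf --net0 virtio,bridge=vmbr0
qm set 9999 --efidisk0 local-lvm:1,efitype=4m
qm start 9999

# --with-local-disks is needed because the EFI disk is not on shared storage
qm migrate 9999 pveAMD02 --online --with-local-disks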
 
I'm using only DRBD as storage for the cluster.

I've just gotten feedback from the LINSTOR guys:

The DRBD kmod 9.2.15 does not build for the 6.17 kernel.

You may wish to load the latest compatible kernel, which should be those in the 6.14 series.

Otherwise, the next DRBD release, 9.2.16, will be compatible with 6.17 and is expected to be released in one week, assuming everything goes as expected with testing of the release candidates.


So I highly recommend postponing the current kernel 6.17 update if you are using DRBD.
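A rough sketch of how to check whether a node is affected and how to stay on a 6.14 kernel until DRBD 9.2.16 is released (the exact kernel version string will differ on your systems):
Code:
# running kernel and whether the DRBD module was built for it
uname -r
dkms status | grep -i drbd
modinfo drbd | grep -E '^(version|vermagic)'

# list installed Proxmox kernels and pin a 6.14 one as the boot default
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.14.11-4-pve   # example version, pick one from the list above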
 
Ah, so the volume was not activated, but that didn't already throw an error? It sounds like that could maybe be improved in the LINSTOR plugin then ;)
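If someone wants to verify this on their own cluster before migrating, checking on the target node whether the DRBD device for the VM actually exists there should show it (resource name taken from the error above; adjust to your VM):
Code:
# run on the migration target node
ls -l /dev/drbd/by-res/pm-1be83a14/0
drbdadm status pm-1be83a14
linstor resource list | grep pm-1be83a14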