TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: 'mirror' has been cancelled

AngryAdm
Dead PVE.... I wanted to move the EFI disk, so I clicked MOVE.... I did not ask for any cancel... why do you behave like an infantile AI and cancel my job? To annoy me? Success!!

WHY?
I don't know how many times I've had to shut down a VM just to move its EFI disk... as if it's in use at all... This is not acceptable!
However, moving the 1.25 TB C: drive... noooo problem..

Can you please fix this? Just remove whatever cancels the block job for no reason... so it can complete!





[Move disk dialog: Disk: efidisk0]

Task viewer: VM 201 - Move disk
create full clone of drive efidisk0 (PVE02-STORAGE2:201/vm-201-disk-2.raw)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: 'mirror' has been cancelled
 
.................

[Move disk dialog: Disk: efidisk0]

Task viewer: VM 107 - Move disk
create full clone of drive efidisk0 (PVE02-STORAGE2:107/vm-107-disk-0.raw)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: 'mirror' has been cancelled
 
The charade with migration continues...

Cancel? Why? Watchers? Huh? Shoot those watchers and get on!

This time the VM in question crashed as a bonus. Impressive. /s


drive-sata0: transferred 351.7 GiB of 1.2 TiB (28.36%) in 10m 19s
drive-sata0: transferred 352.3 GiB of 1.2 TiB (28.18%) in 10m 20s
drive-sata0: transferred 354.1 GiB of 1.2 TiB (28.35%) in 10m 21s
drive-sata0: transferred 354.6 GiB of 1.2 TiB (28.85%) in 10m 22s
drive-sata0: transferred 354.9 GiB of 1.2 TiB (29.27%) in 10m 23s
drive-sata0: Cancelling block job
2021-12-29T16:00:40.405+0100 7f4a277fe700 -1 librbd::image::preRemoveRequest: 0x56550702fa10 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
rbd rm 'vm-107-disk-0' error: rbd: error: image still has watchers
TASK ERROR: storage migration failed: block job (mirror) error: VM 107 qmp command 'query-block-jobs' failed - got timeout
 
Watchers? Huh? Shoot those watchers and get on!
I don't think shooting anything is wanted in a production environment.
I do agree, though, that it's annoying that EFI disks (and TPM state disks) cannot currently be moved at runtime, whatever the reason.

But in general, Proxmox's reliable and careful handling of resources is much appreciated, rather than forcefully executing whatever the user requests (or whatever they *think* they request, not knowing all the background consequences).

Would you mind sharing your VM configs and some details about your storage(s)? Source and Target.
I have no such problems on my Ceph Cluster, using PVE 7.1. Except for the EFI disks of course.
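In the meantime, to see what was still holding that RBD image when the removal failed, rbd status lists the watchers. A sketch, assuming the pool is the SSD01 Ceph storage mentioned further down in the thread and that it is run on a node with a Ceph admin keyring:

Code:
# show which client(s) still hold a watch on the image the task could not remove
rbd status SSD01/vm-107-disk-0

# a stale watch normally times out after ~30 seconds; once no watchers remain
# and the volume really is just a leftover from the failed job, it can be freed
pvesm free SSD01:vm-107-disk-0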
 
The cluster consists of 6 online nodes:
p1-2 for VMs
p3-4 nodes for Ceph
Each Ceph node currently has two 2 TB Kingston SEDC500M enterprise SSDs; more will be added later.
It pushes out around 800 MB/s sequential read and around 550 MB/s sequential write at 32K.
The network consists of two sets of stacked 10 GbE switches, one redundant set for public and one set for cluster traffic, i.e. each PVE storage node has 4x 10 GbE dedicated to Ceph.

The storage nodes are based on AM4 with an Asus WS X570 Pro board and have one quad-port 10 GbE NIC and one SM 8-port SAS controller, in a 16-bay enclosure.
The SATA ports connect to the SAS/SATA backplane via reverse 4x SATA -> SFF-8087, and the U.2 connector is connected to a single 4-bay backplane via SFF-8643 -> SFF-8087. The last 8 bays are connected to the SM controller. The nodes have 32 GB of RAM and are expected to hold 5 OSDs initially; if more OSDs are added, more RAM will be needed.

#1 has 64 TB of raidz2 WD Gold rust, and #2 has an 8 TB SSD "RAID10000" setup of five 3-way ZFS mirrors on consumer-grade SSDs with a Radian Memory Systems RMS-300 SLOG device.
The VM disks to be moved are on the SSD setup.

201.conf
agent: 1,fstrim_cloned_disks=1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: PVE02-STORAGE2:201/vm-201-disk-2.raw,size=128K
ide2: none,media=cdrom
machine: pc-q35-6.0
memory: 65536
name: XXXXXXXXX
net0: virtio=46:C3:6F:1C:A3:65,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win10
scsi0: SSD01:vm-201-disk-0,cache=writeback,discard=on,size=2000G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=1b6d31fa-f735-43a4-8451-6c70ac2578e9
sockets: 1


### storage dir points to a ZFS pool mounted at /storage2/
dir: PVE02-STORAGE2
path /storage2/vm
content images
nodes pve02,pve01
prune-backups keep-all=1
shared 0
vmgenid:


scsi0 moved itself from storage2 to SSD01 (Ceph) just fine when asked to do so.
But the EFI disk refuses, and it shouldn't be in use at all except at boot time.

PS: I have never seen the highlighted bold line in a config file before.

PPS: Never mind, it's the agent trim setting. :D
 
the reason why TPM state and EFI disks can't be moved online is that both are 'writable' from the guest, so we can't touch them behind the guest's back, but they are not accessible to the QEMU process like regular block devices - so it's not possible to do a block mirror that intercepts/redirects writes. tpmstate is handled via a second process running next to the VM, and EFI disks are exposed like a flash chip - in both cases we just use our existing 'disk' layer on the PVE side to make management easier and more flexible, but they are not 'disks' as far as the VM is concerned.

it could (and should) be handled better/earlier and with a clear error message, but when a limitation like that is visible in PVE it's usually not because the devs are lazy/don't care, but because there is a good reason for that limitation.
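For now, the practical workaround is to move the EFI disk with the VM powered off. A minimal sketch via the CLI, with the VM ID and target storage from the first post as placeholders (on older releases the command is spelled qm move_disk):

Code:
# cleanly shut the guest down and wait until it is stopped
qm shutdown 201 && qm wait 201

# move efidisk0 (tiny, so this is quick) and drop the old copy afterwards
qm move-disk 201 efidisk0 SSD01 --delete 1

# bring the guest back up
qm start 201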
 
the reason why tpm state and efi disks can't be moved online is that both are 'writable' from the guest, so we can't touch them behind the guest's back [...]
@fabian something we do in other circumstances is preserve a copy of such crucial info on occasion, e.g. during reboot.
EFI config info doesn't change often... so while it is writable, it is rarely written, correct?

This is truly a killer issue. :(
 
yes, that is correct. it is writable albeit rarely written - but there is no way to ensure that no writes happen during the migration (which, depending on circumstances, can take a while!), so it's not safe to move the disk (which could mean losing writes altogether, or transferring an inconsistent state).
 
yes, that is correct. it is writable albeit rarely written - but there is no way to ensure that no writes happen during the migration...
Is there no way to detect that it has been written? (If nothing else, do a binary compare before and after the copy ;) - a rough sketch of that idea follows below.)

Since EFI storage is small (typically a few hundred MB), and writes to EFI storage are incredibly rare, why not:
1) have the migration fail on an EFI write... or better:
2) do the EFI move last (quick, since it's tiny), and restart the EFI copy on write (once or twice, then fail)?

As it is, I'm wanting to move my VMs away from UEFI simply because of this risk.
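A rough illustration of that binary-compare idea, assuming a file-based EFI volume like the raw files earlier in this thread and a copy done by hand (the paths are hypothetical):

Code:
# hash the EFI vars image, copy it, then hash the source again; if the
# two hashes differ, the guest wrote to it in between and the copy
# would have to be repeated (or the move aborted)
SRC=/storage2/vm/images/201/vm-201-disk-2.raw   # hypothetical source path
DST=/mnt/newstore/images/201/vm-201-disk-2.raw  # hypothetical target path

BEFORE=$(sha256sum "$SRC" | cut -d' ' -f1)
cp --sparse=always "$SRC" "$DST"
AFTER=$(sha256sum "$SRC" | cut -d' ' -f1)

if [ "$BEFORE" = "$AFTER" ]; then
    echo "no writes during the copy - safe to switch over"
else
    echo "source changed during the copy - retry"
fi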
 
I just got burned by this. I've just migrated away from ESXi and never had to do this there. IMO, bringing down a VM just to move the trivially small EFI disk is less than optimal...
 
At the very least, a more user-friendly message? Please? :)
Code:
create full clone of drive efidisk0 (SSD-secure:vm-118-disk-2)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: 'mirror' has been cancelled
 
Code:
create full clone of drive efidisk0 (SSD-secure:vm-118-disk-2)
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-efidisk0: 'mirror' has been cancelled

Well, I dunno about you, but seeing the above, the first words that pop into my head are NOT "gosh, that is intuitively obvious!" How hard would it be to print "EFI disks cannot be moved while the VM is running"?
 
I was able to move the EFI disk with the Move disk button from local-lvm storage to a shared, directory-based storage in qcow2 format.
But moving it back from the shared storage to LVM in raw format got cancelled too.
I'm using PVE 6.3.

EDIT:
The VM was running during the process.
 
I'm a bit confused by this error message too. I just moved a Windows Server 2022 VM from one host to another (LVM storage to LVM storage) with no problems, so it seems like EFI disks can move.

But now when I try to move the EFI Disk from a local LVM volume to a local ZFS volume I get the storage migration failed: block job (mirror) error: drive-efidisk0: 'mirror' has been cancelled error.

Is this the same problem or am I running into something different?

For what it's worth, I'm on Proxmox 7.4-1/7.4-3:
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
 
the issue is that EFI disks *must* have a very specific (small) size, and some storages *have to round up* since they don't support such small volumes, and moving a disk live has to keep the exact size, so moving from one storage type to another may or may not work while the VM is running.
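One way to check whether that is what bites in a given case is to compare the size recorded in the VM config with the size each storage actually reports for the volume. A sketch, reusing the VM ID and storage names from earlier in the thread:

Code:
# configured size of the EFI disk (e.g. "size=128K")
qm config 201 | grep ^efidisk0

# sizes as the storages actually report them; LVM and some other storage
# types round small volumes up to their allocation granularity, while a
# .raw file on a directory storage keeps the exact size
pvesm list PVE02-STORAGE2 --vmid 201
pvesm list SSD01 --vmid 201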
 
Thanks @fabian, that explains it! LVM to LVM and ZFS to ZFS worked.

+1 vote for more descriptive error messages, even just a link to a wiki page containing 2 or 3 common reasons why the operation might fail.
 
the issue is that EFI disks *must* have a very specific (small) size, and some storages *have to round up* since they don't support such small volumes, and moving a disk live has to keep the exact size, so moving from one storage type to another may or may not work while the VM is running.
Do you know of ANY situation where the EFI disk MUST be small? I've never seen that.

In such situations, what makes sense is to do the rounding up front, as well as it can be done.

We've long had the technology to know about such things in advance. Honestly, I don't think anybody would complain about making EFI disks automagically a good size across the board. (If you don't like it, don't use UEFI boot!)
Even having a "Fix EFI Size" option (requiring a reboot) would be reasonable.

Assuming a non-removable drive, with rare exceptions, the following policy would fit most. These are defined partly by OS, partly by hardware:

ALL
* Round up, as if it were a 4k per sector drive
* Consider a 65527 x 4 kB minimum, since FAT32 requires at least 65527 clusters and 4 kB clusters are quite popular. That's really not that costly today

512B/sector drive
* 100 MiB for Windows and the vast majority of Linux/Unix (per-bootable OS and/or bootable copy)
* 200 MiB for MacOS

4k/sector drive
* 263 MiB (due to how FAT32 works)

Then, allow a specific setting. (Some Unixes want 550MiB, mostly to give room for multiple OS copies)

DETAILED TECH NOTE:
* For non-removable drives, the actual bare minimum is defined by FAT32: 65527 clusters, so at 512 B clusters = 32.7 MiB, at 4 kB = 262.1 MiB
* AFAIK, there's no defined maximum, other than the limits of FAT32 (EFI System Partition is based on FAT32 for internal drives)
 
that's a misunderstanding. the efi disk is not the ESP :) it's the equivalent of the flash chip your motherboard has to store UEFI configuration.
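This is easy to verify on a directory storage, since the volume is just a tiny OVMF variable store rather than a FAT filesystem. A quick check (path assumed from the storage config posted earlier in the thread):

Code:
# only size=128K in the config above: it stores UEFI variables (boot
# entries, Secure Boot keys), not an EFI System Partition
qemu-img info /storage2/vm/images/201/vm-201-disk-2.raw

# a real ESP would be detected as a DOS/MBR boot sector / FAT filesystem;
# the variable store typically just shows up as "data"
file /storage2/vm/images/201/vm-201-disk-2.raw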
 
