Proxmox 6.3 Live Migrate Fail - EFI Disks on local storage

jamez

I recently upgraded from 6.2 (no problems) to 6.3 using the enterprise repo, and am now seeing some live migrations fail with no obvious reason at the point where they try to copy the EFI disk across.

Some points:
- It seems to work fine for a new Windows VM deployed from the same Win 2019 template the problematic VMs were deployed from.
- No issues with Unix VMs that have EFI disks.
- Offline migrations work fine.
- No issues on Proxmox 6.2; the problem only appeared after upgrading to 6.3 (version check sketch just after this list).
- Local storage with LVM on top.
- Two-node cluster with local storage on each node.
- If the normal disk is migrated first, it reaches 100% and the migration then fails when it attempts the EFI disk.
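
Since the behaviour only appeared with the upgrade, here is a quick way to capture the package versions on each node for comparison (a sketch; `pveversion` is the standard tool, the grep filter is just mine):

```
# Full package version listing for this node
pveversion -v

# Narrow down to the packages most relevant to drive mirroring / migration
pveversion -v | grep -E 'pve-manager|pve-qemu-kvm|qemu-server'
```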

Output from a VM that only succeeded once out of many attempts; this is one of the failed attempts:

```
2021-04-22 15:15:29 starting migration of VM 105 to node 'redacted' (redacted)
2021-04-22 15:15:30 found local disk 'raid1:105/vm-105-disk-0.qcow2' (in current VM config)
2021-04-22 15:15:30 found local disk 'raid1:105/vm-105-disk-1.qcow2' (in current VM config)
2021-04-22 15:15:30 copying local disk images
2021-04-22 15:15:30 starting VM 105 on remote node 'vmh01'
2021-04-22 15:15:32 start remote tunnel
2021-04-22 15:15:33 ssh tunnel ver 1
2021-04-22 15:15:33 starting storage migration
2021-04-22 15:15:33 efidisk0: start migration to nbd:unix:/run/qemu-server/105_nbd.migrate:exportname=drive-efidisk0
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
2021-04-22 15:15:33 ERROR: online migrate failure - mirroring error: drive-efidisk0: mirroring has been cancelled
2021-04-22 15:15:33 aborting phase 2 - cleanup resources
2021-04-22 15:15:33 migrate_cancel
2021-04-22 15:15:36 ERROR: migration finished with problems (duration 00:00:07)
TASK ERROR: migration problems
```

Config file for VM:

```
agent: 1
balloon: 0
bios: ovmf
bootdisk: scsi0
cores: 2
efidisk0: raid1:105/vm-105-disk-0.qcow2,format=qcow2,size=128K
ide2: none,media=cdrom
memory: 6144
name: redacted
net0: virtio=redacted,bridge=vmbr00,tag=redacted
net1: virtio=redacted,bridge=vmbr00,tag=redacted
numa: 0
onboot: 1
ostype: win10
scsi0: raid1:105/vm-105-disk-1.qcow2,cache=writeback,format=qcow2,size=80G
scsihw: virtio-scsi-pci
smbios1: uuid=redacted
sockets: 1
vmgenid: redacted
```
Is there somewhere I can get more detailed output as to why it is failing? I have tried all the usual log files.
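
For reference, these are the kinds of places I have been looking, in case anyone can point me at something better (a sketch; the time window matches the failed attempt above):

```
# System journal on both the source and the target node around the failed attempt
journalctl --since "2021-04-22 15:15" --until "2021-04-22 15:20"

# Per-task log files kept by Proxmox under /var/log/pve/tasks/
ls -lt /var/log/pve/tasks/
```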

I am not sure whether this is related to https://bugzilla.proxmox.com/show_bug.cgi?id=3227, as I am seeing VMs with EFI disks of varying sizes both succeed and fail.
 
I just tested this and it worked fine for multiple migrations (in your case it also seems to fail only on one VM).

I doubt the bug report is connected.

Did you check the qcow2 file for integrity (only do this when the machine is not running)?
```
qemu-img info <path_to_efi.qcow2>
qemu-img check <path_to_efi.qcow2>
```
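
For completeness, a minimal sketch of running that check against the EFI volume from the config above; `pvesm path` resolves a PVE volume ID to its on-disk path, and the path below is only a placeholder:

```
# Resolve the volume ID from the VM config to a filesystem path
pvesm path raid1:105/vm-105-disk-0.qcow2

# With the VM shut down, inspect and check the image at the returned path
qemu-img info <returned_path>/vm-105-disk-0.qcow2
qemu-img check <returned_path>/vm-105-disk-0.qcow2
```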
 
It's not just a single VM; all the existing Windows VMs on the cluster are seeing this issue. New VMs don't seem to have it.

I tried a live move of the EFI disk image to local LVM and it failed without any indication as to why:

```
create full clone of drive efidisk0 (raid1:104/vm-104-disk-0.raw)
Rounding up size to full physical extent 4.00 MiB
Logical volume "vm-104-disk-0" created.
drive mirror is starting for drive-efidisk0
drive-efidisk0: Cancelling block job
drive-efidisk0: Done.
Logical volume "vm-104-disk-0" successfully removed
TASK ERROR: storage migration failed: mirroring error: drive-efidisk0: mirroring has been cancelled
```

I then tried moving it to the local directory storage and it succeeded:

```
Create full clone of drive efidisk0 (raid1:104/vm-104-disk-0.raw)
Formatting '/var/lib/vz/images/104/vm-104-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=131072 lazy_refcounts=off refcount_bits=16
drive mirror is starting for drive-efidisk0
drive-efidisk0: transferred: 131072 bytes remaining: 0 bytes total: 131072 bytes progression: 100.00 % busy: 1 ready: 0
drive-efidisk0: transferred: 131072 bytes remaining: 0 bytes total: 131072 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-efidisk0: Completing block job...
drive-efidisk0: Completed successfully.
drive-efidisk0 : finished
TASK OK
```
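
For reference, I believe the CLI equivalent of that disk move is `qm move_disk` (treat this as a sketch; VM ID and storage names are from my setup):

```
# Move the EFI disk of VM 104 onto the 'local' directory storage as qcow2;
# --delete removes the source copy instead of keeping it as an unused disk
qm move_disk 104 efidisk0 local --format qcow2 --delete
```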

I was then able to migrate the VM fine:

```
2021-04-23 15:25:02 starting migration of VM 104 to node 'vmh01' (10.233.220.101)
2021-04-23 15:25:02 found local disk 'local:104/vm-104-disk-0.qcow2' (in current VM config)
2021-04-23 15:25:02 found local disk 'raid1:104/vm-104-disk-1.qcow2' (in current VM config)
2021-04-23 15:25:02 copying local disk images
2021-04-23 15:25:02 starting VM 104 on remote node 'vmh01'
2021-04-23 15:25:04 start remote tunnel
2021-04-23 15:25:05 ssh tunnel ver 1
2021-04-23 15:25:05 starting storage migration
2021-04-23 15:25:05 scsi0: start migration to nbd:unix:/run/qemu-server/104_nbd.migrate:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 85899345920 bytes total: 85899345920 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi0: transferred: 1018167296 bytes remaining: 84881178624 bytes total: 85899345920 bytes progression: 1.19 % busy: 1 ready: 0
...
drive-scsi0: transferred: 85903474688 bytes remaining: 0 bytes total: 85903474688 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2021-04-23 15:26:49 volume 'raid1:104/vm-104-disk-1.qcow2' is 'raid1:104/vm-104-disk-0.qcow2' on the target
2021-04-23 15:26:49 efidisk0: start migration to nbd:unix:/run/qemu-server/104_nbd.migrate:exportname=drive-efidisk0
drive mirror is starting for drive-efidisk0
drive-efidisk0: transferred: 131072 bytes remaining: 0 bytes total: 131072 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi0: transferred: 85903474688 bytes remaining: 0 bytes total: 85903474688 bytes progression: 100.00 % busy: 0 ready: 1
drive-efidisk0: transferred: 131072 bytes remaining: 0 bytes total: 131072 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 85904064512 bytes remaining: 0 bytes total: 85904064512 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2021-04-23 15:26:50 volume 'local:104/vm-104-disk-0.qcow2' is 'local:104/vm-104-disk-0.qcow2' on the target
2021-04-23 15:26:50 starting online/live migration on unix:/run/qemu-server/104.migrate
2021-04-23 15:26:50 set migration_caps
2021-04-23 15:26:50 migration speed limit: 8589934592 B/s
2021-04-23 15:26:50 migration downtime limit: 100 ms
2021-04-23 15:26:50 migration cachesize: 1073741824 B
2021-04-23 15:26:50 set migration parameters
2021-04-23 15:26:50 start migrate command to unix:/run/qemu-server/104.migrate
2021-04-23 15:26:51 migration status: active (transferred 316352320, remaining 6127951872), total 6462070784)
...
2021-04-23 15:27:01 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 20416 overflow 0
2021-04-23 15:27:01 migration speed: 52.97 MB/s - downtime 70 ms
2021-04-23 15:27:01 migration status: completed
drive-efidisk0: transferred: 131072 bytes remaining: 0 bytes total: 131072 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 85904982016 bytes remaining: 0 bytes total: 85904982016 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-efidisk0: Completing block job...
drive-efidisk0: Completed successfully.
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-efidisk0 : finished
drive-scsi0 : finished
2021-04-23 15:27:02 stopping NBD storage migration server on target.
2021-04-23 15:27:08 migration finished successfully (duration 00:02:06)
TASK OK
```

I then moved the disk back to the original storage and migration still worked fine.
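
Rounded out as CLI steps, the migrate-then-move-back part would look roughly like this (again a sketch with my node and storage names; the second command is run once the VM sits on the target node):

```
# Live migrate, letting Proxmox copy the local disks across
qm migrate 104 vmh01 --online --with-local-disks

# Afterwards, move the EFI disk back to the original storage
qm move_disk 104 efidisk0 raid1 --format qcow2 --delete
```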

I did an offline disk check on the remaining problem VMs and they came back clean, but after carrying out the same process as above they now migrate fine as well.

I don't know what was actually broken, but it would have been good to get more detailed output from Proxmox.
 
