Online migration / disk move problem

Idaho · Nov 22, 2021

I'm trying to migrate VM storage to Linstor SDS and am running into some odd problems. All nodes are running PVE 7.1:

pve-manager/7.1-5/6fe299a0 (running kernel: 5.13.19-1-pve)

Linstor storage is, for now, on one host. When I create a new VM on Linstor, it works. When I try to migrate a VM from another host (and another storage) to Linstor, it fails:

2021-11-22 13:06:53 starting migration of VM 116 to node 'proxmox-ve3' (192.168.8.203)
2021-11-22 13:06:53 found local disk 'local-lvm:vm-116-disk-0' (in current VM config)
2021-11-22 13:06:53 starting VM 116 on remote node 'proxmox-ve3'
2021-11-22 13:07:01 volume 'local-lvm:vm-116-disk-0' is 'linstor-local:vm-116-disk-1' on the target
2021-11-22 13:07:01 start remote tunnel
2021-11-22 13:07:03 ssh tunnel ver 1
2021-11-22 13:07:03 starting storage migration
2021-11-22 13:07:03 scsi1: start migration to nbd:unix:/run/qemu-server/116_nbd.migrate:exportname=drive-scsi1
drive mirror is starting for drive-scsi1 with bandwidth limit: 51200 KB/s
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2021-11-22 13:07:03 ERROR: online migrate failure - block job (mirror) error: drive-scsi1: 'mirror' has been cancelled
2021-11-22 13:07:03 aborting phase 2 - cleanup resources
2021-11-22 13:07:03 migrate_cancel
2021-11-22 13:07:08 ERROR: migration finished with problems (duration 00:00:16)
TASK ERROR: migration problems

The Linstor volumes are created during migration, and there are no errors in its logs. I don't know why Proxmox is cancelling this job.

When I try to move a disk from NFS to Linstor (online), it also fails:

create full clone of drive scsi0 (nfs-backup:129/vm-129-disk-0.qcow2)

NOTICE
Trying to create diskful resource (vm-129-disk-1) on (proxmox-ve3).
drive mirror is starting for drive-scsi0 with bandwidth limit: 51200 KB/s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-scsi0: 'mirror' has been cancelled

To move storage to Linstor, I first have to move it to NFS (online), turn off the VM, and then move the VM storage to Linstor offline. The bizarre thing is that once I have done that, I can move this particular VM's storage from Linstor to NFS online and from NFS to Linstor online. I can also migrate the VM online, from Linstor, directly to another node and another storage without problems.

I've set up a test cluster to reproduce this problem and couldn't: online migration to Linstor storage just worked there. I don't know why it's not working on the main cluster. Any hints on how to debug it?

storage.cfg:

dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images
        nodes proxmox-ve5,proxmox-ve4

drbd: linstor-local
        content images,rootdir
        controller 192.168.8.203
        resourcegroup linstor-local
        preferlocal yes
        nodes proxmox-ve3

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes proxmox-ve0,proxmox-ve1,proxmox-ve2
        sparse 1

nfs: nfs-backup
        export /data/nfs
        path /mnt/pve/nfs-backup
        server backup2
        content rootdir,backup,images,iso,vztmpl
        options vers=3

fiona · Nov 22, 2021

Hi,
sounds like it could have the same root cause as the issue reported here. (Ping @fabian )

Idaho · Nov 22, 2021

I know the bug https://bugzilla.proxmox.com/show_bug.cgi?id=3227; it hit me before. That one is quite simple to work around, because the workaround works with offline VMs and really small disks, so downtime is short.

My problem is also with larger disks, 50 GB to 1000 GB. Downtime on those is too long to migrate them offline.

Maybe it's related; it would be great to fix two problems at once.

Idaho · Nov 23, 2021

Maybe it really does have something to do with VM disk size. More info:

I have two VMs that are identical in terms of size: both have a 32 GB disk. The one that was not migrated to Linstor looks like this:

scsi0: local-lvm:vm-131-disk-0,cache=writeback,size=32G

The one that was migrated (offline, via NFS) to Linstor, which originally had size=32G, looks like this on Linstor:

scsi0: linstor-local:vm-125-disk-1,cache=writeback,size=33555416K

33555416 KiB is 32.0009384 GiB, slightly larger than 32 GiB.

dmesg from the target node for a failed migration of a VM between nodes, from thin LVM to Linstor: https://pastebin.com/rN5ZQ8vN
This VM has two disks:

scsi0: local-lvm:vm-132-disk-0,cache=writeback,size=16M
scsi1: local-lvm:vm-132-disk-1,cache=writeback,size=10244M

This VM was on Linstor at some point, which is why its size is 10244M instead of the original 10G.
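The size drift in the two configs above can be checked with a few lines of arithmetic. This is only a sketch: the numbers come from the size= values quoted in this post, while the helper name drift_kib is made up for illustration.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def drift_kib(configured_kib: int, original_kib: int) -> int:
    """Extra space (in KiB) a disk gained compared with its original size."""
    return configured_kib - original_kib


# VM 125: originally size=32G, now size=33555416K on Linstor.
vm125_extra = drift_kib(33555416, 32 * KIB_PER_GIB)

# VM 132, scsi1: originally size=10G, now size=10244M.
vm132_extra = drift_kib(10244 * KIB_PER_MIB, 10 * KIB_PER_GIB)

print(f"VM 125: +{vm125_extra} KiB ({33555416 / KIB_PER_GIB:.7f} GiB total)")
print(f"VM 132: +{vm132_extra} KiB ({vm132_extra // KIB_PER_MIB} MiB)")
```

So the first disk grew by 984 KiB and the second by exactly 4 MiB after their time on Linstor.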

danadm · Nov 28, 2021

I have the same issue. The qcow2 files on the NFS share were created under Proxmox 5.1, and now I would like to migrate them to Proxmox 7.1. I attached the NFS share and started the VMs, which works fine, but live disk migration to a DRBD volume isn't possible: no error is shown, the job is just cancelled. Newly created DRBD volumes can be migrated to the NFS share and back without any problem.

selektor · Dec 23, 2021

Gentlemen, does Linstor DRBD work with Proxmox 7.1? I read somewhere that Linstor supports Proxmox 7.0 at most.

danadm · Jan 8, 2022

selektor said:
Gentlemen, does Linstor DRBD work with Proxmox 7.1? I read somewhere that Linstor supports Proxmox 7.0 at most.

7.1-8 works fine; I upgraded two clusters to this version today. Only the migration from NFS/QCOW2 to DRBD still fails.

albert_a · Jan 18, 2022

This really is a shame. I can't even migrate to DRBD offline because of the inconsistency in size after migration:

On ZFS:
root@dc2:~# blockdev --getsize64 /dev/sda
2151677952

After moving to DRBD:
root@dc2:~# blockdev --getsize64 /dev/sda
2155372544 (+3694592 bytes = 3608K)

Back to ZFS:
root@dc2:~# blockdev --getsize64 /dev/sda
2155872256 (+499712 bytes = 488K, I guess it is just rounded to 4M)
Total roundtrip size diff: 4194304 bytes = 4M
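The numbers above can be double-checked with a short script. It is a sketch only: the three sizes are the blockdev outputs quoted in this post, and the 4 MiB granularity is the guess from the post itself, not a documented ZFS or DRBD value.

```python
MIB = 1024 * 1024
GRANULARITY = 4 * MIB  # assumption: the "rounded to 4M" guess from the post


def round_up(size: int, granularity: int) -> int:
    """Round size up to the next multiple of granularity."""
    return -(-size // granularity) * granularity


zfs_before = 2151677952  # original size on ZFS
on_drbd = 2155372544     # after moving to DRBD
zfs_after = 2155872256   # after moving back to ZFS

print("ZFS -> DRBD:", on_drbd - zfs_before, "bytes")
print("DRBD -> ZFS:", zfs_after - on_drbd, "bytes")
print("round trip :", zfs_after - zfs_before, "bytes")

# The original size is already a multiple of 4 MiB, and rounding the DRBD
# size up to the next 4 MiB boundary gives exactly the final ZFS size.
print(zfs_before % GRANULARITY == 0)
print(round_up(on_drbd, GRANULARITY) == zfs_after)
```

Both checks come out true for these three values, which fits the rounding guess but does not prove where the rounding happens.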

It's disappointing, because, for example, the backup GPT is positioned relative to the end of the device, so it seems it would be unusable if the primary GPT fails. That's just one example. But think about whether you should use this in production.

fabian · Jan 18, 2022

also see https://lists.proxmox.com/pipermail/pve-devel/2021-November/051095.html

albert_a · Jan 18, 2022

Thanks for the link, Fabian,
Currently we can live with offline migration, but what has really blown my mind is that the migration process alters data! I mean the disk size.

If I understand correctly, this was done deliberately for debugging purposes, wasn't it?
So it is not related to DRBD and I should not worry about it, should I?

In case it is PVE-related, when do you think it will be possible to move a disk without altering it?

Thank you

fabian · Jan 19, 2022

no, it's basically because different storages have different ways of accounting overhead and/or rounding up sizes to their own granularity. it's not intentional on our side, but hard to fix (properly). we ask the source storage how big the disk is, and then tell the target storage to allocate a disk with that size, but the target storage gives us a bigger one, and Qemu doesn't like that for obvious reasons
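The sequence described above can be sketched as a toy model. This is not PVE or QEMU code: the function names, the 4 MiB granularity, and the source disk size are all made up for illustration, and real storages may account for overhead differently than by simple rounding.

```python
MIB = 1024 * 1024


def target_allocate(requested: int, granularity: int) -> int:
    """Toy target storage: rounds the requested size up to its granularity."""
    return -(-requested // granularity) * granularity


def mirror_allowed(source_size: int, target_size: int) -> bool:
    """Toy stand-in for the mirror precondition: sizes must match exactly."""
    return source_size == target_size


source_size = 10242 * MIB  # hypothetical source disk size
granularity = 4 * MIB      # hypothetical target granularity

# Step 1: ask the source how big the disk is, then ask the target for that size.
first = target_allocate(source_size, granularity)
print(first // MIB, "MiB allocated; mirror allowed:", mirror_allowed(source_size, first))

# Step 2: a disk that already has the target's size gets an identical allocation.
second = target_allocate(first, granularity)
print(second // MIB, "MiB allocated; mirror allowed:", mirror_allowed(first, second))
```

In this model the first move fails because the target hands back a bigger volume, while a disk whose size already sits on the target's granularity moves cleanly. That would be consistent with the earlier report that disks moved offline once could afterwards be moved online in both directions, though the thread does not confirm that this is the whole story.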

albert_a · Jan 24, 2022

Thank you for the clarification, Fabian.
Unfortunately we can't use DRBD until the size issue is fixed. Hope it will be fixed soon.
Best regards, Albert

anush.intech · Jan 21, 2024

Hi,
Was this issue ever fixed?

fiona · Jan 22, 2024

Hi,

anush.intech said:
Was this issue ever fixed?

no, the relevant bug report is still open: https://bugzilla.proxmox.com/show_bug.cgi?id=3227
QEMU won't let you mirror a drive if the size doesn't match exactly since version 5.1 (as it would not be an exact replica).

hellfire · Aug 5, 2024

Hi,

I just played around with LinStor/DRBD/PVE, and the size issue has a workaround now. To move a disk online to or from LinStor storage, you have to temporarily set the option exactsize yes in /etc/pve/storage.cfg. This needs the latest linstor-proxmox plugin. This concerns at least LVM and ZFS storage. I wrote "temporarily" because the option causes trouble with other actions such as resizing. So only set it for the migration from $someotherstorage to LinStor/DRBD and remove it afterwards.
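As a rough illustration of "set it, migrate, remove it", here is a small sketch that adds or removes a key in one stanza of storage.cfg-style text. The function name set_option is hypothetical, the stanza layout (a "type: id" header followed by indented property lines) is an assumption, and the code only transforms a string; it deliberately does not touch /etc/pve/storage.cfg. Only the option name exactsize and the value yes come from this post.

```python
import re
from typing import Optional

HEADER = re.compile(r"^(\w+):\s*(\S+)\s*$")  # e.g. "drbd: linstor-local"


def set_option(cfg_text: str, storage_id: str, key: str,
               value: Optional[str] = None) -> str:
    """Return cfg_text with `key value` set in the stanza named storage_id.

    With value=None the key is removed from that stanza instead.
    """
    out = []
    in_target = False
    for line in cfg_text.splitlines():
        header = HEADER.match(line)
        if header:
            in_target = header.group(2) == storage_id
            out.append(line)
            if in_target and value is not None:
                out.append(f"\t{key} {value}")
            continue
        stripped = line.strip()
        # Drop any existing occurrence of the key inside the target stanza.
        if in_target and stripped and stripped.split(None, 1)[0] == key:
            continue
        out.append(line)
    return "\n".join(out) + "\n"


sample = (
    "drbd: linstor-local\n"
    "\tcontent images,rootdir\n"
    "\tcontroller 192.168.8.203\n"
    "\n"
    "nfs: nfs-backup\n"
    "\texport /data/nfs\n"
)

during_migration = set_option(sample, "linstor-local", "exactsize", "yes")
after_migration = set_option(during_migration, "linstor-local", "exactsize")
print(during_migration)
```

Editing the file by hand works just as well; the point is only that the option belongs to the drbd stanza and should be gone again once the disks have been moved.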

See here for details:

LinBit Forum Thread with additional information:
https://forums.linbit.com/t/trying-...rce-and-target-image-have-different-sizes/213

LinBit Documentation:
https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-linstor-proxmox-migrating-storage

Regards,
h.
