I'm trying to migrate VM storage to Linstor SDS and am running into some odd trouble. All nodes are running PVE 7.1:
pve-manager/7.1-5/6fe299a0 (running kernel: 5.13.19-1-pve)
Linstor storage is, for now, on one host only. Creating a new VM on Linstor works. Migrating a VM from another host (and another storage) to Linstor fails:
2021-11-22 13:06:53 starting migration of VM 116 to node 'proxmox-ve3' (192.168.8.203)
2021-11-22 13:06:53 found local disk 'local-lvm:vm-116-disk-0' (in current VM config)
2021-11-22 13:06:53 starting VM 116 on remote node 'proxmox-ve3'
2021-11-22 13:07:01 volume 'local-lvm:vm-116-disk-0' is 'linstor-local:vm-116-disk-1' on the target
2021-11-22 13:07:01 start remote tunnel
2021-11-22 13:07:03 ssh tunnel ver 1
2021-11-22 13:07:03 starting storage migration
2021-11-22 13:07:03 scsi1: start migration to nbd:unix:/run/qemu-server/116_nbd.migrate:exportname=drive-scsi1
drive mirror is starting for drive-scsi1 with bandwidth limit: 51200 KB/s
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2021-11-22 13:07:03 ERROR: online migrate failure - block job (mirror) error: drive-scsi1: 'mirror' has been cancelled
2021-11-22 13:07:03 aborting phase 2 - cleanup resources
2021-11-22 13:07:03 migrate_cancel
2021-11-22 13:07:08 ERROR: migration finished with problems (duration 00:00:16)
TASK ERROR: migration problems
The Linstor volumes are created during the migration, and there are no errors in the Linstor logs. I don't know why Proxmox is cancelling the job.
When I try to move a disk from NFS to Linstor (online), it also fails:
create full clone of drive scsi0 (nfs-backup:129/vm-129-disk-0.qcow2)
NOTICE
Trying to create diskful resource (vm-129-disk-1) on (proxmox-ve3).
drive mirror is starting for drive-scsi0 with bandwidth limit: 51200 KB/s
drive-scsi0: Cancelling block job
drive-scsi0: Done.
TASK ERROR: storage migration failed: block job (mirror) error: drive-scsi0: 'mirror' has been cancelled
To move storage to Linstor I first have to move it to NFS (online), shut down the VM, and then move the VM storage to Linstor offline. The bizarre thing is that once I have done this, I can move that particular VM's storage from Linstor to NFS online and from NFS back to Linstor online. I can also migrate the VM online from Linstor directly to another node and another storage without problems.
I've set up a test cluster to reproduce this problem and couldn't: online migration to Linstor storage just worked there. I don't know why it's not working on the main cluster. Any hints on how to debug it?
storage.cfg:
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images
        nodes proxmox-ve5,proxmox-ve4

drbd: linstor-local
        content images,rootdir
        controller 192.168.8.203
        resourcegroup linstor-local
        preferlocal yes
        nodes proxmox-ve3

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes proxmox-ve0,proxmox-ve1,proxmox-ve2
        sparse 1

nfs: nfs-backup
        export /data/nfs
        path /mnt/pve/nfs-backup
        server backup2
        content rootdir,backup,images,iso,vztmpl
        options vers=3
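To make it easier to see which storage is usable on which node, here is a small Python sketch that reads the storage.cfg above and lists the storages per node. It is only an illustration, not a Proxmox tool: the function names `parse_storage_cfg` and `storages_for_node` are made up for this post, and it assumes that a storage without a `nodes` line is available on every node.

```python
import re

# A section header looks like "<type>: <storage-id>", e.g. "drbd: linstor-local".
HEADER = re.compile(r"^(\w+):\s+(\S+)$")

SAMPLE_CFG = """\
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images
        nodes proxmox-ve5,proxmox-ve4

drbd: linstor-local
        content images,rootdir
        controller 192.168.8.203
        resourcegroup linstor-local
        preferlocal yes
        nodes proxmox-ve3

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes proxmox-ve0,proxmox-ve1,proxmox-ve2
        sparse 1

nfs: nfs-backup
        export /data/nfs
        path /mnt/pve/nfs-backup
        server backup2
        content rootdir,backup,images,iso,vztmpl
        options vers=3
"""


def parse_storage_cfg(text):
    """Return {storage_id: {"type": ..., key: value, ...}} (hypothetical helper)."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = HEADER.match(line)
        if match:
            current = {"type": match.group(1)}
            storages[match.group(2)] = current
        elif current is not None:
            # Property line: "<key> <value>"
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return storages


def storages_for_node(storages, node):
    """List storage ids usable on `node`.

    Assumption: a storage with no "nodes" property is available everywhere.
    """
    usable = []
    for name, props in storages.items():
        nodes = props.get("nodes")
        if nodes is None or node in nodes.split(","):
            usable.append(name)
    return usable


if __name__ == "__main__":
    cfg = parse_storage_cfg(SAMPLE_CFG)
    for node in ("proxmox-ve3", "proxmox-ve4"):
        print(node, storages_for_node(cfg, node))
```

The parser strips leading whitespace, so it also accepts the config if the indentation was lost when pasting.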