Here is what I found while testing live migration. I think the problem comes from the starting suffix number for disks.
Migrating from pve-manager/5.2-1/0fcd7879 to pve-manager/5.2-1/0fcd7879 works as expected.
Code:
root@p27:~# qm migrate 125 p25 --online --with-local-disks
2019-01-16 18:04:13 starting migration of VM 125 to node 'p25' (10.31.1.25)
2019-01-16 18:04:13 found local disk 'local-zfs:vm-125-disk-1' (in current VM config)
2019-01-16 18:04:13 copying disk images
2019-01-16 18:04:13 starting VM 125 on remote node 'p25'
2019-01-16 18:04:16 start remote tunnel
2019-01-16 18:04:17 ssh tunnel ver 1
2019-01-16 18:04:17 starting storage migration
2019-01-16 18:04:17 scsi0: start migration to nbd:10.31.1.25:60000:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 16106127360 bytes total: 16106127360 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi0: transferred: 542113792 bytes remaining: 15564013568 bytes total: 16106127360 bytes progression: 3.37 % busy: 1 ready: 0
drive-scsi0: transferred: 1224736768 bytes remaining: 14881390592 bytes total: 16106127360 bytes progression: 7.60 % busy: 1 ready: 0
drive-scsi0: transferred: 1865416704 bytes remaining: 14240710656 bytes total: 16106127360 bytes progression: 11.58 % busy: 1 ready: 0
drive-scsi0: transferred: 2502950912 bytes remaining: 13603176448 bytes total: 16106127360 bytes progression: 15.54 % busy: 1 ready: 0
drive-scsi0: transferred: 2762997760 bytes remaining: 13343129600 bytes total: 16106127360 bytes progression: 17.15 % busy: 1 ready: 0
drive-scsi0: transferred: 2861563904 bytes remaining: 13244563456 bytes total: 16106127360 bytes progression: 17.77 % busy: 1 ready: 0
drive-scsi0: transferred: 3382706176 bytes remaining: 12723421184 bytes total: 16106127360 bytes progression: 21.00 % busy: 1 ready: 0
drive-scsi0: transferred: 4062183424 bytes remaining: 12043943936 bytes total: 16106127360 bytes progression: 25.22 % busy: 1 ready: 0
drive-scsi0: transferred: 4701814784 bytes remaining: 11404312576 bytes total: 16106127360 bytes progression: 29.19 % busy: 1 ready: 0
drive-scsi0: transferred: 4863295488 bytes remaining: 11242831872 bytes total: 16106127360 bytes progression: 30.20 % busy: 1 ready: 0
drive-scsi0: transferred: 4880072704 bytes remaining: 11226054656 bytes total: 16106127360 bytes progression: 30.30 % busy: 1 ready: 0
drive-scsi0: transferred: 4880072704 bytes remaining: 11226054656 bytes total: 16106127360 bytes progression: 30.30 % busy: 1 ready: 0
drive-scsi0: transferred: 5114953728 bytes remaining: 10991173632 bytes total: 16106127360 bytes progression: 31.76 % busy: 1 ready: 0
drive-scsi0: transferred: 5738856448 bytes remaining: 10367270912 bytes total: 16106127360 bytes progression: 35.63 % busy: 1 ready: 0
drive-scsi0: transferred: 6311378944 bytes remaining: 9794748416 bytes total: 16106127360 bytes progression: 39.19 % busy: 1 ready: 0
drive-scsi0: transferred: 7051673600 bytes remaining: 9054453760 bytes total: 16106127360 bytes progression: 43.78 % busy: 1 ready: 0
drive-scsi0: transferred: 7758413824 bytes remaining: 8347713536 bytes total: 16106127360 bytes progression: 48.17 % busy: 1 ready: 0
drive-scsi0: transferred: 8426356736 bytes remaining: 7679770624 bytes total: 16106127360 bytes progression: 52.32 % busy: 1 ready: 0
drive-scsi0: transferred: 8902410240 bytes remaining: 7203717120 bytes total: 16106127360 bytes progression: 55.27 % busy: 1 ready: 0
drive-scsi0: transferred: 8902410240 bytes remaining: 7203717120 bytes total: 16106127360 bytes progression: 55.27 % busy: 1 ready: 0
drive-scsi0: transferred: 8902410240 bytes remaining: 7203717120 bytes total: 16106127360 bytes progression: 55.27 % busy: 1 ready: 0
drive-scsi0: transferred: 8902410240 bytes remaining: 7203717120 bytes total: 16106127360 bytes progression: 55.27 % busy: 1 ready: 0
drive-scsi0: transferred: 8902410240 bytes remaining: 7203717120 bytes total: 16106127360 bytes progression: 55.27 % busy: 1 ready: 0
drive-scsi0: transferred: 9414115328 bytes remaining: 6692208640 bytes total: 16106323968 bytes progression: 58.45 % busy: 1 ready: 0
drive-scsi0: transferred: 10020192256 bytes remaining: 6086131712 bytes total: 16106323968 bytes progression: 62.21 % busy: 1 ready: 0
drive-scsi0: transferred: 10634657792 bytes remaining: 5471666176 bytes total: 16106323968 bytes progression: 66.03 % busy: 1 ready: 0
drive-scsi0: transferred: 10872684544 bytes remaining: 5233639424 bytes total: 16106323968 bytes progression: 67.51 % busy: 1 ready: 0
drive-scsi0: transferred: 10872684544 bytes remaining: 5233639424 bytes total: 16106323968 bytes progression: 67.51 % busy: 1 ready: 0
drive-scsi0: transferred: 10872684544 bytes remaining: 5233639424 bytes total: 16106323968 bytes progression: 67.51 % busy: 1 ready: 0
drive-scsi0: transferred: 10872684544 bytes remaining: 5233639424 bytes total: 16106323968 bytes progression: 67.51 % busy: 1 ready: 0
drive-scsi0: transferred: 10872684544 bytes remaining: 5233639424 bytes total: 16106323968 bytes progression: 67.51 % busy: 1 ready: 0
drive-scsi0: transferred: 10872684544 bytes remaining: 5233639424 bytes total: 16106323968 bytes progression: 67.51 % busy: 1 ready: 0
drive-scsi0: transferred: 11012145152 bytes remaining: 5094178816 bytes total: 16106323968 bytes progression: 68.37 % busy: 1 ready: 0
drive-scsi0: transferred: 11012145152 bytes remaining: 5094178816 bytes total: 16106323968 bytes progression: 68.37 % busy: 1 ready: 0
drive-scsi0: transferred: 11245977600 bytes remaining: 4860346368 bytes total: 16106323968 bytes progression: 69.82 % busy: 1 ready: 0
drive-scsi0: transferred: 11856248832 bytes remaining: 4250075136 bytes total: 16106323968 bytes progression: 73.61 % busy: 1 ready: 0
drive-scsi0: transferred: 12481200128 bytes remaining: 3625123840 bytes total: 16106323968 bytes progression: 77.49 % busy: 1 ready: 0
drive-scsi0: transferred: 13074694144 bytes remaining: 3031629824 bytes total: 16106323968 bytes progression: 81.18 % busy: 1 ready: 0
drive-scsi0: transferred: 13778288640 bytes remaining: 2328035328 bytes total: 16106323968 bytes progression: 85.55 % busy: 1 ready: 0
drive-scsi0: transferred: 14431551488 bytes remaining: 1674772480 bytes total: 16106323968 bytes progression: 89.60 % busy: 1 ready: 0
drive-scsi0: transferred: 14991491072 bytes remaining: 1114832896 bytes total: 16106323968 bytes progression: 93.08 % busy: 1 ready: 0
drive-scsi0: transferred: 15694036992 bytes remaining: 412286976 bytes total: 16106323968 bytes progression: 97.44 % busy: 1 ready: 0
drive-scsi0: transferred: 16106323968 bytes remaining: 0 bytes total: 16106323968 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi0: transferred: 16106323968 bytes remaining: 0 bytes total: 16106323968 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi0: transferred: 16106323968 bytes remaining: 0 bytes total: 16106323968 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2019-01-16 18:05:03 starting online/live migration on unix:/run/qemu-server/125.migrate
2019-01-16 18:05:03 migrate_set_speed: 8589934592
2019-01-16 18:05:03 migrate_set_downtime: 0.1
2019-01-16 18:05:03 set migration_caps
2019-01-16 18:05:03 set cachesize: 134217728
2019-01-16 18:05:03 start migrate command to unix:/run/qemu-server/125.migrate
2019-01-16 18:05:04 migration speed: 21.79 MB/s - downtime 26 ms
2019-01-16 18:05:04 migration status: completed
drive-scsi0: transferred: 16106323968 bytes remaining: 0 bytes total: 16106323968 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0 : finished
2019-01-16 18:05:09 migration finished successfully (duration 00:00:57)
Migrating from pve-manager/5.2-1/0fcd7879 to pve-manager/5.3-7/e8ed1e22 also works, almost as expected.
The only difference is that the ZFS disk device name changed from vm-125-disk-1 to vm-125-disk-0 on the destination.
Code:
root@p25:~# qm migrate 125 p28 --online --with-local-disks
2019-01-16 18:06:50 starting migration of VM 125 to node 'p28' (10.31.1.28)
2019-01-16 18:06:50 found local disk 'local-zfs:vm-125-disk-1' (in current VM config)
2019-01-16 18:06:50 copying disk images
2019-01-16 18:06:50 starting VM 125 on remote node 'p28'
2019-01-16 18:06:53 start remote tunnel
2019-01-16 18:06:53 ssh tunnel ver 1
2019-01-16 18:06:53 starting storage migration
2019-01-16 18:06:53 scsi0: start migration to nbd:10.31.1.28:60000:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 16106127360 bytes total: 16106127360 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi0: transferred: 954204160 bytes remaining: 15151923200 bytes total: 16106127360 bytes progression: 5.92 % busy: 1 ready: 0
drive-scsi0: transferred: 1264582656 bytes remaining: 14841544704 bytes total: 16106127360 bytes progression: 7.85 % busy: 1 ready: 0
drive-scsi0: transferred: 2006974464 bytes remaining: 14099152896 bytes total: 16106127360 bytes progression: 12.46 % busy: 1 ready: 0
drive-scsi0: transferred: 2678063104 bytes remaining: 13428064256 bytes total: 16106127360 bytes progression: 16.63 % busy: 1 ready: 0
drive-scsi0: transferred: 3695181824 bytes remaining: 12410945536 bytes total: 16106127360 bytes progression: 22.94 % busy: 1 ready: 0
drive-scsi0: transferred: 4696571904 bytes remaining: 11409555456 bytes total: 16106127360 bytes progression: 29.16 % busy: 1 ready: 0
drive-scsi0: transferred: 5680136192 bytes remaining: 10425991168 bytes total: 16106127360 bytes progression: 35.27 % busy: 1 ready: 0
drive-scsi0: transferred: 6637486080 bytes remaining: 9468641280 bytes total: 16106127360 bytes progression: 41.21 % busy: 1 ready: 0
drive-scsi0: transferred: 7356809216 bytes remaining: 8749318144 bytes total: 16106127360 bytes progression: 45.68 % busy: 1 ready: 0
drive-scsi0: transferred: 7356809216 bytes remaining: 8749318144 bytes total: 16106127360 bytes progression: 45.68 % busy: 1 ready: 0
drive-scsi0: transferred: 7837057024 bytes remaining: 8269070336 bytes total: 16106127360 bytes progression: 48.66 % busy: 1 ready: 0
drive-scsi0: transferred: 7885291520 bytes remaining: 8220835840 bytes total: 16106127360 bytes progression: 48.96 % busy: 1 ready: 0
drive-scsi0: transferred: 8519680000 bytes remaining: 7586447360 bytes total: 16106127360 bytes progression: 52.90 % busy: 1 ready: 0
drive-scsi0: transferred: 9334423552 bytes remaining: 6771703808 bytes total: 16106127360 bytes progression: 57.96 % busy: 1 ready: 0
drive-scsi0: transferred: 10162798592 bytes remaining: 5943328768 bytes total: 16106127360 bytes progression: 63.10 % busy: 1 ready: 0
drive-scsi0: transferred: 11140071424 bytes remaining: 4966055936 bytes total: 16106127360 bytes progression: 69.17 % busy: 1 ready: 0
drive-scsi0: transferred: 11626610688 bytes remaining: 4479516672 bytes total: 16106127360 bytes progression: 72.19 % busy: 1 ready: 0
drive-scsi0: transferred: 12583960576 bytes remaining: 3522166784 bytes total: 16106127360 bytes progression: 78.13 % busy: 1 ready: 0
drive-scsi0: transferred: 13614710784 bytes remaining: 2491416576 bytes total: 16106127360 bytes progression: 84.53 % busy: 1 ready: 0
drive-scsi0: transferred: 14286848000 bytes remaining: 1819279360 bytes total: 16106127360 bytes progression: 88.70 % busy: 1 ready: 0
drive-scsi0: transferred: 15258877952 bytes remaining: 847249408 bytes total: 16106127360 bytes progression: 94.74 % busy: 1 ready: 0
drive-scsi0: transferred: 16106127360 bytes remaining: 0 bytes total: 16106127360 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi0: transferred: 16106127360 bytes remaining: 0 bytes total: 16106127360 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2019-01-16 18:07:17 starting online/live migration on unix:/run/qemu-server/125.migrate
2019-01-16 18:07:17 migrate_set_speed: 8589934592
2019-01-16 18:07:17 migrate_set_downtime: 0.1
2019-01-16 18:07:17 set migration_caps
2019-01-16 18:07:17 set cachesize: 134217728
2019-01-16 18:07:17 start migrate command to unix:/run/qemu-server/125.migrate
2019-01-16 18:07:18 migration speed: 40.96 MB/s - downtime 21 ms
2019-01-16 18:07:18 migration status: completed
drive-scsi0: transferred: 16106127360 bytes remaining: 0 bytes total: 16106127360 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0 : finished
2019-01-16 18:07:24 migration finished successfully (duration 00:00:35)
root@p25:~#
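Migrating back (from 5.3 to 5.2) is reported as successful, but in reality it failed. Note that the log has no storage migration step at all: no drive mirror is started for the local disk.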
Code:
root@p28:~# qm migrate 125 p27 --online --with-local-disks
2019-01-16 18:13:33 starting migration of VM 125 to node 'p27' (10.31.1.27)
2019-01-16 18:13:33 found local disk 'local-zfs:vm-125-disk-0' (in current VM config)
2019-01-16 18:13:33 copying disk images
2019-01-16 18:13:33 starting VM 125 on remote node 'p27'
2019-01-16 18:13:36 start remote tunnel
2019-01-16 18:13:37 ssh tunnel ver 1
2019-01-16 18:13:37 starting online/live migration on unix:/run/qemu-server/125.migrate
2019-01-16 18:13:37 migrate_set_speed: 8589934592
2019-01-16 18:13:37 migrate_set_downtime: 0.1
2019-01-16 18:13:37 set migration_caps
2019-01-16 18:13:37 set cachesize: 134217728
2019-01-16 18:13:37 start migrate command to unix:/run/qemu-server/125.migrate
2019-01-16 18:13:38 migration speed: 1024.00 MB/s - downtime 8 ms
2019-01-16 18:13:38 migration status: completed
2019-01-16 18:13:41 migration finished successfully (duration 00:00:09)
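A check like the following could catch this kind of log automatically. It is only a rough sketch: `check_migration_log` is a made-up helper name, and it assumes the two log phrases stay exactly as they appear in the outputs above. It flags any log where a local disk was found but no drive mirror was ever started.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): inspect saved "qm migrate" output.
# Prints "suspicious" if a local disk was found but no drive mirror started,
# otherwise prints "ok".
check_migration_log() {
    case $1 in
        *"found local disk"*)
            case $1 in
                *"drive mirror is starting"*) echo ok ;;
                *) echo suspicious ;;
            esac ;;
        *) echo ok ;;   # no local disks, nothing to mirror
    esac
}
```

It takes the saved output of a migration as a single string, for example the contents of a task log file read with `$(cat ...)`.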
We can still see the disk on the source:
Code:
root@p28:~# zfs list -t all | grep 125
rpool/data/vm-125-disk-0 891M 892G 891M -
This is the new disk on the destination, still empty:
Code:
rpool/data/vm-125-disk-1 56K 4.62T 56K -
The VM is still running, but spitting out filesystem errors because its data is gone.
I can only stop it, and it obviously won't start again because:
Code:
Could not open '/dev/zvol/rpool/data/vm-125-disk-0': No such file or directory
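The mismatch can be confirmed before trying to start the VM by comparing the dataset the VM expects with what actually exists on the node. This is a small sketch: `zvol_present` is a hypothetical helper, and the `rpool/data` prefix is taken from the `zfs list` output above.

```shell
#!/bin/sh
# Hypothetical helper: does the zvol a VM expects appear in a dataset list?
# $1 = expected dataset name
# $2 = newline-separated dataset names, e.g. from: zfs list -H -o name -t volume
zvol_present() {
    # -F fixed string, -x whole-line match, -q quiet
    printf '%s\n' "$2" | grep -Fxq -- "$1" && echo present || echo missing
}
```

On the destination node it could be called as `zvol_present rpool/data/vm-125-disk-0 "$(zfs list -H -o name -t volume)"`, which in the situation described here should report the volume as missing.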
I guess I could zfs send that disk from the 5.3 node back to the 5.2 node, fix the naming in ZFS or in the VM config file, and get it running again.
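An untested sketch of that recovery, written as a dry run that only prints the commands so they can be reviewed first. The function name `plan_recovery` and the snapshot name `recover` are made up for illustration. It assumes the commands are run on the 5.2 node (p27), that root SSH between nodes works as it does for migration, and that receiving the data under the name the config already references (vm-125-disk-0) avoids having to rename anything.

```shell
#!/bin/sh
# Dry-run sketch: PRINTS the commands instead of executing them.
# Intended to be reviewed, then run by hand on the destination node.
plan_recovery() {
    vmid=$1       # e.g. 125
    src_node=$2   # node that still holds the data, e.g. p28
    dataset=$3    # e.g. rpool/data/vm-125-disk-0
    snap="${dataset}@recover"   # snapshot name is an arbitrary choice
    echo "qm stop ${vmid}"
    echo "ssh root@${src_node} zfs snapshot ${snap}"
    echo "ssh root@${src_node} zfs send ${snap} | zfs receive ${dataset}"
    echo "qm start ${vmid}"
}

plan_recovery 125 p28 rpool/data/vm-125-disk-0
```

The empty vm-125-disk-1 on the destination and the leftover copy on the source would still have to be cleaned up manually afterwards, which the sketch deliberately leaves out.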
So it seems that the problem might be related to the index number at the end of the disk name. 5.3 starts with -0 while 5.2 starts with -1.
I guess if that were fixed so that 5.3 also starts with the -1 suffix, all might be good and we could upgrade our clusters.
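To illustrate the suspected mismatch, here is a toy model of picking the first free disk name from a given starting index. This is not the actual Proxmox allocation code, only my assumption of how the two versions differ. `next_disk_name` is a hypothetical function.

```shell
#!/bin/sh
# Toy model (NOT the real Proxmox code): return the first free
# vm-<vmid>-disk-<n> name, counting up from a given starting index.
# $1 = vmid, $2 = starting index, $3 = space-separated existing names
next_disk_name() {
    vmid=$1
    i=$2
    existing=$3
    while :; do
        name="vm-${vmid}-disk-${i}"
        case " ${existing} " in
            *" ${name} "*) i=$((i + 1)) ;;     # name taken, try the next index
            *) echo "$name"; return 0 ;;
        esac
    done
}

next_disk_name 125 1 ""   # 5.2-style start: vm-125-disk-1
next_disk_name 125 0 ""   # 5.3-style start: vm-125-disk-0
```

Under this model the same VM gets a different disk name on an empty storage depending on which version allocates it, which matches the names seen in the logs above.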