ZFS CLI live migration

jayg30

I've got a 2-node test environment (it would be 3, but that node is in use right now). I was testing the replication and migration capabilities in the latest Proxmox release using local ZFS storage.

Reading the forums, I found out that LIVE migration doesn't work right now UNLESS you turn off replication and perform the migration via the command line with the --with-local-disks flag. Since this is a test environment, I gave it a try. I'm going off memory here, but this is what I witnessed:
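
For reference, this is roughly the sequence of commands, as a sketch only (VM ID 110, replication job ID 110-0, and target node pve02 are from my setup and will differ elsewhere):

Code:
# remove the replication job first (list job IDs with 'pvesr list')
pvesr delete 110-0
# then migrate online, including the local disks
qm migrate 110 pve02 --online --with-local-disks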

Starting state
VM on host 1
host 1
  rpool/data/vm-110-disk-0  10.6G
host 2
  (empty)

host 1 to host 2 migration
VM now on host 2, boots fine
host 1
  rpool/data/vm-110-disk-0  440K
host 2
  rpool/data/vm-110-disk-0  10.6G

host 2 to host 1 migration
deleted the replication job first
migration doesn't report failure, but the disk doesn't appear to be moved; the VM is shown on host 1 and tries to boot but just loops
host 1
  rpool/data/vm-110-disk-0  440K
host 2
  rpool/data/vm-110-disk-0  10.6G

host 1 to host 2 migration
migration reports failure this time
VM remains on host 1
host 1
  rpool/data/vm-110-disk-0  440K
host 2
  rpool/data/vm-110-disk-0  10.6G
  rpool/data/vm-110-disk-1  440K

I was able to resolve the issue with zfs commands: I renamed the disk on host 1 (disk-0 to disk-1), which allowed the VM migration to go through; after the migration I changed the VM config back to disk-0, and finally used zfs destroy to delete the bad disk-1.
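
In case it helps, roughly the commands I mean (a sketch from memory; the dataset names assume local-zfs on rpool/data and VM 110, and which node holds the stale copy, plus the disk slot in the VM config, will differ per setup):

Code:
# on the node holding the stale copy: move disk-0 out of the way
zfs rename rpool/data/vm-110-disk-0 rpool/data/vm-110-disk-1
# perform the migration (qm migrate ... --online --with-local-disks as above)
# afterwards, point the VM config back at disk-0, e.g. by editing the disk line
# (scsi0/virtio0/...) in /etc/pve/qemu-server/110.conf
# finally, destroy the leftover bad dataset
zfs destroy rpool/data/vm-110-disk-1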

This obviously doesn't look ready to me. I wanted to make users aware of it and to ask whether anyone else has successfully migrated a VM back and forth between nodes using the --with-local-disks flag, or whether the issue is just on my end.
 
I might even be mistaken about the course of events. I tried again and this time captured the information from the CLI of both nodes (pve02 & pve03). See here:

Code:
---------------------
VM running on PVE03
---------------------
root@pve02:~# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             9.77G  2.51T   104K  /rpool
rpool/ROOT        1.26G  2.51T    96K  /rpool/ROOT
rpool/ROOT/pve-1  1.26G  2.51T  1.26G  /
rpool/data          96K  2.51T    96K  /rpool/data
rpool/swap        8.50G  2.52T    56K  -

root@pve03:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     20.4G  1.55T   104K  /rpool
rpool/ROOT                1.29G  1.55T    96K  /rpool/ROOT
rpool/ROOT/pve-1          1.29G  1.55T  1.29G  /
rpool/data                10.6G  1.55T    96K  /rpool/data
rpool/data/vm-110-disk-0  10.6G  1.55T  10.6G  -
rpool/swap                8.50G  1.56T    56K  -


---------------------
VM migration from PVE03 to PVE02
---------------------
root@pve03:~# qm migrate 110 pve02 --online --with-local-disks
2019-01-22 17:20:26 starting migration of VM 110 to node 'pve02' (192.168.0.245)
2019-01-22 17:20:26 found local disk 'local-zfs:vm-110-disk-0' (in current VM config)
2019-01-22 17:20:26 copying disk images
2019-01-22 17:20:26 starting VM 110 on remote node 'pve02'
2019-01-22 17:20:29 start remote tunnel
2019-01-22 17:20:29 ssh tunnel ver 1
2019-01-22 17:20:29 starting online/live migration on unix:/run/qemu-server/110.migrate
2019-01-22 17:20:29 migrate_set_speed: 8589934592
2019-01-22 17:20:29 migrate_set_downtime: 0.1
2019-01-22 17:20:29 set migration_caps
2019-01-22 17:20:29 set cachesize: 536870912
2019-01-22 17:20:29 start migrate command to unix:/run/qemu-server/110.migrate
2019-01-22 17:20:30 migration status: active (transferred 118009808, remaining 4165402624), total 4312604672)
2019-01-22 17:20:30 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:31 migration status: active (transferred 235903759, remaining 4041105408), total 4312604672)
2019-01-22 17:20:31 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:32 migration status: active (transferred 354047685, remaining 3911135232), total 4312604672)
2019-01-22 17:20:32 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:33 migration status: active (transferred 471770545, remaining 3784568832), total 4312604672)
2019-01-22 17:20:33 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:34 migration status: active (transferred 589799019, remaining 3643777024), total 4312604672)
2019-01-22 17:20:34 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:35 migration status: active (transferred 707539474, remaining 3509202944), total 4312604672)
2019-01-22 17:20:35 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:37 migration status: active (transferred 825593220, remaining 3364364288), total 4312604672)
2019-01-22 17:20:37 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:38 migration status: active (transferred 943337563, remaining 3242930176), total 4312604672)
2019-01-22 17:20:38 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:39 migration status: active (transferred 1051574317, remaining 2541412352), total 4312604672)
2019-01-22 17:20:39 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:40 migration status: active (transferred 1161635136, remaining 1675120640), total 4312604672)
2019-01-22 17:20:40 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:41 migration status: active (transferred 1274712772, remaining 1027403776), total 4312604672)
2019-01-22 17:20:41 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:42 migration status: active (transferred 1392621114, remaining 900284416), total 4312604672)
2019-01-22 17:20:42 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:43 migration status: active (transferred 1510598449, remaining 765997056), total 4312604672)
2019-01-22 17:20:43 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:44 migration status: active (transferred 1628276094, remaining 643231744), total 4312604672)
2019-01-22 17:20:44 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 0 overflow 0
2019-01-22 17:20:45 migration status: active (transferred 1742841453, remaining 29966336), total 4312604672)
2019-01-22 17:20:45 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 2498 overflow 0
2019-01-22 17:20:45 migration status: active (transferred 1754906545, remaining 16371712), total 4312604672)
2019-01-22 17:20:45 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 5437 overflow 0
2019-01-22 17:20:45 migration status: active (transferred 1766946465, remaining 9076736), total 4312604672)
2019-01-22 17:20:45 migration xbzrle cachesize: 536870912 transferred 0 pages 0 cachemiss 8370 overflow 0
2019-01-22 17:20:45 migration speed: 256.00 MB/s - downtime 37 ms
2019-01-22 17:20:45 migration status: completed
2019-01-22 17:20:48 migration finished successfully (duration 00:00:22)


---------------------
VM running on PVE02
VM WONT BOOT
---------------------
root@pve02:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     9.77G  2.51T   104K  /rpool
rpool/ROOT                1.26G  2.51T    96K  /rpool/ROOT
rpool/ROOT/pve-1          1.26G  2.51T  1.26G  /
rpool/data                 368K  2.51T    96K  /rpool/data
rpool/data/vm-110-disk-0   272K  2.51T   272K  -
rpool/swap                8.50G  2.52T    56K  -

root@pve03:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     20.4G  1.55T   104K  /rpool
rpool/ROOT                1.29G  1.55T    96K  /rpool/ROOT
rpool/ROOT/pve-1          1.29G  1.55T  1.29G  /
rpool/data                10.6G  1.55T    96K  /rpool/data
rpool/data/vm-110-disk-0  10.6G  1.55T  10.6G  -
rpool/swap                8.50G  1.56T    56K  -
 
And now it looks like it is working. o_O
It transferred and cleaned up after itself. I used:

Code:
qm migrate 110 pve03 --online --with-local-disks --migration_type insecure

So do you have to use the --migration_type flag?
 
Hi,

So do you have to use the --migration_type flag?

The --migration_type flag only affects the transport layer.

I suspect you ran into a conflict with the replication: it can take up to a minute until the images on the target are erased, and if you try the migration before that, it fails.
 
Thanks. I'll do some further testing on my end with this in mind to make sure. I suspect this will eventually get pushed up into the GUI, and I have the spare hardware right now to test it a bunch.
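
For the next round I'll probably just verify, before migrating, that the replication job is really gone and that the target node no longer has a leftover dataset. Roughly (a sketch only; the pool name rpool/data and VM 110 are from my test setup):

Code:
# on the source node: list replication jobs and confirm none remain for VM 110
pvesr list
# on the target node: check for leftover vm-110 datasets before migrating
zfs list -r rpool/data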
 
