Live Migration Fails When Changing ZFS Pools

I'm trying to figure out whether this is a bug or a misunderstanding on my part of how live migration is supposed to work with the latest enhancements. I'm on the 7.1-10 release and am trying to live migrate a VM from one node to another. The live migration dialog offers an option to use a different ZFS pool on the target host, but in my testing that never works. Consider the log below.

My source pool on cloud10 is zfs-nvme-pool-1 and my destination pool on cloud11 is zfs-local-pool. zfs-nvme-pool-1 does not exist on cloud11, but I wasn't trying to move to the same pool; I was trying to move to a different one.
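For anyone who wants to reproduce this from the shell rather than the GUI, the migration described above should correspond roughly to a `qm migrate` call with a target storage. This is only a sketch: the helper name `print_migrate_cmd` is made up for illustration, the VM ID, node, and storage names are the ones from this post, and the function prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: prints the qm command instead of executing it.
# print_migrate_cmd is a hypothetical helper, not part of Proxmox VE.
print_migrate_cmd() {
    vmid="$1"
    target_node="$2"
    target_storage="$3"
    # --online            live-migrate the running VM
    # --with-local-disks  copy disks that live on local storage
    # --targetstorage     storage ID to place the disks on at the target
    printf 'qm migrate %s %s --online --with-local-disks --targetstorage %s\n' \
        "$vmid" "$target_node" "$target_storage"
}

# Values taken from the first attempt in this post.
print_migrate_cmd 408 cloud11 zfs-local-pool
```

Running the printed command on the source node (cloud10 here) would be the CLI counterpart of picking a different target storage in the migration dialog.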

Code:
2022-01-31 07:54:29 use dedicated network address for sending migration traffic (10.0.4.21)
2022-01-31 07:54:29 starting migration of VM 408 to node 'cloud11' (10.0.4.21)
2022-01-31 07:54:29 found generated disk 'zfs-nvme-pool-1:vm-408-cloudinit' (in current VM config)
2022-01-31 07:54:29 found local disk 'zfs-nvme-pool-1:vm-408-disk-0' (in current VM config)
2022-01-31 07:54:30 copying local disk images
2022-01-31 07:54:32 full send of zfs-nvme-pool-1/vm-408-cloudinit@__migration__ estimated size is 57.9K
2022-01-31 07:54:32 total estimated size is 57.9K
2022-01-31 07:54:33 successfully imported 'zfs-local-pool:vm-408-cloudinit'
2022-01-31 07:54:33 volume 'zfs-nvme-pool-1:vm-408-cloudinit' is 'zfs-local-pool:vm-408-cloudinit' on the target
2022-01-31 07:54:33 starting VM 408 on remote node 'cloud11'
2022-01-31 07:54:35 [cloud11] storage 'zfs-nvme-pool-1' is not available on node 'cloud11'
2022-01-31 07:54:36 ERROR: online migrate failure - remote command failed with exit code 255
2022-01-31 07:54:36 aborting phase 2 - cleanup resources
2022-01-31 07:54:36 migrate_cancel
2022-01-31 07:54:38 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems

If I move VM 408 to zfs-local-pool on cloud10 and then try to migrate it to zfs-nvme-pool on cloud12, I get a very different error. zfs-local-pool exists on both cloud10 and cloud12, whereas zfs-nvme-pool-1 exists only on cloud10 and zfs-nvme-pool exists only on cloud12. This time the migration times out after 300 seconds, complaining about the zvol device link. However, I've looked on cloud12 and the zvol does exist.

Code:
root@cloud12:~# ls -lh -R /dev/zvol/
/dev/zvol/:
total 0
drwxr-xr-x 2 root root 80 Jan 31 09:18 zfs-nvme-pool

/dev/zvol/zfs-nvme-pool:
total 0
lrwxrwxrwx 1 root root  9 Jan 31 09:18 vm-408-cloudinit -> ../../zd0
lrwxrwxrwx 1 root root 10 Jan 31 09:18 vm-408-disk-0 -> ../../zd16

Code:
2022-01-31 09:18:01 use dedicated network address for sending migration traffic (10.0.4.22)
2022-01-31 09:18:01 starting migration of VM 408 to node 'cloud12' (10.0.4.22)
2022-01-31 09:18:02 found generated disk 'zfs-local-pool:vm-408-cloudinit' (in current VM config)
2022-01-31 09:18:02 found local disk 'zfs-local-pool:vm-408-disk-0' (in current VM config)
2022-01-31 09:18:02 drive 'ide2': size of disk 'zfs-local-pool:vm-408-cloudinit' updated from 0T to 4M
2022-01-31 09:18:02 copying local disk images
2022-01-31 09:18:04 full send of zfs-local-pool/vm-408-cloudinit@__migration__ estimated size is 66.0K
2022-01-31 09:18:04 total estimated size is 66.0K
2022-01-31 09:18:04 successfully imported 'zfs-nvme-pool:vm-408-cloudinit'
2022-01-31 09:18:05 volume 'zfs-local-pool:vm-408-cloudinit' is 'zfs-nvme-pool:vm-408-cloudinit' on the target
2022-01-31 09:18:05 starting VM 408 on remote node 'cloud12'
2022-01-31 09:23:06 [cloud12] timeout: no zvol device link for 'vm-408-cloudinit' found after 300 sec found.
2022-01-31 09:23:06 ERROR: online migrate failure - remote command failed with exit code 255
2022-01-31 09:23:06 aborting phase 2 - cleanup resources
2022-01-31 09:23:06 migrate_cancel
2022-01-31 09:23:09 ERROR: migration finished with problems (duration 00:05:08)
TASK ERROR: migration problems
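The timeout message in the log above refers to a zvol device link. A small check like the following could be run on the target node while the migration is waiting, to see whether the link is there at that moment. It is a sketch under assumptions: it assumes the link is expected under /dev/zvol/<pool>/<volume> (which matches the `ls` output earlier in this post), and `zvol_link_status` and the `ZVOL_BASE` override are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a zvol device link exists.
# ZVOL_BASE defaults to /dev/zvol and can be overridden for testing.
zvol_link_status() {
    pool="$1"
    volume="$2"
    link="${ZVOL_BASE:-/dev/zvol}/$pool/$volume"
    # -e follows symlinks, so a dangling link is reported as missing.
    if [ -e "$link" ]; then
        echo "present: $link"
    else
        echo "missing: $link"
        return 1
    fi
}

# Pool and volume names taken from the failing migration in this post.
zvol_link_status zfs-nvme-pool vm-408-cloudinit || true
```

If the link shows as present during the wait but the task still times out, that would suggest the check is looking under a different pool name than the one the disk was actually imported into.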

And finally, if I add a zfs-nvme-pool-1 to cloud12 so that both cloud10 and cloud12 have identically named pools, I get the error above about the VM never starting.
 
