migrating non-shared storage ZFS-backed VM

vkhera

If I make a VM and put its virtual drive on storage "local" as a qcow2 file, Proxmox VE 4 beta will permit me to migrate the VM from node 1 to node 2. I only need to do this with the machines off, but if it works while running, that's a bonus.
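For concreteness, this is the kind of offline migration I mean; from the command line it would be roughly this (the VM ID and node name are just examples from my setup):

Code:
# offline migration of VM 100 to node pve2
qm migrate 100 pve2
# (adding --online would attempt a live migration, which I don't strictly need)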

If I make the VM with its virtual drive on non-shared ZFS storage "pve1-zfs" on node 1, it will not let me migrate to node 2, because storage "pve1-zfs" does not exist on node 2. On node 2 I named the ZFS storage "pve2-zfs".

My question is: does the migration fail because the disk is on ZFS, or because I need to name my ZFS storage identically on both nodes? Also, if it is supposed to work, will it use zfs send/receive to copy the virtual drives efficiently, or will it still use scp?
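For reference, the relevant entries in /etc/pve/storage.cfg currently look roughly like this (reconstructed from memory, so the exact options may differ slightly):

Code:
# one ZFS storage definition per node, each restricted to its own node
zfspool: pve1-zfs
        pool rpool
        content images
        nodes pve1

zfspool: pve2-zfs
        pool rpool
        content images
        nodes pve2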
 
I did a test making a new ZFS storage with the same name on both nodes, and that lets me start the migration; however, it fails with this error:

Code:
Aug 25 13:18:38 starting migration of VM 100 to node 'pve2' (192.168.7.17)
Aug 25 13:18:38 copying disk images
send from @ to rpool/vm-100-disk-1@__migration__ estimated size is 30K
total estimated size is 30K
TIME        SENT   SNAPSHOT
Aug 25 13:18:38 ERROR: Failed to sync data - unable to parse zfs volume name 'rpool/vm-100-disk-1'
Aug 25 13:18:38 aborting phase 1 - cleanup resources
Aug 25 13:18:38 ERROR: found stale volume copy 'zfstest:vm-100-disk-1' on node 'pve2'
Aug 25 13:18:38 ERROR: found stale volume copy 'zfstest:rpool/vm-100-disk-1' on node 'pve2'
Aug 25 13:18:38 ERROR: migration aborted (duration 00:00:01): Failed to sync data - unable to parse zfs volume name 'rpool/vm-100-disk-1'
TASK ERROR: migration aborted

It also leaves a stale vm-100-disk-1 zvol behind on node pve2.
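To get back to a clean state, the leftover volume can be removed by hand on pve2, roughly like this (assuming the pool is rpool, as in the log above):

Code:
# on pve2: confirm the stale zvol is there, then destroy it
zfs list -t volume
zfs destroy rpool/vm-100-disk-1        # add -r if a snapshot was left attached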

I will name all my ZFS storage the same on all nodes to permit migrations to succeed, but there still seems to be a bug here.
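If I understand the config format correctly, a single cluster-wide entry in /etc/pve/storage.cfg should do it (the storage name here is just an example):

Code:
# one entry with no "nodes" restriction, visible under the same name on every node
zfspool: vm-zfs
        pool rpool
        content images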
 
If I do it with a qcow2 disk image stored on "local" storage, it does what is necessary and copies the disk image.

When I try the same with a ZFS-backed zvol image, it fails as described above: it takes a __migration__ ZFS snapshot of the zvol, then fails to do the send/recv to the other node. The zvol is created on the other node but appears to be empty.
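As far as I can tell, the migration is attempting something roughly equivalent to this (my reconstruction of the steps, not the actual code Proxmox runs):

Code:
# take a temporary snapshot of the source zvol and stream it to the target node
zfs snapshot rpool/vm-100-disk-1@__migration__
zfs send rpool/vm-100-disk-1@__migration__ | \
    ssh root@192.168.7.17 zfs receive rpool/vm-100-disk-1
# clean up the temporary snapshot afterwards
zfs destroy rpool/vm-100-disk-1@__migration__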

I consider this a bug with the ZFS storage.

The workaround is to do a backup/restore, but then the UUID changes, and I have to do two copies over the network instead of just one.
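For reference, that workaround looks roughly like this (the archive filename timestamp is a placeholder, and the storage names are from my setup):

Code:
# on pve1: back up the stopped VM to a vma archive on local storage
vzdump 100 --mode stop --storage local

# copy the archive to pve2, then restore it into the ZFS storage there
scp /var/lib/vz/dump/vzdump-qemu-100-<timestamp>.vma root@192.168.7.17:/var/lib/vz/dump/
qmrestore /var/lib/vz/dump/vzdump-qemu-100-<timestamp>.vma 100 --storage pve2-zfs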
 
Whether the VM is stopped or not does not matter. You cannot migrate a VM without shared storage.

This is the second time you've said this, and the second time I'm telling you that it does work for file-based VMs as long as the storage on both nodes has the same name (and, I presume, type). I don't know why you don't believe me, so I made a video to show you: https://youtu.be/hM00qBIOQc4
It only fails on ZFS-backed VM disk images, and when you inspect the system you can see evidence that it tried to make a copy using a ZFS snapshot plus send/receive, which is why I say there is a bug in that feature.
 
Hi,
this is fixed, but it will take a few days until it is available in the testing repo.
 

What about if your disk image isn't a zvol, but a raw or qcow2 image stored in a ZFS filesystem directory: does that work just like non-ZFS local storage?
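To make that concrete, I mean a setup roughly like this (the dataset and storage names are made up for the example):

Code:
# create a plain ZFS filesystem and point an ordinary "directory" storage at it
zfs create rpool/pve-images
pvesm add dir zfs-dir --path /rpool/pve-images --content images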