offline migration failure with two ZFS sharing the same pool

Renaud

Member
Oct 11, 2018
Montpellier, France
www.sharops.eu
Hello,

I have 4 nodes running PVE 5.2.9 (b, be, n, h).

I created a VM template following the GIST (https://git.proxmox.com/?p=pve-docs...c011c659440e4b4b91d985dea79f98e5f083c;hb=HEAD) so that I can create preconfigured cloud-init VMs.

I followed the GIST on node n. Everything worked as expected, and I cloned the template as a "full clone" (roughly the steps sketched below).
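For context, the steps from the GIST boil down to roughly the following; the VM ID, image name and bridge are illustrative, not necessarily exactly what I typed:
Code:
# create the template VM and import a cloud image (ID 9000 and the image name are examples)
qm create 9000 --name cloudinit-template --memory 2048 --net0 virtio,bridge=vmbr1
qm importdisk 9000 debian-9-openstack-amd64.qcow2 rmain
qm set 9000 --scsihw virtio-scsi-pci --scsi0 rmain:vm-9000-disk-0
qm set 9000 --ide2 rmain:cloudinit
qm set 9000 --boot c --bootdisk scsi0
qm set 9000 --serial0 socket --vga serial0
qm template 9000

# full clone of the template (IDs and name are examples)
qm clone 9000 116 --name my-clone --full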

Now my goal is to migrate the cloned VM offline to another node. I used the normal way via the GUI.

On all my nodes I'm using ZFS to store the disk images of my VMs.

When I run the migration I get this error:
Code:
qm migrate 116 b
2018-10-11 16:43:22 starting migration of VM 116 to node 'b' (10.4.0.4)
2018-10-11 16:43:22 found local disk 'rmain:vm-116-disk-0' (in current VM config)
2018-10-11 16:43:22 found local disk 'rmain:vm-116-disk-1' (via storage)
2018-10-11 16:43:22 found local disk 'zmysql:vm-116-disk-0' (via storage)
2018-10-11 16:43:22 found local disk 'zmysql:vm-116-disk-1' (in current VM config)
2018-10-11 16:43:22 copying disk images
full send of rpool/vm-116-disk-1@__migration__ estimated size is 20.1M
total estimated size is 20.1M
TIME        SENT   SNAPSHOT
full send of rpool/vm-116-disk-0@__migration__ estimated size is 1.89G
total estimated size is 1.89G
TIME        SENT   SNAPSHOT
16:43:25   52.8M   rpool/vm-116-disk-0@__migration__
16:43:26    162M   rpool/vm-116-disk-0@__migration__
16:43:27    271M   rpool/vm-116-disk-0@__migration__
16:43:28    379M   rpool/vm-116-disk-0@__migration__
16:43:29    485M   rpool/vm-116-disk-0@__migration__
16:43:30    571M   rpool/vm-116-disk-0@__migration__
16:43:31    651M   rpool/vm-116-disk-0@__migration__
16:43:32    738M   rpool/vm-116-disk-0@__migration__
16:43:33    830M   rpool/vm-116-disk-0@__migration__
16:43:34    925M   rpool/vm-116-disk-0@__migration__
16:43:35   1005M   rpool/vm-116-disk-0@__migration__
16:43:36   1.07G   rpool/vm-116-disk-0@__migration__
16:43:37   1.14G   rpool/vm-116-disk-0@__migration__
16:43:38   1.21G   rpool/vm-116-disk-0@__migration__
16:43:39   1.28G   rpool/vm-116-disk-0@__migration__
16:43:40   1.37G   rpool/vm-116-disk-0@__migration__
16:43:41   1.46G   rpool/vm-116-disk-0@__migration__
16:43:42   1.56G   rpool/vm-116-disk-0@__migration__
16:43:43   1.66G   rpool/vm-116-disk-0@__migration__
16:43:44   1.76G   rpool/vm-116-disk-0@__migration__
16:43:45   1.85G   rpool/vm-116-disk-0@__migration__
full send of rpool/vm-116-disk-0@__migration__ estimated size is 1.89G
total estimated size is 1.89G
TIME        SENT   SNAPSHOT
rpool/vm-116-disk-0    name    rpool/vm-116-disk-0    -
volume 'rpool/vm-116-disk-0' already exists
command 'zfs send -Rpv -- rpool/vm-116-disk-0@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2018-10-11 16:43:48 ERROR: Failed to sync data - command 'set -o pipefail && pvesm export zmysql:vm-116-disk-0 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=b' root@10.4.0.4 -- pvesm import zmysql:vm-116-disk-0 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
2018-10-11 16:43:48 aborting phase 1 - cleanup resources
2018-10-11 16:43:48 ERROR: found stale volume copy 'zmysql:vm-116-disk-1' on node 'b'
2018-10-11 16:43:48 ERROR: found stale volume copy 'rmain:vm-116-disk-0' on node 'b'
2018-10-11 16:43:48 ERROR: found stale volume copy 'zmysql:vm-116-disk-0' on node 'b'
2018-10-11 16:43:48 ERROR: migration aborted (duration 00:00:26): Failed to sync data - command 'set -o pipefail && pvesm export zmysql:vm-116-disk-0 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=b' root@10.4.0.4 -- pvesm import zmysql:vm-116-disk-0 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
migration aborted
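Before retrying, the stale copies the cleanup phase complains about apparently have to be removed by hand on node b; presumably something like this (assuming the zvols on b are only the half-finished migration copies and nothing else uses them):
Code:
# on node b: check what landed there during the failed migration
zfs list -t all | grep vm-116

# remove the leftover copies (only safe if VM 116 does not otherwise exist on b)
zfs destroy rpool/vm-116-disk-0
zfs destroy rpool/vm-116-disk-1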


The VM config:
Code:
# cat /etc/pve/qemu-server/116.conf
agent: 1,fstrim_cloned_disks=1
boot: c
bootdisk: scsi0
cipassword: X
ciuser: X
cores: 2
hotplug: disk,network,usb,memory,cpu
ipconfig0: ip=10.4.4.20/16,gw=10.4.42.254
memory: 4608
name: b.db.2a.re
nameserver: 10.4.42.254
net0: virtio=C2:49:98:31:30:6D,bridge=vmbr1,queues=8,tag=4
numa: 1
scsi0: rmain:vm-116-disk-0,cache=writeback,size=51404M
scsi1: zmysql:vm-116-disk-1,cache=writeback,size=50G
scsihw: virtio-scsi-pci
searchdomain: db.2a.re
serial0: socket
smbios1: uuid=1f9bcd6a-c69a-439c-a169-d91215058878
sockets: 2
sshkeys: X
vga: serial0
vmgenid: 62038c4f-c0aa-4863-91c7-fc8efc3aa29a

The zfs list from node "n":
Code:
zfs list -t all
NAME                                                 USED  AVAIL  REFER  MOUNTPOINT
rpool                                               20.2G  3.49T   104K  /rpool
rpool/ROOT                                          2.16G  3.49T    96K  /rpool/ROOT
rpool/ROOT/pve-1                                    2.16G  3.49T  2.16G  /
rpool/base-111-disk-0                               1.41G  3.49T  1.41G  -
rpool/base-111-disk-0@__base__                         0B      -  1.41G  -
rpool/base-111-disk-1                               5.69M  3.49T  5.69M  -
rpool/base-111-disk-1@__base__                         0B      -  5.69M  -
rpool/base-9000-disk-0                               735M  3.49T   735M  -
rpool/base-9000-disk-0@__base__                        0B      -   735M  -
rpool/data                                            96K  3.49T    96K  /rpool/data
rpool/subvol-108-disk-0                              890M  49.1G   890M  /rpool/subvol-108-disk-0
rpool/subvol-108-disk-1                              176K  50.0G   176K  /rpool/subvol-108-disk-1
rpool/swap                                          4.25G  3.50T    64K  -
rpool/vm-107-disk-0                                 9.00G  3.50T   765M  -
rpool/vm-107-disk-0@__replicate_107-0_1539276300__  1.45M      -   765M  -
rpool/vm-111-cloudinit                                76K  3.49T    76K  -
rpool/vm-113-cloudinit                                76K  3.49T    76K  -
rpool/vm-113-disk-0                                  269M  3.49T   272M  -
rpool/vm-113-disk-1                                 78.6M  3.49T  1.29G  -
rpool/vm-116-disk-0                                 1.41G  3.49T  1.41G  -
rpool/vm-116-disk-1                                 5.69M  3.49T  5.69M  -
rpool/vm-9000-cloudinit                               56K  3.49T    56K  -

In fact I have two ZFS storage entries configured in PVE on the same ZFS pool (rpool): one with a blocksize of 8k (rmain) and another with a blocksize of 16k (zmysql). As the zfs list above shows, both create their zvols directly under rpool, so both storages end up seeing the same volumes.
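For reference, the two entries in my /etc/pve/storage.cfg look roughly like this (retyped from memory, so the exact option list may differ slightly):
Code:
zfspool: rmain
        pool rpool
        blocksize 8k
        content images,rootdir
        sparse 1

zfspool: zmysql
        pool rpool
        blocksize 16k
        content images,rootdir
        sparse 1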

I think my issue comes from this point in the migration process, where each disk is detected twice (once via each storage entry), so the second send of the same zvol fails with "volume already exists":
Code:
2018-10-11 16:43:22 found local disk 'rmain:vm-116-disk-0' (in current VM config)
2018-10-11 16:43:22 found local disk 'rmain:vm-116-disk-1' (via storage)
2018-10-11 16:43:22 found local disk 'zmysql:vm-116-disk-0' (via storage)
2018-10-11 16:43:22 found local disk 'zmysql:vm-116-disk-1' (in current VM config)
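An easy way to see the duplication, I think, is to list both storages; since they are backed by the same pool, each should report the very same zvols:
Code:
# both commands should list vm-116-disk-0 and vm-116-disk-1, just under different storage names
pvesm list rmain
pvesm list zmysql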

My first question: is it possible to ignore the duplicates during migration?

Thanks for your help
Cheers,
Renaud
 
In fact I have two ZFS storage entries configured in PVE on the same ZFS pool (rpool): one with a blocksize of 8k (rmain) and another with a blocksize of 16k (zmysql).
Why not use two datasets with different properties? That way you do not get the duplicates. See the sketch below.
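Something along these lines, I suppose (a sketch, not tested on your setup; existing disks would still have to be moved into the new datasets, e.g. with "Move disk" in the GUI):
Code:
# on every node: one child dataset per PVE storage
zfs create rpool/main
zfs create rpool/mysql

# /etc/pve/storage.cfg: point each storage at its own dataset
zfspool: rmain
        pool rpool/main
        blocksize 8k
        content images,rootdir
        sparse 1

zfspool: zmysql
        pool rpool/mysql
        blocksize 16k
        content images,rootdir
        sparse 1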