[SOLVED] pve-zsync buggy with more than one vm-hdd?

udo

Distinguished Member
Apr 22, 2009
5,981
204
163
Ahrensburg; Germany
Hi,
I did some tests with pve-zsync, and the first VM (with one vm-disk) worked well.

The second test, with a VM that has two vm-disks, failed: the sync of the first disk never finished and the zfs process on the target ran at 100% CPU.

At first I thought it was related to the non-round volume size, but the same thing happened after resizing the vm-hdd to a round value (and removing the first snapshot).

The first sync never finishes:
Code:
time pve-zsync create -dest 10.1.1.93:rpool/data -source 210 -name vdb02
All the data of the first disk is transferred:
Code:
## on source
zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     81.3G   818G    96K  /rpool
rpool/ROOT                28.9G   818G    96K  /rpool/ROOT
rpool/ROOT/pve-1          28.9G   818G  28.9G  /
rpool/data                43.8G   818G    96K  /rpool/data
rpool/data/vm-210-disk-1  6.89G   818G  6.89G  -
rpool/data/vm-210-disk-2  33.2G   818G  33.2G  -
rpool/data/vm-213-disk-1  3.69G   818G  3.69G  -
rpool/swap                8.50G   819G  6.89G  -

zfs list -t snapshot -o name,creation
NAME                                                    CREATION
rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_11:12:47  Tue Nov 28 11:12 2017

## on target
zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     18.0G  1.74T    96K  /rpool
rpool/ROOT                1.30G  1.74T    96K  /rpool/ROOT
rpool/ROOT/pve-1          1.30G  1.74T  1.30G  /
rpool/data                8.24G  1.74T    96K  /rpool/data
rpool/data/vm-210-disk-1  6.94G  1.74T  6.94G  -
rpool/data/vm-334-disk-1  1.31G  1.74T  1.31G  -
rpool/swap                8.50G  1.75T    64K  -

zfs list -t snapshot
NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-334-disk-1@rep_default_2017-11-28_11:30:01      0      -  1.31G  -
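A quick cross-check is to ask pve-zsync itself what it thinks the job state is - in this stuck situation the job (name vdb02 from the create command above) would probably still show up as syncing:
Code:
# list the configured sync jobs and show their current state
pve-zsync list
pve-zsync status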
The VM is running. Config:
Code:
boot: cd
bootdisk: scsi0
cores: 2
hotplug: 1
memory: 2048
name: vdb02
net0: virtio=6A:C6:68:EB:F9:4F,bridge=vmbr0,tag=5
numa: 1
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
sockets: 1
scsi0: local-zfs:vm-210-disk-1,format=raw,size=25G
scsi1: local-zfs:vm-210-disk-2,backup=0,format=raw,size=41G
On the target, the zfs receive never ends and uses 100% CPU:
Code:
root      3530 94.0  0.0  35188  2640 ?        Rs   11:12  20:20 zfs recv -F -- rpool/data/vm-210-disk-1
I had one such process running the whole night - kill didn't work, I had to reboot the target server.
On the source, the zfs send process is still alive, but without any CPU usage:
Code:
root     23386  0.0  0.0   4292   760 pts/0    S+   11:12   0:00 sh -c zfs send -- rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_11:12:47 | ssh -o 'BatchMode=yes' root@10.1.1.93 -- zfs recv -F -- rpool/data/vm-210-disk-1 2>&1
root     23388  2.1  0.0  48652  6872 pts/0    S+   11:12   0:30 ssh -o BatchMode=yes root@10.1.1.93 -- zfs recv -F -- rpool/data/vm-210-disk-1
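To narrow down whether it is the send, the transport or the receive that hangs, two things can help (the target dataset name test-manual-recv below is just an example, and pv needs to be installed):
Code:
## on the target: what is the stuck zfs recv (PID 3530 from above) doing in the kernel?
cat /proc/3530/stack
grep State /proc/3530/status

## on the source: replay the same send by hand into a throwaway dataset, with pv showing whether data still flows
zfs send -- rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_11:12:47 | pv | ssh root@10.1.1.93 -- zfs recv -F rpool/data/test-manual-recv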
The source is a PVE 5.1 with pve-zsync 1.6-15 and the target a PVE 4 with pve-zsync 1.6-14.

Any hints?

Udo
 
Which kernel are you using on the sending side? Make sure it is >= -26.
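A quick way to compare the running kernel build with the installed package:
Code:
# build string of the running kernel ("... SMP PVE 4.13.4-NN ...")
uname -v
# version of the matching installed kernel package
pveversion -v | grep "pve-kernel-$(uname -r)"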
 
Hi Fabian,
ahh - because the new package has the same name (pve-kernel-4.13.4-1-pve), I didn't reboot the server after the last updates...
Code:
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.76-1-pve: 4.4.76-94
...
But uname shows the older running build:
Code:
uname -a
Linux pve02 4.13.4-1-pve #1 SMP PVE 4.13.4-25 (Fri, 13 Oct 2017 08:59:53 +0200) x86_64 GNU/Linux
After a reboot, pve-zsync works for the first disk.
I assume that backup=0 on the second disk prevents it from being synced?

Udo
 
Yes, disks/mountpoints with backup=0 are skipped.
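If the second disk should be replicated as well, drop the backup=0 flag from its disk line (backup defaults to 1). A sketch, keeping the other scsi1 options from the config above - best verified on a test VM first:
Code:
# re-set scsi1 without backup=0 so pve-zsync picks the disk up again
qm set 210 -scsi1 local-zfs:vm-210-disk-2,format=raw,size=41G
# alternatively, edit /etc/pve/qemu-server/210.conf and remove ",backup=0" from the scsi1 line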