pve-zsync issues

Miguel
Hi,

I have two servers running Proxmox 5.3-6. Both run several VMs, and I
am using pve-zsync to sync two machines from server1 to server2 for
disaster recovery and offline backups.

This was working without issue when both Proxmox servers were running
5.1-46. I have just replaced them with two new servers.

I have two jobs: one reports that it has to do a full send, and the
other reports a failure. The snapshots on the backup server show 0B.

root@server1:~# pve-zsync status
SOURCE NAME STATUS
100 plesk1 error
102 cpanel1 ok

root@server2:~# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/vm-100-disk-0@rep_plesk1_2019-01-21_22:30:03 0B - 20.4G -
rpool/data/vm-100-disk-1@rep_plesk1_2019-01-21_22:30:03 0B - 67.3G -
rpool/data/vm-100-disk-2@rep_plesk1_2019-01-21_22:30:03 0B - 92.9G -
rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 0B - 20.0G -
rpool/data/vm-102-disk-1@rep_cpanel1_2019-01-22_01:00:01 0B - 60.4G -

root@server1:~# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
rpool/vm-100-disk-0@rep_plesk1_2019-01-19_22:47:37 597M - 20.0G -
rpool/vm-100-disk-0@rep_plesk1_2019-01-20_11:22:21 482M - 20.1G -
rpool/vm-100-disk-0@rep_plesk1_2019-01-21_22:05:08 121M - 20.4G -
rpool/vm-100-disk-0@rep_plesk1_2019-01-21_22:30:03 117M - 20.4G -
rpool/vm-100-disk-1@rep_plesk1_2019-01-19_22:47:37 9.68G - 67.1G -
rpool/vm-100-disk-1@rep_plesk1_2019-01-20_11:22:21 9.49G - 67.2G -
rpool/vm-100-disk-1@rep_plesk1_2019-01-21_22:30:03 4.84G - 67.3G -
rpool/vm-100-disk-2@rep_plesk1_2019-01-19_22:47:37 519M - 92.9G -
rpool/vm-100-disk-2@rep_plesk1_2019-01-20_11:22:21 335M - 92.9G -
rpool/vm-100-disk-2@rep_plesk1_2019-01-21_22:30:03 517M - 92.9G -
rpool/vm-102-disk-0@rep_cpanel1_2019-01-20_01:00:01 1.87G - 20.1G -
rpool/vm-102-disk-0@rep_cpanel1_2019-01-21_01:00:04 1.21G - 20.1G -
rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 1.25G - 20.0G -
rpool/vm-102-disk-1@rep_cpanel1_2019-01-20_01:00:01 4.94G - 60.5G -
rpool/vm-102-disk-1@rep_cpanel1_2019-01-21_01:00:04 3.97G - 60.5G -
rpool/vm-102-disk-1@rep_cpanel1_2019-01-22_01:00:01 3.31G - 60.4G -

The nightly jobs report different things:

cpanel1 VM:

WARN: COMMAND:
ssh root@server2 -- zfs list -rt snapshot -Ho name rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-20_01:00:01
GET ERROR:
cannot open 'rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-20_01:00:01': dataset does not exist
full send of rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 estimated size is 29.7G
total estimated size is 29.7G
TIME SENT SNAPSHOT
01:00:03 23.8M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01
01:00:04 54.3M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01
01:00:05 84.7M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01
01:00:06 115M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01

and it has to send both disks in full, which I don't understand.

plesk1 VM:

WARN: COMMAND:
ssh root@server2 -- zfs list -rt snapshot -Ho name rpool/data/vm-100-disk-0@rep_plesk1_2019-01-19_22:47:37
GET ERROR:
cannot open 'rpool/data/vm-100-disk-0@rep_plesk1_2019-01-19_22:47:37': dataset does not exist
full send of rpool/vm-100-disk-0@rep_plesk1_2019-01-22_01:58:55 estimated size is 28.4G
total estimated size is 28.4G
TIME SENT SNAPSHOT
COMMAND:
zfs send -v -- rpool/vm-100-disk-0@rep_plesk1_2019-01-22_01:58:55 | ssh -o 'BatchMode=yes' root@37.187.154.74 -- zfs recv -F -- rpool/data/vm-100-disk-0
GET ERROR:
cannot receive new filesystem stream: destination has snapshots (eg. rpool/data/vm-100-disk-0)
must destroy them to overwrite it

Job --source 100 --name plesk1 got an ERROR!!!
ERROR Message:
 
Hi,

can you please send the following files:

/var/lib/pve-zsync/sync_state
/etc/cron.d/pve-zsync
 
root@server1:~# cat /var/lib/pve-zsync/sync_state
{"100":{"plesk1":{"state":"error","lsync":"2019-01-21_22:30:03","vm_type":"qemu"}},"102":{"cpanel1":{"vm_type":"qemu","lsync":"2019-01-22_01:00:01","state":"error"}}}

root@ibertrix-node1:~# cat /etc/cron.d/pve-zsync
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

0 1 * * * root /usr/sbin/pve-zsync sync --source 100 --dest server2:rpool/data --verbose --maxsnap 7 --name plesk1 --method ssh
0 1 * * * root /usr/sbin/pve-zsync sync --source 102 --dest server2:rpool/data --verbose --maxsnap 7 --name cpanel1 --method ssh
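
(For context: cron entries like these are normally written by pve-zsync itself rather than by hand. A sketch of the equivalent create call, using the job parameters above; exact options may differ between pve-zsync versions:

pve-zsync create --source 100 --dest server2:rpool/data --name plesk1 --maxsnap 7

pve-zsync then adds the matching sync line to /etc/cron.d/pve-zsync.)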
 
It looks like the problem is on the server2 side.
The problem is that you have multiple snapshots with the same name.
This normally cannot happen.

You can erase all datasets on server2 and start over again,
or you have to find out why this happened.
As I said, this is normally not possible and must be a bug, but without more information I can't help.
Here on my nodes, I'm not able to reproduce this.
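
For reference, a minimal cleanup-and-resync sketch, assuming the dataset and job names shown earlier in this thread (treat them as examples and double-check the paths before running anything destructive):

# on server2: remove the out-of-sync replica datasets for VM 100 (-r also removes their snapshots)
zfs destroy -r rpool/data/vm-100-disk-0
zfs destroy -r rpool/data/vm-100-disk-1
zfs destroy -r rpool/data/vm-100-disk-2

# on server1: run a fresh full sync; pve-zsync will recreate the replicas
pve-zsync sync --source 100 --dest server2:rpool/data --verbose --maxsnap 7 --name plesk1 --method ssh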
 
I have removed the snapshots on server2 several times with:

zfs destroy -rv rpool/data/vm-100-disk-0@%

for each of the affected datasets on server2.
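
To double-check that the destroy actually removed everything on server2, a quick listing (a sketch, using the dataset names from this thread) should show no remaining snapshots:

zfs list -rt snapshot rpool/data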

On server1 I run manually:

pve-zsync sync --source 100 --dest server2:rpool/data --verbose --maxsnap 7 --name plesk1 --method ssh

or:

pve-zsync sync --source 102 --dest server2:rpool/data --verbose --maxsnap 7 --name cpanel1 --method ssh

and it sends the full data (around 70 and 100 GB). The subsequent incremental syncs are the ones that fail.
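
One check that might help narrow this down (a sketch, assuming the dataset names above): whether source and destination still share a common base snapshot, since an incremental send needs a matching snapshot on both sides.

# newest snapshot on the source (server1)
zfs list -rt snapshot -o name -s creation rpool/vm-100-disk-0 | tail -n 1
# newest snapshot on the destination (server2)
ssh root@server2 -- zfs list -rt snapshot -o name -s creation rpool/data/vm-100-disk-0 | tail -n 1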

What else do you suggest?
 
Also, all snapshots on server2 show 0B used:

root@server2:~# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/vm-100-disk-0@rep_plesk1_2019-01-21_22:30:03 0B - 20.4G -
rpool/data/vm-100-disk-1@rep_plesk1_2019-01-21_22:30:03 0B - 67.3G -
rpool/data/vm-100-disk-2@rep_plesk1_2019-01-21_22:30:03 0B - 92.9G -
rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 0B - 20.0G -
rpool/data/vm-102-disk-1@rep_cpanel1_2019-01-22_01:00:01 0B - 60.4G -
 
Please check with zdb whether the ZFS pool is consistent:
zdb -c rpool
 
After several tries I removed all snapshots on server1 and server2, launched the pve-zsync jobs, and let the sessions finish (previously I had run them in the background, since each VM took more than an hour). Now it seems to work fine.

I still see the latest snapshots on server2 showing 0B. Is this normal?
 
I still see the latest snapshots on server2 showing 0B. Is this normal?
Yes, because no data has been written to them since the sync, so the diff is 0B.
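
As a general ZFS note (not specific to pve-zsync): the USED column for a snapshot only counts blocks unique to that snapshot, so the newest replica snapshot showing 0B right after a sync is expected. How much has changed since a snapshot can be inspected with the written property, for example:

zfs get written rpool/data/vm-100-disk-0
zfs get written@rep_plesk1_2019-01-21_22:30:03 rpool/data/vm-100-disk-0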
 
