pve-zsync issues

Miguel
Hi,

I have two servers running Proxmox 5.3-6. Both run several VMs, and I
am using pve-zsync to sync two machines from server1 to server2 for
disaster recovery and offline backups.

This was working without issue when both Proxmox servers were running
5.1-46. I have just replaced them with two new servers.

I have two jobs: one reports that it has to do a full send, and the
other reports a failure. The snapshots on the backup server show 0B.

root@server1:~# pve-zsync status
SOURCE NAME STATUS
100 plesk1 error
102 cpanel1 ok

root@server2:~# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/vm-100-disk-0@rep_plesk1_2019-01-21_22:30:03 0B - 20.4G -
rpool/data/vm-100-disk-1@rep_plesk1_2019-01-21_22:30:03 0B - 67.3G -
rpool/data/vm-100-disk-2@rep_plesk1_2019-01-21_22:30:03 0B - 92.9G -
rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 0B - 20.0G -
rpool/data/vm-102-disk-1@rep_cpanel1_2019-01-22_01:00:01 0B - 60.4G -

root@server1:~# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
rpool/vm-100-disk-0@rep_plesk1_2019-01-19_22:47:37 597M - 20.0G -
rpool/vm-100-disk-0@rep_plesk1_2019-01-20_11:22:21 482M - 20.1G -
rpool/vm-100-disk-0@rep_plesk1_2019-01-21_22:05:08 121M - 20.4G -
rpool/vm-100-disk-0@rep_plesk1_2019-01-21_22:30:03 117M - 20.4G -
rpool/vm-100-disk-1@rep_plesk1_2019-01-19_22:47:37 9.68G - 67.1G -
rpool/vm-100-disk-1@rep_plesk1_2019-01-20_11:22:21 9.49G - 67.2G -
rpool/vm-100-disk-1@rep_plesk1_2019-01-21_22:30:03 4.84G - 67.3G -
rpool/vm-100-disk-2@rep_plesk1_2019-01-19_22:47:37 519M - 92.9G -
rpool/vm-100-disk-2@rep_plesk1_2019-01-20_11:22:21 335M - 92.9G -
rpool/vm-100-disk-2@rep_plesk1_2019-01-21_22:30:03 517M - 92.9G -
rpool/vm-102-disk-0@rep_cpanel1_2019-01-20_01:00:01 1.87G - 20.1G -
rpool/vm-102-disk-0@rep_cpanel1_2019-01-21_01:00:04 1.21G - 20.1G -
rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 1.25G - 20.0G -
rpool/vm-102-disk-1@rep_cpanel1_2019-01-20_01:00:01 4.94G - 60.5G -
rpool/vm-102-disk-1@rep_cpanel1_2019-01-21_01:00:04 3.97G - 60.5G -
rpool/vm-102-disk-1@rep_cpanel1_2019-01-22_01:00:01 3.31G - 60.4G -

The nightly jobs report different things:

cpanel1 VM:

WARN: COMMAND:
ssh root@server2 -- zfs list -rt snapshot -Ho name rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-20_01:00:01
GET ERROR:
cannot open 'rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-20_01:00:01': dataset does not exist
full send of rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 estimated size is 29.7G
total estimated size is 29.7G
TIME SENT SNAPSHOT
01:00:03 23.8M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01
01:00:04 54.3M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01
01:00:05 84.7M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01
01:00:06 115M rpool/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01

and it has to send both disks in full, which I don't understand.

plesk1 VM:

WARN: COMMAND:
ssh root@server2 -- zfs list -rt snapshot -Ho name rpool/data/vm-100-disk-0@rep_plesk1_2019-01-19_22:47:37
GET ERROR:
cannot open 'rpool/data/vm-100-disk-0@rep_plesk1_2019-01-19_22:47:37': dataset does not exist
full send of rpool/vm-100-disk-0@rep_plesk1_2019-01-22_01:58:55 estimated size is 28.4G
total estimated size is 28.4G
TIME SENT SNAPSHOT
COMMAND:
zfs send -v -- rpool/vm-100-disk-0@rep_plesk1_2019-01-22_01:58:55 | ssh -o 'BatchMode=yes' root@37.187.154.74 -- zfs recv -F -- rpool/data/vm-100-disk-0
GET ERROR:
cannot receive new filesystem stream: destination has snapshots (eg. rpool/data/vm-100-disk-0)
must destroy them to overwrite it

Job --source 100 --name plesk1 got an ERROR!!!
ERROR Message:
 
Hi,

can you please send the following files:

/var/lib/pve-zsync/sync_state
/etc/cron.d/pve-zsync
 
root@server1:~# cat /var/lib/pve-zsync/sync_state
{"100":{"plesk1":{"state":"error","lsync":"2019-01-21_22:30:03","vm_type":"qemu"}},"102":{"cpanel1":{"vm_type":"qemu","lsync":"2019-01-22_01:00:01","state":"error"}}}

root@ibertrix-node1:~# cat /etc/cron.d/pve-zsync
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

0 1 * * * root /usr/sbin/pve-zsync sync --source 100 --dest server2:rpool/data --verbose --maxsnap 7 --name plesk1 --method ssh
0 1 * * * root /usr/sbin/pve-zsync sync --source 102 --dest server2:rpool/data --verbose --maxsnap 7 --name cpanel1 --method ssh
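
(For context: cron entries like these are normally written by pve-zsync itself rather than by hand. A sketch of the equivalent create call, using the job parameters above; exact options may differ between pve-zsync versions:

pve-zsync create --source 100 --dest server2:rpool/data --name plesk1 --maxsnap 7

pve-zsync then adds the matching sync line to /etc/cron.d/pve-zsync.)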
 
It looks like the problem is on the server2 side.
The problem is that you have multiple snapshots with the same name.
This normally cannot happen.

You can erase all datasets on server2 and start over again,
or you have to find out why this happened.
As I said, this is normally not possible and must be a bug, but without more information I can't help.
Here on my nodes, I'm not able to reproduce this.
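
For reference, a minimal cleanup-and-resync sketch, assuming the dataset and job names shown earlier in this thread (treat them as examples and double-check the paths before running anything destructive):

# on server2: remove the out-of-sync replica datasets for VM 100 (-r also removes their snapshots)
zfs destroy -r rpool/data/vm-100-disk-0
zfs destroy -r rpool/data/vm-100-disk-1
zfs destroy -r rpool/data/vm-100-disk-2

# on server1: run a fresh full sync; pve-zsync will recreate the replicas
pve-zsync sync --source 100 --dest server2:rpool/data --verbose --maxsnap 7 --name plesk1 --method ssh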
 
I have removed the snapshots on server2 several times with:

zfs destroy -rv rpool/data/vm-100-disk-0@%

for each of the affected datasets on server2.
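
To double-check that the destroy actually removed everything on server2, a quick listing (a sketch, using the dataset names from this thread) should show no remaining snapshots:

zfs list -rt snapshot rpool/data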

On server1 I run manually:

pve-zsync sync --source 100 --dest server2:rpool/data --verbose --maxsnap 7 --name plesk1 --method ssh

or:

pve-zsync sync --source 102 --dest server2:rpool/data --verbose --maxsnap 7 --name cpanel1 --method ssh

and it sends the full data (around 70 and 100 GB). The subsequent incremental syncs are the ones that fail.
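
One check that might help narrow this down (a sketch, assuming the dataset names above): whether source and destination still share a common base snapshot, since an incremental send needs a matching snapshot on both sides.

# newest snapshot on the source (server1)
zfs list -rt snapshot -o name -s creation rpool/vm-100-disk-0 | tail -n 1
# newest snapshot on the destination (server2)
ssh root@server2 -- zfs list -rt snapshot -o name -s creation rpool/data/vm-100-disk-0 | tail -n 1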

What else do you suggest?
 
Also, all snapshots on server2 show 0B used:

root@server2:~# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/vm-100-disk-0@rep_plesk1_2019-01-21_22:30:03 0B - 20.4G -
rpool/data/vm-100-disk-1@rep_plesk1_2019-01-21_22:30:03 0B - 67.3G -
rpool/data/vm-100-disk-2@rep_plesk1_2019-01-21_22:30:03 0B - 92.9G -
rpool/data/vm-102-disk-0@rep_cpanel1_2019-01-22_01:00:01 0B - 20.0G -
rpool/data/vm-102-disk-1@rep_cpanel1_2019-01-22_01:00:01 0B - 60.4G -
 
Please check with zdb whether the ZFS pool is consistent:
zdb -c rpool
 
After several tries I removed all snapshots on server1 and server2, launched the pve-zsync jobs, and let the sessions finish (previously I had run them in the background, since each VM took more than an hour). Now it seems to work fine.

I still see the latest snapshots on server2 showing 0B. Is this normal?
 
I still see the latest snapshots on server2 showing 0B. Is this normal?
Yes, because no data has been written to them since the sync, so the diff is 0B.
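
As a general ZFS note (not specific to pve-zsync): the USED column for a snapshot only counts blocks unique to that snapshot, so the newest replica snapshot showing 0B right after a sync is expected. How much has changed since a snapshot can be inspected with the written property, for example:

zfs get written rpool/data/vm-100-disk-0
zfs get written@rep_plesk1_2019-01-21_22:30:03 rpool/data/vm-100-disk-0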
 
