Problems with replication jobs

AdrianM2

New Member
May 15, 2018
Hi.

I'm having problems with replication jobs on our Proxmox server.
I'm getting:

Code:
2018-05-15 11:54:02 100-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-100-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1526378041__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=h2' root@IP -- pvesm import local-zfs:vm-100-disk-1 zfs - -with-snapshots 1' failed: exit code 255

When pasting that command directly into a terminal, I get:

Code:
volume 'rpool/data/vm-100-disk-1' already exists
cannot send rpool/data/vm-100-disk-1@__replicate_100-0_1526377082__ recursively: snapshot rpool/data/vm-100-disk-1@__replicate_100-0_1526377082__ does not exist
command 'zfs send -Rpv -- rpool/data/vm-100-disk-1@__replicate_100-0_1526377082__' failed: exit code 1

Any ideas what could be the problem?
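For reference, this is roughly how I compared which replication snapshots exist on each node (the dataset path is taken from the error above; IP stands for the masked address of the target node):

Code:
# on the source node: list the replication snapshots of the affected disk
zfs list -t snapshot -r rpool/data/vm-100-disk-1
# on the target node: check whether the volume already exists there and which snapshots it has
ssh root@IP "zfs list -t snapshot -r rpool/data/vm-100-disk-1"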
 
Hi,

this looks like an already fixed bug.
Please send the output of

Code:
pveversion -v
 
Here is the output:

Code:
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
 
You have to update to at least pve-guest-common 2.0-15.
Once this is done, you have to destroy your replication job and create a new one.
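If you prefer the command line, something along these lines should work with pvesr (the job ID 100-0 and target node h2 are taken from your log; adjust the schedule to your needs):

Code:
# show all replication jobs and their current state
pvesr list
pvesr status
# remove the broken job, then recreate it against the same target node
pvesr delete 100-0
pvesr create-local-job 100-0 h2 --schedule '*/15'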
 
Updated, but the problem still exists.

Code:
proxmox-ve: 5.1-43 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.15: 5.1-4
pve-kernel-4.15.17-1-pve: 4.15.17-8
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-21
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-17
pve-cluster: 5.0-27
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9

Some replication jobs are working and some are not. The errors are still the same.

(screenshot attached)

I tried removing the failing replication jobs and creating them again.
 

Have you destroyed the replication job and created a new one?
 
Your jobs never synced even once, so there must be a problem with the ZFS pool on the target node.
 

Any ideas on how I can check what is causing this on the target node? zpool status is not showing anything.
Code:
# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 4h25m with 0 errors on Sun May 13 04:49:44 2018
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sda2    ONLINE       0     0     0
        sdb2    ONLINE       0     0     0

errors: No known data errors
 
Check if all images from the failed jobs are removed.
Code:
zfs list -t all
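
If a leftover rpool/data/vm-100-disk-1 (or its __replicate_* snapshots) is still present on the target even though the job never synced, removing it there lets the next run start with a full sync. Only do this if you are sure the dataset on the target is just a stale replica and the VM does not run there:

Code:
# on the TARGET node only: look for leftovers of the failed job
zfs list -t all -r rpool/data | grep vm-100-disk-1
# remove the stale replica volume together with its snapshots
zfs destroy -r rpool/data/vm-100-disk-1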
 
I changed the target node for these few machines and it is now working. Thanks for your help.