Problems with replication jobs

AdrianM2

New Member
May 15, 2018
Hi.

I'm having problems with replication jobs on our Proxmox server.
I'm getting:

Code:
2018-05-15 11:54:02 100-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-100-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_100-0_1526378041__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=h2' root@IP -- pvesm import local-zfs:vm-100-disk-1 zfs - -with-snapshots 1' failed: exit code 255

When pasting that command directly into a terminal, I get:

Code:
volume 'rpool/data/vm-100-disk-1' already exists
cannot send rpool/data/vm-100-disk-1@__replicate_100-0_1526377082__ recursively: snapshot rpool/data/vm-100-disk-1@__replicate_100-0_1526377082__ does not exist
command 'zfs send -Rpv -- rpool/data/vm-100-disk-1@__replicate_100-0_1526377082__' failed: exit code 1

Any ideas what could be the problem?
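For reference, this is roughly how I compared which replication snapshots exist on each node (the dataset path is taken from the error above; IP stands for the masked address of the target node):

Code:
# on the source node: list the replication snapshots of the affected disk
zfs list -t snapshot -r rpool/data/vm-100-disk-1
# on the target node: check whether the volume already exists there and which snapshots it has
ssh root@IP "zfs list -t snapshot -r rpool/data/vm-100-disk-1"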
 
Hi,

this looks like an already fixed bug.
Please send the output of

Code:
pveversion -v
 
Here is the output:

Code:
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
 
You have to update to at least pve-guest-common 2.0-15.
Once this is done, you have to destroy your replication job and create a new one.
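If you prefer the command line, something along these lines should work with pvesr (the job ID 100-0 and target node h2 are taken from your log; adjust the schedule to your needs):

Code:
# show all replication jobs and their current state
pvesr list
pvesr status
# remove the broken job, then recreate it against the same target node
pvesr delete 100-0
pvesr create-local-job 100-0 h2 --schedule '*/15'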
 
Updated, but the problem still exists.

Code:
proxmox-ve: 5.1-43 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.15: 5.1-4
pve-kernel-4.15.17-1-pve: 4.15.17-8
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-21
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-17
pve-cluster: 5.0-27
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9

Some replication jobs are working and some are not. The errors are still the same.

(screenshot attached)

I tried removing the failing replication jobs and creating them again.
 

Have you destroyed the replication job and created a new one?
 
Your jobs never synced even once, so there must be a problem with the ZFS pool on the target node.
 

Any ideas on how I can check what is causing this on the target node? zpool status is not showing anything.
Code:
# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 4h25m with 0 errors on Sun May 13 04:49:44 2018
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sda2    ONLINE       0     0     0
        sdb2    ONLINE       0     0     0

errors: No known data errors
 
Check if all images from the failed jobs are removed.
Code:
zfs list -t all
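
If a leftover rpool/data/vm-100-disk-1 (or its __replicate_* snapshots) is still present on the target even though the job never synced, removing it there lets the next run start with a full sync. Only do this if you are sure the dataset on the target is just a stale replica and the VM does not run there:

Code:
# on the TARGET node only: look for leftovers of the failed job
zfs list -t all -r rpool/data | grep vm-100-disk-1
# remove the stale replica volume together with its snapshots
zfs destroy -r rpool/data/vm-100-disk-1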
 
I changed the target node for these few machines and it is now working. Thanks for your help.