Hi,
I am trying to configure Replication of multiple Proxmox nodes (v5.1) to a single storage node (v5.1) in a cluster as per below.
Code:
Node A <> Replicate <> Node Z
Node B <> Replicate <> Node Z
Node C <> Replicate <> Node Z
However, when multiple Replications from multiple nodes run at the same time, the chance of failure is very high, with the following errors in the Replication log.
The only way I have found to recover is to 1) remove the Replication job and 2) delete the VM's replicated disk image from Node Z.
Code:
2017-10-29 00:32:01 103-0: start replication job
2017-10-29 00:32:01 103-0: guest => VM 103, running => 2616
2017-10-29 00:32:01 103-0: volumes => local-zfs:vm-103-disk-1
2017-10-29 00:32:05 103-0: create snapshot '__replicate_103-0_1509208321__' on local-zfs:vm-103-disk-1
2017-10-29 00:32:05 103-0: full sync 'local-zfs:vm-103-disk-1' (__replicate_103-0_1509208321__)
2017-10-29 00:32:05 103-0: full send of rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__ estimated size is 2.70G
2017-10-29 00:32:05 103-0: total estimated size is 2.70G
2017-10-29 00:32:05 103-0: TIME SENT SNAPSHOT
2017-10-29 00:32:06 103-0: 00:32:06 2.11M rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__
2017-10-29 00:32:07 103-0: 00:32:07 2.11M rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__
2017-10-29 00:32:08 103-0: rpool/data/vm-103-disk-1 name rpool/data/vm-103-disk-1 -
2017-10-29 00:32:08 103-0: volume 'rpool/data/vm-103-disk-1' already exists
2017-10-29 00:32:08 103-0: warning: cannot send 'rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__': signal received
2017-10-29 00:32:08 103-0: cannot send 'rpool/data/vm-103-disk-1': I/O error
2017-10-29 00:32:08 103-0: command 'zfs send -Rpv -- rpool/data/vm-103-disk-1@__replicate_103-0_1509208321__' failed: exit code 1
2017-10-29 00:32:08 103-0: delete previous replication snapshot '__replicate_103-0_1509208321__' on local-zfs:vm-103-disk-1
2017-10-29 00:32:08 103-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-103-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_103-0_1509208321__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve-repl-1' root@10.0.0.51 -- pvesm import local-zfs:vm-103-disk-1 zfs - -with-snapshots 1' failed: exit code 255
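For reference, the manual recovery that clears the stuck state looks roughly like this. This is only a sketch: the job ID `103-0` and the dataset path come from my log above, `<nodeZ>` is a placeholder for the target node name, and the commands must be run as root on the respective nodes.

```shell
# On the source node: remove the stuck replication job
# (103-0 is the job ID from the log above)
pvesr delete 103-0

# On Node Z: destroy the half-replicated disk image so the next
# full sync can succeed (destructive - double-check the dataset name!)
zfs destroy -r rpool/data/vm-103-disk-1

# Then recreate the job, e.g. replicate VM 103 to Node Z every 15 minutes
pvesr create-local-job 103-0 <nodeZ> --schedule '*/15'
```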
Any clue? I have been facing this issue since v5.0. I could probably work around it with staggered, time-specific Replication schedules, but I would prefer each Replication task to run every 15-30 minutes.
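In case it helps others, the staggered-schedule workaround I have in mind would look something like this. A sketch only: the job IDs `104-0` and `105-0` are hypothetical examples for Node B and Node C, and I am assuming PVE's calendar-event syntax where `M/15` means "every 15 minutes starting at minute M".

```shell
# Offset each node's jobs so the syncs to Node Z never overlap
pvesr update 103-0 --schedule '0/15'   # Node A: :00, :15, :30, :45
pvesr update 104-0 --schedule '5/15'   # Node B: :05, :20, :35, :50 (hypothetical job ID)
pvesr update 105-0 --schedule '10/15'  # Node C: :10, :25, :40, :55 (hypothetical job ID)
```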
Thank you.