Hello,
in a 3-node Proxmox 5 cluster with ZFS replication,
if I migrate a container (LXC) from the master to a slave, replication stops working on the new node:
2017-11-15 11:47:00 100-2: start replication job
2017-11-15 11:47:00 100-2: guest => CT 100, running => 1
2017-11-15 11:47:00 100-2: volumes => SSDstorage:subvol-100-disk-1
2017-11-15 11:47:01 100-2: freeze guest filesystem
2017-11-15 11:47:01 100-2: create snapshot '__replicate_100-2_1510742820__' on SSDstorage:subvol-100-disk-1
2017-11-15 11:47:01 100-2: thaw guest filesystem
2017-11-15 11:47:01 100-2: full sync 'SSDstorage:subvol-100-disk-1' (__replicate_100-2_1510742820__)
2017-11-15 11:47:02 100-2: full send of SSDstorage/subvol-100-disk-1@__replicate_100-1_1510742732__ estimated size is 8.01G
2017-11-15 11:47:02 100-2: send from @__replicate_100-1_1510742732__ to SSDstorage/subvol-100-disk-1@__replicate_100-2_1510742820__ estimated size is 1.41M
2017-11-15 11:47:02 100-2: total estimated size is 8.01G
2017-11-15 11:47:02 100-2: TIME SENT SNAPSHOT
2017-11-15 11:47:02 100-2: SSDstorage/subvol-100-disk-1 name SSDstorage/subvol-100-disk-1 -
2017-11-15 11:47:02 100-2: volume 'SSDstorage/subvol-100-disk-1' already exists
2017-11-15 11:47:02 100-2: warning: cannot send 'SSDstorage/subvol-100-disk-1@__replicate_100-1_1510742732__': signal received
2017-11-15 11:47:02 100-2: TIME SENT SNAPSHOT
2017-11-15 11:47:02 100-2: warning: cannot send 'SSDstorage/subvol-100-disk-1@__replicate_100-2_1510742820__': Broken pipe
2017-11-15 11:47:02 100-2: cannot send 'SSDstorage/subvol-100-disk-1': I/O error
2017-11-15 11:47:02 100-2: command 'zfs send -Rpv -- SSDstorage/subvol-100-disk-1@__replicate_100-2_1510742820__' failed: exit code 1
2017-11-15 11:47:02 100-2: delete previous replication snapshot '__replicate_100-2_1510742820__' on SSDstorage:subvol-100-disk-1
2017-11-15 11:47:02 100-2: end replication job with error: command 'set -o pipefail && pvesm export SSDstorage:subvol-100-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_100-2_1510742820__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=storage' root@192.168.100.15 -- pvesm import SSDstorage:subvol-100-disk-1 zfs - -with-snapshots 1' failed: exit code 255
-----------------------------------------------------------------------------------
Workaround (to avoid the risk of deleting the correct volume by mistake); the full command sequence is sketched below this list:
On the new slave:
- zfs rename SSDstorage/subvol-100-disk-1 SSDstorage/subvol-999-disk-1
- resync the replication job
- test everything
- delete the old, renamed volume:
- zfs destroy SSDstorage/subvol-999-disk-1
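
A minimal sketch of the whole workaround, assuming CT 100, the SSDstorage pool and the replication job ID 100-2 from the log above; the 999 name is just a throwaway placeholder:

# on the new replication target (the "slave"):
# 1. move the conflicting dataset out of the way instead of destroying it
zfs rename SSDstorage/subvol-100-disk-1 SSDstorage/subvol-999-disk-1
# 2. trigger the replication job again from the node that now owns CT 100
pvesr schedule-now 100-2
# 3. only after verifying the container and the fresh replica, drop the renamed copy
zfs destroy -r SSDstorage/subvol-999-disk-1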
In practice, every time I migrate from master to slave and then back from slave to master with replication active, I get this error:
2017-11-15 11:58:56 ERROR: switch replication job target failed - replication job for guest '100' to target 'local/nodo1' already exists
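
For reference, this is how the replication jobs can be inspected after the failed switch (a sketch, assuming the job ID 100-2 and the target 'local/nodo1' reported in the error above):

pvesr list                      # configured replication jobs and their targets
pvesr status                    # last run / error state per job
cat /etc/pve/replication.cfg    # raw job definitions, e.g. a stale entry for guest 100
# only if a stale job is really left behind, it could be removed with:
# pvesr delete 100-2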