Storage Replication

Hi

I am trying to use Storage Replication.
I have deployed two servers: nodeA and nodeB.
I enabled Storage Replication for a container (CT).
The first replication ran fine...
I can migrate the container between nodeA and nodeB.
I also enabled HA between these two nodes! I know, this is not recommended... but don't worry, it's just a lab environment!
So when nodeA crashed, the container restarted on nodeB. That is the expected behavior.
But when nodeA came back online, I could not migrate the container from nodeB back to nodeA.
I get this error:

task started by HA resource agent
2017-11-01 14:44:52 starting migration of CT 101 to node 'pve01' (172.16.0.10)
2017-11-01 14:44:52 found local volume 'STG1:subvol-101-disk-1' (in current VM config)
full send of ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__ estimated size is 419M
send from @__replicate_101-0_1509553286__ to ZFS-LOCAL/subvol-101-disk-1@__migration__ estimated size is 6.80M
total estimated size is 426M
TIME SENT SNAPSHOT
14:44:54 11.3M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:55 34.6M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:56 53.4M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:57 69.3M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:58 89.2M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:59 111M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:45:00 121M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:45:01 124M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
cannot receive new filesystem stream: checksum mismatch or incomplete stream
cannot open 'ZFS-LOCAL/subvol-101-disk-1': dataset does not exist
command 'zfs recv -F -- ZFS-LOCAL/subvol-101-disk-1' failed: exit code 1
command 'zfs send -Rpv -- ZFS-LOCAL/subvol-101-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2017-11-01 14:45:02 ERROR: command 'set -o pipefail && pvesm export STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@172.16.0.10 -- pvesm import STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
2017-11-01 14:45:02 aborting phase 1 - cleanup resources
2017-11-01 14:45:02 ERROR: found stale volume copy 'STG1:subvol-101-disk-1' on node 'pve01'
2017-11-01 14:45:02 start final cleanup
2017-11-01 14:45:02 ERROR: migration aborted (duration 00:00:10): command 'set -o pipefail && pvesm export STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@172.16.0.10 -- pvesm import STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
TASK ERROR: migration aborted

To work around this BAD behavior, I made a script on nodeB that destroys and recreates the ZFS pool on nodeA, and then performs the migration. The script below does the job:
#!/bin/bash

# Destroy and recreate the whole ZFS pool on nodeA (pve01),
# then ask HA to migrate the container back.
ssh root@pve01 zpool destroy -f ZFS-LOCAL
ssh root@pve01 zpool create -f ZFS-LOCAL /dev/vdb
ha-manager migrate ct:101 pve01
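A less destructive variant might be to remove only the stale replicated dataset that the migration log complains about ('found stale volume copy'), instead of wiping the entire pool. This is just a sketch based on the dataset name in the log above; it assumes the stale copy is the only leftover on pve01:

```shell
#!/bin/bash

# Sketch: destroy only the stale replicated dataset (and its snapshots)
# on the old node, rather than the whole ZFS-LOCAL pool, then retry
# the HA migration. Dataset name taken from the error log above.
ssh root@pve01 zfs destroy -r ZFS-LOCAL/subvol-101-disk-1
ha-manager migrate ct:101 pve01
```

The `-r` flag removes the dataset together with its `__replicate_*` snapshots, which would otherwise block the destroy.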

I don't know if this is a bug or what!

Whatever it is, I don't have a clue how to fix it!

Can somebody help???

Thanks a lot