Storage Replication

Hi

I am trying to use Storage Replication.
I have deployed two servers: nodeA and nodeB.
I enabled Storage Replication for a container (CT).
The first replication ran fine...
I can migrate the container between nodeA and nodeB.
I also enabled HA between these two nodes! I know, this is not recommended... but don't worry, it's just a lab environment!
So when nodeA crashed, the container restarted on nodeB. That is the expected behavior.
But when nodeA came back online, I could not migrate the container from nodeB back to nodeA.
I get this error:

task started by HA resource agent
2017-11-01 14:44:52 starting migration of CT 101 to node 'pve01' (172.16.0.10)
2017-11-01 14:44:52 found local volume 'STG1:subvol-101-disk-1' (in current VM config)
full send of ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__ estimated size is 419M
send from @__replicate_101-0_1509553286__ to ZFS-LOCAL/subvol-101-disk-1@__migration__ estimated size is 6.80M
total estimated size is 426M
TIME SENT SNAPSHOT
14:44:54 11.3M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:55 34.6M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:56 53.4M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:57 69.3M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:58 89.2M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:44:59 111M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:45:00 121M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
14:45:01 124M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
cannot receive new filesystem stream: checksum mismatch or incomplete stream
cannot open 'ZFS-LOCAL/subvol-101-disk-1': dataset does not exist
command 'zfs recv -F -- ZFS-LOCAL/subvol-101-disk-1' failed: exit code 1
command 'zfs send -Rpv -- ZFS-LOCAL/subvol-101-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2017-11-01 14:45:02 ERROR: command 'set -o pipefail && pvesm export STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@172.16.0.10 -- pvesm import STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
2017-11-01 14:45:02 aborting phase 1 - cleanup resources
2017-11-01 14:45:02 ERROR: found stale volume copy 'STG1:subvol-101-disk-1' on node 'pve01'
2017-11-01 14:45:02 start final cleanup
2017-11-01 14:45:02 ERROR: migration aborted (duration 00:00:10): command 'set -o pipefail && pvesm export STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@172.16.0.10 -- pvesm import STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
TASK ERROR: migration aborted

To work around this BAD behavior, I made a script on nodeB that destroys and recreates the ZFS pool on nodeA, and then performs the migration. The script below does the job:
#!/bin/bash

# Destroy and recreate the whole ZFS pool on nodeA (pve01),
# then ask HA to migrate the container back.
ssh root@pve01 zpool destroy -f ZFS-LOCAL
ssh root@pve01 zpool create -f ZFS-LOCAL /dev/vdb
ha-manager migrate ct:101 pve01
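A less destructive variant might be to remove only the stale replicated dataset that the migration log complains about ('found stale volume copy'), instead of wiping the entire pool. This is just a sketch based on the dataset name in the log above; it assumes the stale copy is the only leftover on pve01:

```shell
#!/bin/bash

# Sketch: destroy only the stale replicated dataset (and its snapshots)
# on the old node, rather than the whole ZFS-LOCAL pool, then retry
# the HA migration. Dataset name taken from the error log above.
ssh root@pve01 zfs destroy -r ZFS-LOCAL/subvol-101-disk-1
ha-manager migrate ct:101 pve01
```

The `-r` flag removes the dataset together with its `__replicate_*` snapshots, which would otherwise block the destroy.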

I don't know if this is a bug or what!

Whatever it is, I don't have a clue how to fix it!

Can somebody help???

Thanks a lot