Storage Replication

Discussion in 'Proxmox VE: Installation and configuration' started by Gilberto Ferreira, Nov 1, 2017.

  1. Gilberto Ferreira


    I am trying to use Storage Replication.
    I have deployed two servers: nodeA and nodeB.
    I activated Storage Replication on a container (CT).
    The first replication ran fine...
    I can migrate the CT between nodeA and nodeB.
    I also enabled HA between these two nodes! I know, this is not a recommended setup... But don't worry! It's just a lab environment!
    So when nodeA crashed, the CT restarted on nodeB. That is the expected behavior.
    But when nodeA came back online, I could not migrate the CT from nodeB back to nodeA.
    I get this error:

    task started by HA resource agent
    2017-11-01 14:44:52 starting migration of CT 101 to node 'pve01' (
    2017-11-01 14:44:52 found local volume 'STG1:subvol-101-disk-1' (in current VM config)
    full send of ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__ estimated size is 419M
    send from @__replicate_101-0_1509553286__ to ZFS-LOCAL/subvol-101-disk-1@__migration__ estimated size is 6.80M
    total estimated size is 426M
    14:44:54 11.3M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:44:55 34.6M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:44:56 53.4M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:44:57 69.3M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:44:58 89.2M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:44:59 111M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:45:00 121M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    14:45:01 124M ZFS-LOCAL/subvol-101-disk-1@__replicate_101-0_1509553286__
    cannot receive new filesystem stream: checksum mismatch or incomplete stream
    cannot open 'ZFS-LOCAL/subvol-101-disk-1': dataset does not exist
    command 'zfs recv -F -- ZFS-LOCAL/subvol-101-disk-1' failed: exit code 1
    command 'zfs send -Rpv -- ZFS-LOCAL/subvol-101-disk-1@__migration__' failed: got signal 13
    send/receive failed, cleaning up snapshot(s)..
    2017-11-01 14:45:02 ERROR: command 'set -o pipefail && pvesm export STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@ -- pvesm import STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
    2017-11-01 14:45:02 aborting phase 1 - cleanup resources
    2017-11-01 14:45:02 ERROR: found stale volume copy 'STG1:subvol-101-disk-1' on node 'pve01'
    2017-11-01 14:45:02 start final cleanup
    2017-11-01 14:45:02 ERROR: migration aborted (duration 00:00:10): command 'set -o pipefail && pvesm export STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=pve01' root@ -- pvesm import STG1:subvol-101-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 1
    TASK ERROR: migration aborted
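(Side note, not part of Proxmox: the decisive line in the log above is the "found stale volume copy" error during cleanup; the migration aborts because a leftover copy of the replicated volume is still sitting on the old node. A hypothetical little helper to pull the offending volume IDs out of a task log, in case you want to script the check:)

```python
import re

def find_stale_volumes(task_log: str) -> list[str]:
    """Return the volume IDs reported as stale copies in a PVE task log."""
    return re.findall(r"found stale volume copy '([^']+)' on node '[^']+'", task_log)

log = """
2017-11-01 14:45:02 aborting phase 1 - cleanup resources
2017-11-01 14:45:02 ERROR: found stale volume copy 'STG1:subvol-101-disk-1' on node 'pve01'
2017-11-01 14:45:02 start final cleanup
"""

print(find_stale_volumes(log))  # ['STG1:subvol-101-disk-1']
```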

    In order to remediate this BAD behavior, I made a script on nodeB that destroys and recreates the pool on nodeA, and then performs the migration. The script below does the job:

    ssh root@pve01 zpool destroy ZFS-LOCAL -f
    ssh root@pve01 zpool create ZFS-LOCAL /dev/vdb -f
    ha-manager migrate ct:101 pve01
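A less destructive variant (untested sketch; the node, CT, and dataset names are taken from the log above, adjust to your setup): instead of recreating the whole pool, destroy only the stale dataset that the migration complains about, then retry. The snippet defaults to a dry run that just prints the commands:

```shell
# Targeted cleanup sketch: remove only the stale replica dataset on the old
# node, rather than destroying the whole pool. NODE/CTID/DATASET come from
# the error log in this thread and are assumptions about your environment.
NODE=pve01
CTID=101
DATASET=ZFS-LOCAL/subvol-${CTID}-disk-1

# DRY_RUN=1 (the default here) only prints what would be executed.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# destroy the stale replica (and its snapshots) on the target node only
run ssh root@"$NODE" zfs destroy -r "$DATASET"
# then retry the HA migration
run ha-manager migrate ct:"$CTID" "$NODE"
```

Set DRY_RUN=0 once you have checked the printed commands. Note that `zfs destroy -r` also removes the replication snapshots, so the next replication run will be a full send again.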

    I don't know if this is a bug or what!

    Whatever it is, I don't have a clue how to fix it!

    Can somebody help???

    Thanks a lot
    #1 Gilberto Ferreira, Nov 1, 2017
    Last edited: Nov 2, 2017
  2. wolfgang

    wolfgang Proxmox Staff Member
    Yes, this is known. It is not implemented yet.