[SOLVED] LXC migration fails on ZFS

grobs

Active Member
Hi,

I'm trying to cold-migrate an LXC container from one host to another.
The container I want to migrate is located on a "local-zfs" pool that was created on the first node and is enabled on every cluster node.
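
For reference, this is how I double-check the storage definitions across the cluster (just a sanity check, output omitted):
Code:
# /etc/pve/storage.cfg holds the cluster-wide storage definitions
cat /etc/pve/storage.cfg
# show which storages are active on this node
pvesm status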

Here is the migration log:
Code:
2018-08-31 10:12:54 starting migration of CT 162 to node 'proxmox5-staging-02' (192.168.10.51)
2018-08-31 10:12:54 found local volume 'local-zfs-pm502:subvol-162-disk-1' (via storage)
2018-08-31 10:12:54 found local volume 'local-zfs:subvol-162-disk-1' (in current VM config)
full send of rpool/data/subvol-162-disk-1@__migration__ estimated size is 1.21G
total estimated size is 1.21G
TIME SENT SNAPSHOT
10:12:55 96.3M rpool/data/subvol-162-disk-1@__migration__
10:12:56 207M rpool/data/subvol-162-disk-1@__migration__
10:12:57 319M rpool/data/subvol-162-disk-1@__migration__
10:12:58 430M rpool/data/subvol-162-disk-1@__migration__
10:12:59 541M rpool/data/subvol-162-disk-1@__migration__
10:13:00 653M rpool/data/subvol-162-disk-1@__migration__
10:13:01 761M rpool/data/subvol-162-disk-1@__migration__
10:13:02 872M rpool/data/subvol-162-disk-1@__migration__
10:13:03 982M rpool/data/subvol-162-disk-1@__migration__
10:13:04 1.07G rpool/data/subvol-162-disk-1@__migration__
10:13:05 1.15G rpool/data/subvol-162-disk-1@__migration__
10:13:06 1.24G rpool/data/subvol-162-disk-1@__migration__
full send of rpool/data/subvol-162-disk-1@__migration__ estimated size is 1.21G
total estimated size is 1.21G
TIME SENT SNAPSHOT
rpool/data/subvol-162-disk-1 name rpool/data/subvol-162-disk-1 -
volume 'rpool/data/subvol-162-disk-1' already exists
command 'zfs send -Rpv -- rpool/data/subvol-162-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2018-08-31 10:13:07 ERROR: command 'set -o pipefail && pvesm export local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxmox5-staging-02' root@192.168.10.51 -- pvesm import local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
2018-08-31 10:13:07 aborting phase 1 - cleanup resources
2018-08-31 10:13:07 ERROR: found stale volume copy 'local-zfs-pm502:subvol-162-disk-1' on node 'proxmox5-staging-02'
2018-08-31 10:13:07 ERROR: found stale volume copy 'local-zfs:subvol-162-disk-1' on node 'proxmox5-staging-02'
2018-08-31 10:13:07 start final cleanup
2018-08-31 10:13:07 ERROR: migration aborted (duration 00:00:13): command 'set -o pipefail && pvesm export local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxmox5-staging-02' root@192.168.10.51 -- pvesm import local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
TASK ERROR: migration aborted

It seems that the migration finds the container's disk through two different storages, which should not be the case (it only exists in local-zfs).
Code:
2018-08-31 10:12:54 found local volume 'local-zfs-pm502:subvol-162-disk-1' (via storage)
2018-08-31 10:12:54 found local volume 'local-zfs:subvol-162-disk-1' (in current VM config)

Do you have any clue?

Regards
 

Attachments

  • Capture d'écran de 2018-08-31 10-23-05.png (140.3 KB)
Hi,

you have two storages which point to the same dataset; this does not work.
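
You can see this in /etc/pve/storage.cfg. Based on the names in your log, the relevant entries probably look roughly like this (only an illustration, the exact options may differ):
Code:
zfspool: local-zfs
        pool rpool/data
        content images,rootdir

zfspool: local-zfs-pm502
        pool rpool/data
        content images,rootdir
Both entries point to the same dataset (rpool/data), so the migration picks up the same subvolume twice and the second send fails on the target with 'volume ... already exists'.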
 
Yes, you're right, I figured that out too.
I removed the unnecessary storages; now I only have local and local-zfs on both nodes.

Now, when I try to migrate, I get this error:
Code:
2018-08-31 11:30:22 starting migration of CT 162 to node 'proxmox5-staging-02' (192.168.10.51)
2018-08-31 11:30:23 found local volume 'local-zfs:subvol-162-disk-1' (in current VM config)
full send of rpool/data/subvol-162-disk-1@__migration__ estimated size is 1.22G
total estimated size is 1.22G
TIME SENT SNAPSHOT
rpool/data/subvol-162-disk-1 name rpool/data/subvol-162-disk-1 -
volume 'rpool/data/subvol-162-disk-1' already exists
command 'zfs send -Rpv -- rpool/data/subvol-162-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2018-08-31 11:30:23 ERROR: command 'set -o pipefail && pvesm export local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxmox5-staging-02' root@192.168.10.51 -- pvesm import local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
2018-08-31 11:30:23 aborting phase 1 - cleanup resources
2018-08-31 11:30:23 ERROR: found stale volume copy 'local-zfs:subvol-162-disk-1' on node 'proxmox5-staging-02'
2018-08-31 11:30:23 start final cleanup
2018-08-31 11:30:23 ERROR: migration aborted (duration 00:00:01): command 'set -o pipefail && pvesm export local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxmox5-staging-02' root@192.168.10.51 -- pvesm import local-zfs:subvol-162-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
TASK ERROR: migration aborted
 
If you try again, it should now be cleaned up and you can migrate.
If not, you must remove the image on the target side manually.
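
For example (check the exact volume name on the target node with 'pvesm list local-zfs' first):
Code:
# on the target node: remove the stale copy through the storage layer
pvesm free local-zfs:subvol-162-disk-1
# or remove it directly at the ZFS level
zfs destroy rpool/data/subvol-162-disk-1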
 
pvesm on the target node was indeed still showing the disk image:
Code:
root@proxmox5-staging-02:~# pvesm list local-zfs
local-zfs:subvol-162-disk-1 subvol 11811160064 162
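
The same stale subvolume can also be checked at the ZFS level (output omitted):
Code:
# list the datasets under rpool/data on the target node
zfs list -r rpool/data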

So I deleted the stale image:
Code:
root@proxmox5-staging-02:~# pvesm free local-zfs:subvol-162-disk-1

After that, the migration task succeeded.
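
To double-check that everything landed on the target node:
Code:
# on the target node: the container should now be listed here
pct list
# and its config should reference the migrated volume on local-zfs
pct config 162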

Thank you very much for your help, wolfgang. The support and the community are one of the reasons I love Proxmox!

Keep going!