[SOLVED] VM migration failed

Hi,

I finished migrating VMs from a Proxmox VE 4 server to a 5.3 server and everything worked fine. Now I have another server from which I want to migrate VMs, but I am running into a problem.

Code:
send from @ to rpool/ROOT/subvol-125-disk-1@__migration__ estimated size is 610M
total estimated size is 610M
TIME        SENT   SNAPSHOT
cannot receive new filesystem stream: destination 'rpool/ROOT/subvol-125-disk-1' exists
must specify -F to overwrite it
zfs send/receive failed, cleaning up snapshot(s)..
could not find any snapshots to destroy; check snapshot names.
could not remove target snapshot: command 'ssh root@IP zfs destroy rpool/ROOT/subvol-125-disk-1@__migration__' failed: exit code 1

Dec 14 13:35:12 ERROR: command 'set -o pipefail && zfs send -Rpv rpool/ROOT/subvol-125-disk-1@__migration__ | ssh root@IP zfs recv rpool/ROOT/subvol-125-disk-1' failed: exit code 1
Dec 14 13:35:12 aborting phase 1 - cleanup resources
Dec 14 13:35:12 ERROR: found stale volume copy 'z1local:subvol-125-disk-1' on node 'node5'
Dec 14 13:35:12 ERROR: found stale volume copy 'z2local:subvol-125-disk-1' on node 'node5'
Dec 14 13:35:12 start final cleanup
Dec 14 13:35:12 start container on target node
Dec 14 13:35:12 # /usr/bin/ssh -o 'BatchMode=yes' root@IP pct start 125
Dec 14 13:35:13 Configuration file 'nodes/node5/lxc/125.conf' does not exist
Dec 14 13:35:13 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@IP pct start 125' failed: exit code 255
Dec 14 13:35:13 ERROR: migration aborted (duration 00:00:31): command 'set -o pipefail && zfs send -Rpv rpool/ROOT/subvol-125-disk-1@__migration__ | ssh root@IP zfs recv rpool/ROOT/subvol-125-disk-1' failed: exit code 1
migration aborted

I am sure that volume was not present on the destination server.
Any ideas what could be wrong? Thank you.
 
Hi,

I am sure that volume was not present on the destination server.
The output looks like the disk does exist on the destination server.
Please check twice.

Code:
zfs list -rt all rpool
 
I checked that prior to migration and the disk was not present.
I deleted the disk and retried the migration: same issue. I also tried to migrate a new container whose disk was not present on the destination server: same issue.
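For reference, this is roughly what I mean by deleting the disk on the destination before retrying (a sketch only, run on the destination node; zfs destroy -r is destructive, so double-check the dataset name first):

Code:
# Confirm the dataset or a leftover snapshot really exists on the destination
zfs list -rt all rpool/ROOT | grep subvol-125

# Remove the dataset together with any leftover __migration__ snapshot (destructive!)
zfs destroy -r rpool/ROOT/subvol-125-disk-1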
 
How do you perform the migration?
If you use the command line, please post the full command.
 
I tried both the command line and the web UI.
CLI command: pct migrate 125 node5 --restart

I'm not sure where to specify -F ("must specify -F to overwrite it"); I can't find it in the man pages.

When the disk is not present on the destination server, the migration takes longer since it transfers the disk, but it eventually fails with that error.
If I retry the migration with the disk present on the destination server, it fails instantly without transferring the disk.
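In case it helps, this is a rough way to check for leftover __migration__ snapshots on both nodes after a failed attempt (a sketch; IP stands for the destination's address, as in the log):

Code:
# On the source node
zfs list -t snapshot -o name | grep __migration__

# On the destination node (IP is a placeholder for its real address)
ssh root@IP "zfs list -t snapshot -o name | grep __migration__"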
 
I'm not sure where to specify -F ("must specify -F to overwrite it"); I can't find it in the man pages.
That is a ZFS receive option; it cannot be set for the migration.
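For context only, -F belongs to the receiving side of a manual zfs send/receive pipeline, not to pct migrate. A sketch using the dataset name and IP placeholder from your log (note that -F overwrites the destination dataset, so it is nothing to use blindly):

Code:
# -F is an option of zfs recv, not of pct migrate
zfs send -Rpv rpool/ROOT/subvol-125-disk-1@__migration__ \
  | ssh root@IP zfs recv -F rpool/ROOT/subvol-125-disk-1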

Do you have multiple mount points?
Please send the output of
Code:
pct config 125
 
Just one mount point, and that is the case for all my containers/VMs.
Code:
arch: amd64
cpulimit: 1
cpuunits: 1024
hostname: CT125
memory: 512
net0: name=eth0,bridge=vmbr1,hwaddr=66:32:32:62:35:32,type=veth
onboot: 1
ostype: ubuntu
rootfs: z2local:subvol-125-disk-1,size=8G
swap: 512
 
I decided to upgrade the older node to the latest Proxmox VE version available and reboot it, since I had no way of migrating my workload.
After this process I still see the same error:

Code:
2019-02-14 09:48:18 starting migration of CT 125 to node 'z8' (IP)
2019-02-14 09:48:19 found local volume 'z1local:subvol-125-disk-1' (via storage)
2019-02-14 09:48:19 found local volume 'z2local:subvol-125-disk-1' (in current VM config)
full send of rpool/ROOT/subvol-125-disk-1@__migration__ estimated size is 610M
total estimated size is 610M
TIME SENT SNAPSHOT
09:48:21 52.3M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:22 61.7M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:23 61.7M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:24 78.6M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:25 129M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:26 134M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:27 149M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:28 210M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:29 304M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:30 411M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:31 507M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:32 605M rpool/ROOT/subvol-125-disk-1@__migration__
full send of rpool/ROOT/subvol-125-disk-1@__migration__ estimated size is 610M
total estimated size is 610M
TIME SENT SNAPSHOT
rpool/ROOT/subvol-125-disk-1 name rpool/ROOT/subvol-125-disk-1 -
volume 'rpool/ROOT/subvol-125-disk-1' already exists
command 'zfs send -Rpv -- rpool/ROOT/subvol-125-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2019-02-14 09:48:34 ERROR: command 'set -o pipefail && pvesm export z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=z8' root@IP -- pvesm import z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
2019-02-14 09:48:34 aborting phase 1 - cleanup resources
2019-02-14 09:48:34 ERROR: found stale volume copy 'z2local:subvol-125-disk-1' on node 'z8'
2019-02-14 09:48:34 ERROR: found stale volume copy 'z1local:subvol-125-disk-1' on node 'z8'
2019-02-14 09:48:34 start final cleanup
2019-02-14 09:48:34 ERROR: migration aborted (duration 00:00:16): command 'set -o pipefail && pvesm export z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=z8' root@IP -- pvesm import z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
TASK ERROR: migration aborted

The volume is created on the destination as soon as the first __migration__ disk transfer starts. The error appears after that transfer completes, because the process then tries to create the same disk again.
So I guess the problem is that the target is not created under the snapshot name rpool/ROOT/subvol-125-disk-1@__migration__ but directly as the final dataset, as shown by zfs list on the destination: rpool/ROOT/subvol-125-disk-1 367M 7.64G 367M /rpool/ROOT/subvol-125-disk-1
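A rough way to see this (a sketch; root@IP is the destination placeholder from the log) is to watch the destination while the transfer runs:

Code:
# Poll the destination while the migration runs to see when the dataset appears
watch -n 2 "ssh root@IP zfs list -rt all rpool/ROOT/subvol-125-disk-1"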
I am really at a loss here and can't figure it out. I need your support to get past this.

Thank you.
 
Now I see the problem.
You have two storages defined on the same subset of the pool.
So PVE tries to migrate the same image twice.
You should not have two storage definitions on the same subset of a pool.
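For illustration, a hypothetical /etc/pve/storage.cfg excerpt showing that kind of overlap (the z1local/z2local names come from the log above, but the exact pool paths are an assumption):

Code:
# /etc/pve/storage.cfg -- hypothetical excerpt: two ZFS storages on the same dataset subtree
zfspool: z1local
        pool rpool/ROOT
        content rootdir,images

zfspool: z2local
        pool rpool/ROOT
        content rootdir,images

With both storages backed by the same subtree, subvol-125-disk-1 is found once via 'z1local' and once via 'z2local', which matches the two "found local volume" lines in the migration log; the second transfer then fails because the dataset already exists on the target.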
 
