[SOLVED] VM migration failed

Jul 24, 2018
Hi,

I finished migrating VMs from a Proxmox 4 server to a 5.3 server and it worked fine. Now I have another server from which I want to migrate VMs, but I am facing a problem.

Code:
send from @ to rpool/ROOT/subvol-125-disk-1@__migration__ estimated size is 610M
total estimated size is 610M
TIME        SENT   SNAPSHOT
cannot receive new filesystem stream: destination 'rpool/ROOT/subvol-125-disk-1' exists
must specify -F to overwrite it
zfs send/receive failed, cleaning up snapshot(s)..
could not find any snapshots to destroy; check snapshot names.
could not remove target snapshot: command 'ssh root@IP zfs destroy rpool/ROOT/subvol-125-disk-1@__migration__' failed: exit code 1

Dec 14 13:35:12 ERROR: command 'set -o pipefail && zfs send -Rpv rpool/ROOT/subvol-125-disk-1@__migration__ | ssh root@IP zfs recv rpool/ROOT/subvol-125-disk-1' failed: exit code 1
Dec 14 13:35:12 aborting phase 1 - cleanup resources
Dec 14 13:35:12 ERROR: found stale volume copy 'z1local:subvol-125-disk-1' on node 'node5'
Dec 14 13:35:12 ERROR: found stale volume copy 'z2local:subvol-125-disk-1' on node 'node5'
Dec 14 13:35:12 start final cleanup
Dec 14 13:35:12 start container on target node
Dec 14 13:35:12 # /usr/bin/ssh -o 'BatchMode=yes' root@IP pct start 125
Dec 14 13:35:13 Configuration file 'nodes/node5/lxc/125.conf' does not exist
Dec 14 13:35:13 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@IP pct start 125' failed: exit code 255
Dec 14 13:35:13 ERROR: migration aborted (duration 00:00:31): command 'set -o pipefail && zfs send -Rpv rpool/ROOT/subvol-125-disk-1@__migration__ | ssh root@IP zfs recv rpool/ROOT/subvol-125-disk-1' failed: exit code 1
migration aborted
I am sure that volume was not present on the destination server.
Any ideas what could be wrong? Thank you.
 

wolfgang

Proxmox Staff Member
Hi,

I am sure that volume was not present on destination server.
The output looks like the disk exists on the destination server.
Please check it twice.

Code:
zfs list -rt all rpool
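
If the full listing is long, a narrower check against the destination may help. This is only a sketch: `IP` stands for the destination node's address (as in your log) and `subvol-125` matches the disk name from the error output.

```shell
# Check the destination node for any datasets or snapshots
# belonging to CT 125. 'IP' is a placeholder for its address.
ssh root@IP "zfs list -rt all rpool | grep subvol-125"
```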
 
I checked that prior to migration and the disk was not present.
I deleted the disk and retried the migration: same issue. I also tried to migrate a new container with no disk present on the destination server: same issue.
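
For reference, I removed the leftover on the destination roughly like this (dataset name taken from the error output; `zfs destroy` is destructive, so I double-checked the name first):

```shell
# Remove the leftover container disk on the destination node.
# DESTRUCTIVE: verify the dataset name before running.
ssh root@IP zfs destroy -r rpool/ROOT/subvol-125-disk-1
```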
 

wolfgang

Proxmox Staff Member
How do you do the migration?
If you use the command line please send the full command.
 
Tried both command line and web.
CLI command: pct migrate 125 node5 --restart

I'm not sure where to specify -F ("must specify -F to overwrite it"); I can't find it in the man pages.

When the disk is not present on the destination server, the migration takes longer since it transfers the disk, but it eventually fails with that error.
If I retry the migration with the disk present on the destination server, it fails instantly without transferring the disk.
 

wolfgang

Proxmox Staff Member
Not sure where to specify -F: "must specify -F to overwrite it", can't find it in man pages.
That is a ZFS receive option; it cannot be set for the migration.

Do you have multiple mount points?
Please send the output of:
Code:
pct config 125
 
Just one mount point and this is the case for all my containers/VMs.
Code:
arch: amd64
cpulimit: 1
cpuunits: 1024
hostname: CT125
memory: 512
net0: name=eth0,bridge=vmbr1,hwaddr=66:32:32:62:35:32,type=veth
onboot: 1
ostype: ubuntu
rootfs: z2local:subvol-125-disk-1,size=8G
swap: 512
 
Since I had no way of migrating my workload, I decided to upgrade the older node to the latest available Proxmox version and reboot.
After this process I still see the same error:

Code:
2019-02-14 09:48:18 starting migration of CT 125 to node 'z8' (IP)
2019-02-14 09:48:19 found local volume 'z1local:subvol-125-disk-1' (via storage)
2019-02-14 09:48:19 found local volume 'z2local:subvol-125-disk-1' (in current VM config)
full send of rpool/ROOT/subvol-125-disk-1@__migration__ estimated size is 610M
total estimated size is 610M
TIME SENT SNAPSHOT
09:48:21 52.3M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:22 61.7M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:23 61.7M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:24 78.6M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:25 129M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:26 134M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:27 149M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:28 210M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:29 304M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:30 411M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:31 507M rpool/ROOT/subvol-125-disk-1@__migration__
09:48:32 605M rpool/ROOT/subvol-125-disk-1@__migration__
full send of rpool/ROOT/subvol-125-disk-1@__migration__ estimated size is 610M
total estimated size is 610M
TIME SENT SNAPSHOT
rpool/ROOT/subvol-125-disk-1 name rpool/ROOT/subvol-125-disk-1 -
volume 'rpool/ROOT/subvol-125-disk-1' already exists
command 'zfs send -Rpv -- rpool/ROOT/subvol-125-disk-1@__migration__' failed: got signal 13
send/receive failed, cleaning up snapshot(s)..
2019-02-14 09:48:34 ERROR: command 'set -o pipefail && pvesm export z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=z8' root@IP -- pvesm import z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
2019-02-14 09:48:34 aborting phase 1 - cleanup resources
2019-02-14 09:48:34 ERROR: found stale volume copy 'z2local:subvol-125-disk-1' on node 'z8'
2019-02-14 09:48:34 ERROR: found stale volume copy 'z1local:subvol-125-disk-1' on node 'z8'
2019-02-14 09:48:34 start final cleanup
2019-02-14 09:48:34 ERROR: migration aborted (duration 00:00:16): command 'set -o pipefail && pvesm export z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -snapshot __migration__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=z8' root@IP -- pvesm import z1local:subvol-125-disk-1 zfs - -with-snapshots 0 -delete-snapshot __migration__' failed: exit code 255
TASK ERROR: migration aborted
The volume is created on the destination as soon as the first __migration__ transfer starts. The error is displayed after that transfer completes, because the process then tries to send the same disk again.
So I guess the problem is that the stream is not received under the snapshot name rpool/ROOT/subvol-125-disk-1@__migration__ but directly as the final dataset, as shown by zfs list: rpool/ROOT/subvol-125-disk-1 367M 7.64G 367M /rpool/ROOT/subvol-125-disk-1
I am really at a loss here and can't figure it out. I need your support to get past this.

Thank you.
 

wolfgang

Proxmox Staff Member
Now I see the problem.
You have two storage definitions on the same subset of the pool (both z1local and z2local resolve to rpool/ROOT), so PVE finds the same image twice and tries to migrate it twice.
You should not have two storage definitions pointing at the same dataset of a pool.
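
For example, a storage.cfg along these lines (an illustrative sketch using the storage names from your log, not your actual config) triggers exactly this, because both zfspool entries point at the same dataset:

```
# /etc/pve/storage.cfg -- illustrative sketch only
zfspool: z1local
        pool rpool/ROOT
        content rootdir,images

zfspool: z2local
        pool rpool/ROOT        # same dataset as z1local -> each volume is found twice
        content rootdir,images
```

Point each storage at its own dataset (or remove one of the two entries); then the migration will find each volume only once.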
 
