[SOLVED] Live migrations failing: Error 255 (for OVFTOOL/ESXi imported VMs)

linux

New Member
Dec 14, 2020
Australia
G'day there,

With the help of the Proxmox community and @Dominic we were able to migrate our ESXi VMs across to PVE - thank you!
https://forum.proxmox.com/threads/migrating-vms-from-esxi-to-pve-ovftool.80655 (for anybody interested)

The migrated (ex-ESXi) VMs are now part of a 3-node PVE cluster, though being the New Year holidays there had to be trouble!

Strangely, we're unable to move these imported (from ESXi) VMs to other nodes in the newly-made PVE cluster. All imported VMs are currently on node #2, as nodes #1 and #3 had to be reclaimed from ESXi, reinstalled with PVE, and joined to the cluster (they had to carry zero guests to do so).

The VMs are all operational on the PVE node that they were imported to, and boot/reboot without issue.
Our problem is isolated to attempting to migrate them.

Problem we're seeing is:

2020-12-21 00:58:27 starting migration of VM 222 to node 'pve1' (x.y.x.y)
2020-12-21 00:58:27 found local disk 'local-lvm:vm-222-disk-0' (in current VM config)
2020-12-21 00:58:27 copying local disk images
2020-12-21 00:58:27 starting VM 222 on remote node 'pve1'
2020-12-21 00:58:29 [pve1] lvcreate 'pve/vm-222-disk-0' error: Run `lvcreate --help' for more information.
2020-12-21 00:58:29 ERROR: online migrate failure - remote command failed with exit code 255
2020-12-21 00:58:29 aborting phase 2 - cleanup resources
2020-12-21 00:58:29 migrate_cancel
2020-12-21 00:58:30 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

Error 255, and attempts to migrate to the other host:

Sadly, it looks like it should report a more useful error than what appears to be the final line of lvcreate's output: "Run 'lvcreate --help' for more information". Looking through other Proxmox forum threads, exit code 255 seems to cover several different situations, so we're unclear exactly what's gone wrong.

The error flow above is identical if we attempt it with any of the 4x imported VMs, even if we try to send them to the alternative spare host. Does that likely point to a setting/issue related to the import from ESXi? Whether it's a setting, an incompatibility or something else is unclear.

Only peculiarity that we can locate:

All 4x of the imported VMs have attached disks whose size Proxmox doesn't seem to know. Each VM has a single disk, carried over via ovftool from ESXi. I'm not sure whether that's causing lvcreate on the target node(s) to fail, since the disk size isn't specified?

EXAMPLE - Imported from ESXi to PVE:
Hard Disk (scsi0)
- local-lvm:vm-222-disk-0

EXAMPLE - Created on PVE, never migrated:
Hard Disk (scsi0)
- local-lvm:vm-106-disk-0,backup=0,size=800G,ssd=1
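For anyone wanting to check this across several guests, the difference between the two examples above is simply whether the drive line carries a size= attribute. A minimal sketch for spotting that from a PVE node shell (check_size is our own illustrative helper, not a PVE tool, and the config lines/VM IDs are just the examples from this post):

```shell
#!/bin/sh
# check_size: report whether a PVE drive config line carries a size= attribute.
# (Hypothetical helper for illustration only.)
check_size() {
    case "$1" in
        *size=*) echo "size present" ;;
        *)       echo "size missing" ;;
    esac
}

# On a PVE node you could feed it a real config line, e.g.:
#   check_size "$(qm config 222 | grep '^scsi0:')"
check_size "local-lvm:vm-222-disk-0"                          # imported VM
check_size "local-lvm:vm-106-disk-0,backup=0,size=800G,ssd=1" # native VM
```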

Has anyone here had any experience with this? We've made some suggestions in the other thread (linked at the top of this post) about ovftool in the PVE wiki based on our experience. The ESXi/ovftool part of the page looks to have been added into the wiki quite recently.

I can add in other logs/files/etc - not overly sure where to look as log-searching for the job ID didn't give us much additional info. :)

Hopefully someone is kind enough to shed some light on this for us! Thank you so much, and Happy Holidays!

Cheers,
LinuxOz
 

Attachments

  • pveqm222cf.txt (246 bytes)
  • pvestorage.txt (127 bytes)
  • pveversion.txt (1 KB)

Dominic

Proxmox Staff Member
Mar 18, 2019
Hi,

Generally it makes sense to me that something missing from the config file would lead to an invalid lvcreate command.

I just imported from a .ovf to local-lvm with qm importovf on a 3-node cluster (but with a more recent version of PVE)
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)

and it automatically corrected the missing sizes and successfully started the migration:

Code:
2021-01-04 09:41:25 starting migration of VM 110 to node 'pveA' (192.168.25.146)
2021-01-04 09:41:25 found local disk 'local-lvm:vm-110-disk-0' (in current VM config)
2021-01-04 09:41:25 found local disk 'local-lvm:vm-110-disk-1' (in current VM config)
2021-01-04 09:41:26 drive 'sata1': size of disk 'local-lvm:vm-110-disk-0' updated from 0T to 16G
2021-01-04 09:41:26 drive 'sata2': size of disk 'local-lvm:vm-110-disk-1' updated from 0T to 16G
2021-01-04 09:41:26 copying local disk images
2021-01-04 09:41:26 starting VM 110 on remote node 'pveA'
2021-01-04 09:41:27 start remote tunnel
2021-01-04 09:41:28 ssh tunnel ver 1
2021-01-04 09:41:28 starting storage migration
2021-01-04 09:41:28 sata2: start migration to nbd:unix:/run/qemu-server/110_nbd.migrate:exportname=drive-sata2
drive mirror is starting for drive-sata2
drive-sata2: transferred: 0 bytes remaining: 17179869184 bytes total: 17179869184 bytes progression: 0.00 % busy: 1 ready: 0 
drive-sata2: transferred: 198180864 bytes remaining: 16981688320 bytes total: 17179869184 bytes progression: 1.15 % busy: 1 ready: 0 
drive-sata2: transferred: 416284672 bytes remaining: 16763846656 bytes total: 17180131328 bytes progression: 2.42 % busy: 1 ready: 0 
drive-sata2: transferred: 645922816 bytes remaining: 16534208512 bytes total: 17180131328 bytes progression: 3.76 % busy: 1 ready: 0

Do the sizes of your disks get updated when you run the following?
Code:
qm rescan
Maybe it works then on PVE 6.2, too.
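For reference, qm rescan also accepts a --vmid option to limit the scan to a single guest, and after the rescan the size should appear in the drive's config line. A small sketch for pulling that value out (parse_size is our own illustrative helper, not part of PVE; the VM ID is an example):

```shell
#!/bin/sh
# parse_size: extract the size= value from a PVE drive config line,
# printing nothing if the attribute is absent. (Illustrative helper.)
parse_size() {
    printf '%s\n' "$1" | sed -n 's/.*size=\([^,]*\).*/\1/p'
}

# On a PVE node, after `qm rescan --vmid 222`, one might verify with:
#   parse_size "$(qm config 222 | grep '^scsi0:')"
parse_size "local-lvm:vm-106-disk-0,backup=0,size=800G,ssd=1"  # prints 800G
```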
 

linux

Hi Dominic,

Many thanks for writing back. :)

generally it makes sense to me, that something missing from the config file would lead to the invalid lvcreate command.

I just imported from a .ovf to local-lvm with qm importovf on a 3-node cluster (but with a more recent version of PVE) and it automatically corrected the missing sizes and successfully started the migration

That's really interesting, thanks for testing it. Perhaps the version difference in PVE is why ours didn't detect the missing size and update it?

Do the sizes of your disks get updated when you run the following?

qm rescan

Maybe it works then on PVE 6.2, too.

All of the 4x VMs had their disk sizes updated when we ran that command. They reported updates from 0T to 200G or 250G as appropriate.

Following that, our 1st attempt to move one of the VMs was successfully initiated, and is now around 10% through. Thank you!

The PVE interface is now showing the size= attributes for each VM's SCSI disk which is great to see. Appreciate your help.

Cheers,
LinuxOz
 

Dominic

Great to hear!

I wonder why it didn't work in your setup without manually running qm rescan. But unless your migration fails, I guess we can just assume it works on PVE 6.3.
 

linux

I wonder why it hasn't worked in your setup without manually running qm rescan.

It's a bizarre one. The only likely difference between our migrations is that you may have run ovftool and then importovf locally, while we ran ovftool on an intermediary device (due to storage restrictions outside local-lvm) and mounted the export via NFS on PVE before running importovf against it.
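For anyone retracing this path later, the flow described above looks roughly like the following (all hostnames, mount paths and VM IDs are placeholders, and the exact ovftool source URI depends on your ESXi setup):

```shell
# 1. Export the VM on an intermediary machine with enough storage
ovftool vi://root@esxi-host/MyVM /export/MyVM/MyVM.ovf

# 2. Mount the export share on the PVE node
mount -t nfs intermediary:/export /mnt/import

# 3. Import into PVE (example VM ID 222, target storage local-lvm)
qm importovf 222 /mnt/import/MyVM/MyVM.ovf local-lvm

# 4. On PVE 6.2, update any missing size= attributes before migrating
qm rescan --vmid 222
```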

But unless your migration fails, we could just assume it to be working on PVE 6.3 I guess.

Odds are you're on the money - we'll be moving up to 6.3 soon enough, we just don't want to use the no-subscription repository in this interim phase.

At least we have a workaround via the rescan, though it's quite strange. As we're done with ESXi, our use-case for importovf etc is over.

I can gladly report that the 1st migration has succeeded as of moments ago! Given our other PVE experiences recently, I think we're safe!

Code:
2021-01-04 21:29:54 migration status: completed
drive-scsi0: transferred: 217947439104 bytes remaining: 0 bytes total: 217947439104 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0 : finished
2021-01-04 21:29:55 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=xyz' root@x.y.x.y pvesr set-state 232 \''{}'\'
2021-01-04 21:29:56 stopping NBD storage migration server on target.
  Logical volume "vm-232-disk-0" successfully removed
2021-01-04 21:30:14 migration finished successfully (duration 01:12:56)
TASK OK

I've just kicked off another, though as it's moved past the initial phases and is now transferring the disk, I don't expect any further issues. :)

Thanks again for your help. We're close to being done with the changeover from ESXi (the migration itself is over; just some more housekeeping to do). I have another query or two - I'll update an existing thread from 2020 about one, and will likely make another thread for the (hopefully) final one.

Looking forward to taking out support subscriptions for the hosts this year once everything has calmed down and cashflow has picked up. Well worth it!
 

Dominic

It's a bizarre one.
It is. Actually it should not matter where the .ovf files and disks are stored...
 
