[SOLVED] Converting a Windows VM from VMware (NFS on NetApp) to ProxMox (Ceph) with minimal downtime

Rainerle

New Member
Jan 29, 2019
Hi,

We built a three-node Proxmox cluster with Ceph as the storage backend for our test VMs. They currently reside on a two-node VMware cluster with an old NetApp as the storage backend.

The plan is to try a migration/conversion with minimal downtime, as preparation for a possible migration of our production VMs. So the current setup/order is like this:
- Mount the NetApp NFS datastore share on Proxmox read-write
- Create a new, empty VM on Proxmox, using the NetApp datastore for the VM's disk (using IDE0 and VMDK)
- Uninstall VMware Tools, run MergeIDE.zip and shut down the running VM on ESXi
- On the Proxmox CLI, move win10/win10-flat.vmdk and win10/win10.vmdk to images/108/, and in there copy win10.vmdk to the previously created vm-108-disk-0.vmdk
- Start the VM on Proxmox
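The file shuffle in the middle steps looks roughly like this on the Proxmox CLI (a sketch; paths taken from this post, your datastore layout may differ):

```shell
# Sketch: relocate the VMware disk files next to the pre-created Proxmox disk.
cd /mnt/pve/netapp03-DS1
mv win10/win10-flat.vmdk win10/win10.vmdk images/108/
# Keep the descriptor under the name Proxmox expects for the pre-created disk;
# the descriptor references win10-flat.vmdk relatively, so both must sit together.
cp images/108/win10.vmdk images/108/vm-108-disk-0.vmdk
```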

So the above works fine. But now I want to move the disk online to the Ceph storage backend. And that fails.

create full clone of drive scsi0 (netapp03-DS1:108/vm-108-disk-0.vmdk)
transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
qemu-img: output file is smaller than input file
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/108/vm-108-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-108-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1

From what I understand, it takes the size of the descriptor file (vm-108-disk-0.vmdk) to create the destination image - which is wrong, since that is not the size of the win10-flat.vmdk extent referenced inside it.
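The mismatch is easy to see from the VMDK descriptor format: the descriptor file itself is only a few KB, but its "RW <sectors>" extent line carries the real virtual size. A sketch with a made-up descriptor (contents illustrative, layout follows the plain-text VMDK descriptor format):

```shell
# A made-up descriptor file - tiny on disk, but describing a 250G extent.
cat > /tmp/demo-descriptor.vmdk <<'EOF'
# Disk DescriptorFile
version=1
createType="vmfs"

# Extent description
RW 524288000 VMFS "win10-flat.vmdk"
EOF

# True virtual size in bytes = sectors * 512 (not the descriptor's own ~10 KB file size)
sectors=$(awk '/^RW /{print $2}' /tmp/demo-descriptor.vmdk)
echo $((sectors * 512))
```

Here 524288000 sectors * 512 bytes = 268435456000 bytes (250G), while the descriptor file itself would only be a few hundred bytes.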

Is this a possible bug?
Or is there a better way?

Best regards
Rainer
 

virtRoo

New Member
Jan 27, 2019
Hi Rainerle,

Sounds like we're in the same boat. We've been thinking of doing something similar, as per the following:

https://forum.proxmox.com/threads/live-converting-vmdk-to-qcow2.51144/

Downtime could be greatly reduced with our draft plan, apart from uninstalling/installing guest tools, etc. However, we have not carried out this plan in a production environment yet, so the reliability of the approach remains uncertain.

Speaking of the migration error regarding the storage move: live storage migration (especially in relation to local storage) seems to have been very flaky in KVM/QEMU in general. We've been testing this feature from time to time on different KVM solutions (including qemu-kvm-ev on CentOS 7) and often encounter inconsistent results or limitations. Maybe the Proxmox staff could shed more light on this.

Try updating Proxmox to the latest version and see if it helps. We encountered some bizarre bugs when doing live migration with local storage in 5.3-7: when live-migrating a VM with one qcow2 disk on local storage to another host without specifying '--targetstorage', the VM would end up spawning additional disks (sometimes qcow2, sometimes raw) with 0 size on the destination host after migration, and then became corrupted. After updating to 5.3-8, the issue with randomly spawning new disks seems to have gone away, but qcow2 still gets converted to raw on the fly as per (https://bugzilla.proxmox.com/show_bug.cgi?id=2070).
 

Rainerle

New Member
Jan 29, 2019
Hi virtRoo,

After starting this thread yesterday I upgraded to 5.3-8. I will try another VM today...

I understand that I am moving from one virtualisation platform to another, so I have no problem with a short downtime. But most of my live VMs are rather big, and I need some best-practice solution to reduce the end users' downtime...
 

Rainerle

New Member
Jan 29, 2019
Hi,

I tried to move a disk just now with the current Proxmox (5.3-8).

Online:
Virtual Environment 5.3-8
Virtual Machine 111 (win2008r2) on node 'proxmox01'
Logs
create full clone of drive ide0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
drive mirror is starting for drive-ide0
drive-ide0: transferred: 0 bytes remaining: 268435456000 bytes total: 268435456000 bytes progression: 0.00 % busy: 1 ready: 0
drive-ide0: Cancelling block job
drive-ide0: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: drive-ide0: mirroring has been cancelled

Offline:
Virtual Environment 5.3-8
Virtual Machine 111 (win2008r2) on node 'proxmox01'
Logs
create full clone of drive ide0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
qemu-img: output file is smaller than input file
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1

So that problem still exists.

Or am I simply going about it the wrong way?
 

Alwin

Proxmox Staff Member
Staff member
Aug 1, 2017
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1
Does the convert work when using the same storage (e.g. DS1) as the target?
Did you also try the convert by hand?
Does the PVE node have access to the Ceph cluster?
 

Rainerle

New Member
Jan 29, 2019
Hi,

thanks for getting back so quickly!

Converting onto same storage:
Virtual Environment 5.3-8
Virtual Machine 111 (win2008r2) on node 'proxmox01'
Logs
scsi0
create full clone of drive scsi0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
Formatting '/mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.qcow2', fmt=qcow2 size=10240 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
qemu-img: output file is smaller than input file
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O qcow2 /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk zeroinit:/mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.qcow2' failed: exit code 1

Converting by hand does work:
root@proxmox01:/var/lib/vz/images# /usr/bin/qemu-img convert -p -f vmdk -O qcow2 /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.qcow2
(100.00/100%)
root@proxmox01:/var/lib/vz/images#

If pveceph status had a short output I would have posted it... but yes (Quorate: Yes, and HEALTH_OK for Ceph on all three nodes).
 

Rainerle

New Member
Jan 29, 2019
Hi,

Moving it offline to the final destination also works:

root@proxmox01:/var/lib/vz/images# /usr/bin/qemu-img convert -p -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring
(4.00/100%)
...
 

Rainerle

New Member
Jan 29, 2019
It seems the problem with the failing offline copy is that the destination raw image is created with the wrong size.

From the gui:
Virtual Environment 5.3-8
Storage 'ceph-proxmox-VMs' on node 'proxmox01'
Logs

create full clone of drive scsi0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
qemu-img: output file is smaller than input file
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-1:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1

On the command line, using a manually created destination image of the correct size,
/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring

Just works.
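For anyone following along, the manual workaround can be sketched end-to-end like this (pool, image name and keyring path taken from this thread; the --size value assumes the 250G disk, i.e. 256000 MiB, since rbd takes megabytes by default):

```shell
# Sketch: pre-create the RBD image at the *real* virtual size of the source disk.
# qemu-img info shows the "virtual size" to use:
qemu-img info /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk

# 268435456000 bytes = 256000 MiB:
rbd -c /etc/pve/ceph.conf --id admin \
    --keyring /etc/pve/priv/ceph/ceph-proxmox-VMs.keyring \
    create ceph-proxmox-VMs/vm-111-disk-0 --size 256000

# Then convert into the existing image (-n = do not create the target):
/usr/bin/qemu-img convert -p -n -f vmdk -O raw \
    /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk \
    rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring
```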

So looking at the directory on the NetApp
root@proxmox01:/mnt/pve/netapp03-DS1/images/111# ls -al
total 521524892
drwxr----- 2 root root 4096 Jan 30 14:34 .
drwxr-xr-x 10 root root 4096 Jan 30 10:07 ..
-rw------- 1 root root 268435456000 Jan 30 11:27 win2008r2_3-flat.vmdk
-rw------- 1 root root 649 Jan 30 10:11 win2008r2_3.vmdk
-rw------- 1 root root 268435456000 Jan 30 11:27 win2008r2_4-flat.vmdk
-rw------- 1 root root 649 Jan 30 10:11 win2008r2_4.vmdk
-rw-r----- 1 root root 10240 Jan 30 10:58 vm-111-disk-0.vmdk
-rw-r----- 1 root root 10240 Jan 30 11:25 vm-111-disk-1.vmdk
root@proxmox01:/mnt/pve/netapp03-DS1/images/111#


qm move_disk creates the destination image using the size of vm-111-disk-0.vmdk - the 10 KB descriptor file.

This is wrong. It should use the size of the referenced win2008r2_3-flat.vmdk.
 

udo

Famous Member
Apr 22, 2009
Ahrensburg; Germany
Rainerle said:
The qm move_disk creates the destination file from the size of the vm-111-disk-0.vmdk. This is wrong. It should be the referenced win2008r2_3-flat.vmdk.
Hi,
it has been a long time since I migrated from VMware to Proxmox...

AFAIK the *-flat.vmdk files are raw files.
What happens if you create a VM with raw disks and then copy the flat file over the VM disk?
Like "cp win2008r2_3-flat.vmdk vm-111-disk-0.raw"
After that, I think an online migration will work.

Udo
 

Rainerle

New Member
Jan 29, 2019
Hi Udo,

cool suggestion! Tried it...

So the filesystem is actually accessible within the VM - but the disk move now fails with a different error:

Online Migration:
create full clone of drive scsi1 (netapp03-DS1:111/vm-111-disk-0.raw)
drive mirror is starting for drive-scsi1
drive-scsi1: transferred: 0 bytes remaining: 268435456000 bytes total: 268435456000 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: Cancelling block job
drive-scsi1: Done.
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: drive-scsi1: mirroring has been cancelled
 

udo

Famous Member
Apr 22, 2009
Ahrensburg; Germany
Rainerle said:
So the filesystem is actually accessible within the VM - but the disk move now fails with a different error: TASK ERROR: storage migration failed: mirroring error: drive-scsi1: mirroring has been cancelled
Hi,
perhaps an issue with the target? Have you run anything on the Ceph already? Does a live-migrate of a fresh VM to the Ceph storage work?

Udo
 

Rainerle

New Member
Jan 29, 2019
By setting the cache mode on the hard disk from "Default (no cache)" to "Write through", the live migration works now... WTF?!?!
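For reference, the same cache change can presumably also be made from the CLI by re-specifying the drive string with qm set (the drive string below is assumed from this thread's setup):

```shell
# Sketch (drive string assumed): set the cache mode on an existing drive.
qm set 111 --scsi1 netapp03-DS1:111/vm-111-disk-0.raw,cache=writethrough
```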

So, cool!!! Thanks Udo for the excellent hint!

I will try now yet another VM to check if I can reproduce the minimized downtime approach.
 

Rainerle

New Member
Jan 29, 2019
So the next conversion with another VM just went smoothly with only two reboots.

Steps:
  1. Create a new VM on Proxmox, and on the NetApp NFS storage create IDE raw disks matching the machine to be converted, plus one small SCSI qcow2 disk so that VirtIO-SCSI can be installed properly later. Use Writeback for the cache (required for the online disk move). Enable the guest agent and use VirtIO Ethernet as well.
  2. Adjust the DHCP service to the new MAC address of the VirtIO adapter
  3. Log on to the Windows VM running on VMware, install the qemu-guest-agent from https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-qemu-ga , run MergeIDE.zip and uninstall VMware Tools, then shut down
  4. On the NetApp NFS storage, move the flat.vmdk files over the IDE raw files created in step 1
  5. Start the VM on Proxmox - this might take a while, since the OS has to fix itself for booting from IDE
  6. Install the VirtIO drivers for the unknown devices in Computer Management
  7. Shut down the VM on Proxmox
  8. Modify /etc/pve/qemu-server/<VMID>.conf: replace ideX with scsiX for the disks to be converted, replace the boot order ide0 with scsi0, and remove the small SCSI drive created in step 1
  9. Start the VM on Proxmox - it should boot from VirtIO SCSI now
  10. Move the disks from the NetApp NFS server to the Ceph storage backend - one by one - online
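The config edit in step 8 can be sketched with sed against a sample config (the VM ID, storage name and disk sizes below are made up for illustration):

```shell
# A made-up minimal <VMID>.conf, as it might look after steps 1-7:
cat > /tmp/108.conf <<'EOF'
boot: cdn
bootdisk: ide0
ide0: netapp03-DS1:108/vm-108-disk-0.raw,size=250G
ide1: netapp03-DS1:108/vm-108-disk-1.raw,size=250G
scsi0: netapp03-DS1:108/vm-108-disk-2.qcow2,size=1G
EOF

# Drop the small helper SCSI disk first, so its name is free for ide0,
# then rename ideX -> scsiX and switch the boot disk.
sed -i '/^scsi0:/d' /tmp/108.conf
sed -i -e 's/^ide\([0-9]\):/scsi\1:/' -e 's/^bootdisk: ide0/bootdisk: scsi0/' /tmp/108.conf
cat /tmp/108.conf
```

On a real system, edit /etc/pve/qemu-server/<VMID>.conf with the VM shut down, as in step 7.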
 
