[SOLVED] Converting a Windows VM from VMware (NFS on NetApp) to ProxMox (Ceph) with minimal downtime

Discussion in 'Proxmox VE: Installation and configuration' started by Rainerle, Jan 29, 2019.

  1. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    Hi,

    we built a three-node ProxMox cluster with Ceph as the storage backend for our test VMs. They currently reside on a two-node VMware cluster with an old NetApp as the storage backend.

    The plan is to try to migrate/convert - as preparation for a possible migration of our production VMs - with minimal downtime. The current setup/order is like this:
    - Mount the NetApp NFS data store share on ProxMox read-write
    - Create a new empty VM on ProxMox using the NetApp data store for the VM's disk (using IDE0 and VMDK)
    - Uninstall VMware Tools, run MergeIDE.zip and shut down the running VM on ESXi
    - On the ProxMox CLI, move win10/win10-flat.vmdk and win10/win10.vmdk to images/108/ and there copy win10.vmdk to the previously created vm-108-disk-0.vmdk (see the command sketch after this list)
    - Start VM on ProxMox
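    A minimal command sketch of the copy step above, assuming the NFS data store is mounted at /mnt/pve/netapp03-DS1 and that the VMware folder of the VM is named win10/ (adjust paths to your layout):

    # move descriptor and flat extent into the ProxMox image directory for VMID 108
    mv /mnt/pve/netapp03-DS1/win10/win10-flat.vmdk /mnt/pve/netapp03-DS1/win10/win10.vmdk /mnt/pve/netapp03-DS1/images/108/
    # make the descriptor available under the name ProxMox expects for the pre-created disk
    cd /mnt/pve/netapp03-DS1/images/108
    cp win10.vmdk vm-108-disk-0.vmdk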

    So the above works fine. But now I want to move the disk online to the Ceph storage backend. And that fails.

    create full clone of drive scsi0 (netapp03-DS1:108/vm-108-disk-0.vmdk)
    transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
    qemu-img: output file is smaller than input file
    Removing image: 100% complete...done.
    TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/108/vm-108-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-108-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1

    From what I understand, it takes the size of the small descriptor file (vm-108-disk-0.vmdk, a copy of win10.vmdk) to create the destination image - which is wrong, since that is not the size of the win10-flat.vmdk extent it references.
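    One way to double-check what qemu-img sees is to inspect the descriptor directly; it should report the full virtual size of the flat extent rather than the 10240-byte size of the descriptor file itself (path taken from the task log above):

    qemu-img info /mnt/pve/netapp03-DS1/images/108/vm-108-disk-0.vmdk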

    Is this a possible bug?
    Or is there a better way?

    Best regards
    Rainer
     
  2. virtRoo

    virtRoo New Member

    Joined:
    Jan 27, 2019
    Messages:
    23
    Likes Received:
    4
    Hi Rainerle,

    Sounds like we're in the same boat. We've been thinking of doing something similar, as per the following:

    https://forum.proxmox.com/threads/live-converting-vmdk-to-qcow2.51144/

    With our draft plan, downtime could be greatly reduced to little more than uninstalling/installing guest tools, etc. However, we have not carried out this plan in a production environment yet, so the reliability of the approach remains uncertain.

    Speaking of the migration error on the storage move: live storage migration (especially in relation to local storage) seems to have been quite flaky in KVM/QEMU in general. We've been testing this feature from time to time on different KVM solutions (including qemu-kvm-ev on CentOS 7) and often encounter inconsistent results or limitations. Maybe the Proxmox staff could shed more light on this.

    Try updating Proxmox to the latest version and see if it helps. We encountered some bizarre bugs when doing live migration with local storage on 5.3-7: when live-migrating a VM with one qcow2 disk on local storage to another host without specifying '--targetstorage', the VM would end up with additional spawned disks (sometimes qcow2, sometimes raw) of size 0 on the destination host after migration and then became corrupted. After updating to 5.3-8, the issue with randomly spawned disks seems to have gone away, but qcow2 still gets converted to raw on the fly (https://bugzilla.proxmox.com/show_bug.cgi?id=2070).
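    For reference, a sketch of the CLI form of such a live migration with local disks; the VMID, target node and target storage names here are only placeholders, and '--with-local-disks' is the switch needed when the VM has disks on local storage:

    qm migrate 100 node2 --online --with-local-disks --targetstorage target-storage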
     
  3. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    Hi virtRoo,

    after starting this thread yesterday I upgraded to 5.3-8. I will try another VM today...

    I understand that I am moving from one virtualisation platform to another, so I have no problem with a short downtime. But most of my live VMs are rather big, and I need some best-practice solution to reduce the end users' downtime...
     
  4. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    Hi,

    tried to move a disk just now with the current ProxMox (5.3-8).

    Online:
    Virtual Environment 5.3-8
    Virtual Machine 111 (win2008r2) on node 'proxmox01'
    Logs
    create full clone of drive ide0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
    drive mirror is starting for drive-ide0
    drive-ide0: transferred: 0 bytes remaining: 268435456000 bytes total: 268435456000 bytes progression: 0.00 % busy: 1 ready: 0
    drive-ide0: Cancelling block job
    drive-ide0: Done.
    Removing image: 100% complete...done.
    TASK ERROR: storage migration failed: mirroring error: drive-ide0: mirroring has been cancelled

    Offline:
    Virtual Environment 5.3-8
    Virtual Machine 111 (win2008r2) on node 'proxmox01'
    Logs
    create full clone of drive ide0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
    transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
    qemu-img: output file is smaller than input file
    Removing image: 100% complete...done.
    TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1

    So that problem still exists.

    Or is it just the wrong way to do it?
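    For the record, the same operation can also be triggered from the CLI (equivalent to "Move disk" in the GUI); a sketch using the VM, disk and storage names from the log above:

    qm move_disk 111 ide0 ceph-proxmox-VMs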
     
  5. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,348
    Likes Received:
    213
    Does the convert work when using the same storage (e.g. DS1) as the target?
    Did you also try the convert by hand?
    Does the PVE node have access to the ceph cluster?
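    Regarding the last point, a quick sketch of how access from the node can be checked (assuming a hyperconverged setup where the admin keyring is available locally):

    pveceph status
    rbd -p ceph-proxmox-VMs ls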
     
  6. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    Hi,

    thanks for getting back so quickly!

    Converting onto same storage:
    Virtual Environment 5.3-8
    Virtual Machine 111 (win2008r2) on node 'proxmox01'
    Logs
    scsi0
    create full clone of drive scsi0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
    Formatting '/mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.qcow2', fmt=qcow2 size=10240 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
    transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
    qemu-img: output file is smaller than input file
    TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O qcow2 /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk zeroinit:/mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.qcow2' failed: exit code 1

    Converting by hand does work:
    root@proxmox01:/var/lib/vz/images# /usr/bin/qemu-img convert -p -f vmdk -O qcow2 /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.qcow2
    (100.00/100%)
    root@proxmox01:/var/lib/vz/images#

    If pveceph status had a short output I would have posted it... but yes (Quorate: Yes, and HEALTH_OK for Ceph on all three nodes).
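    As an aside, a one-line health summary can be had with a plain Ceph command (a sketch, run on any of the nodes):

    ceph health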
     
  7. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    Hi,

    moving it offline to the final destination also works

    root@proxmox01:/var/lib/vz/images# /usr/bin/qemu-img convert -p -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring
    (4.00/100%)
    ...
     
  8. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    It seems the issue with the failing offline copy is that the destination raw image is created with the wrong size.

    From the gui:
    Virtual Environment 5.3-8
    Storage 'ceph-proxmox-VMs' on node 'proxmox01'
    Logs

    create full clone of drive scsi0 (netapp03-DS1:111/vm-111-disk-0.vmdk)
    transferred: 0 bytes remaining: 10240 bytes total: 10240 bytes progression: 0.00 %
    qemu-img: output file is smaller than input file
    Removing image: 100% complete...done.
    TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk 'zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-1:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring'' failed: exit code 1

    On the command line, using a manually created destination image of the correct size,
    /usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring

    Just works.

    So looking at the directory on the NetApp
    root@proxmox01:/mnt/pve/netapp03-DS1/images/111# ls -al
    total 521524892
    drwxr----- 2 root root 4096 Jan 30 14:34 .
    drwxr-xr-x 10 root root 4096 Jan 30 10:07 ..
    -rw------- 1 root root 268435456000 Jan 30 11:27 win2008r2_3-flat.vmdk
    -rw------- 1 root root 649 Jan 30 10:11 win2008r2_3.vmdk
    -rw------- 1 root root 268435456000 Jan 30 11:27 win2008r2_4-flat.vmdk
    -rw------- 1 root root 649 Jan 30 10:11 win2008r2_4.vmdk
    -rw-r----- 1 root root 10240 Jan 30 10:58 vm-111-disk-0.vmdk
    -rw-r----- 1 root root 10240 Jan 30 11:25 vm-111-disk-1.vmdk
    root@proxmox01:/mnt/pve/netapp03-DS1/images/111#


    qm move_disk creates the destination image using the size of the vm-111-disk-0.vmdk descriptor file.

    This is wrong. It should use the size of the referenced win2008r2_3-flat.vmdk extent.
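    A sketch of the manual workaround described above: pre-create the RBD image with the real size of the flat extent (268435456000 bytes = 250 GiB here), then run the same convert with -n so qemu-img does not try to create the target itself (the rbd create size syntax is an assumption for the Ceph release in use):

    rbd -p ceph-proxmox-VMs create vm-111-disk-0 --size 250G
    /usr/bin/qemu-img convert -p -n -f vmdk -O raw /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.vmdk zeroinit:rbd:ceph-proxmox-VMs/vm-111-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/ceph-proxmox-VMs.keyring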
     
    #8 Rainerle, Jan 30, 2019
    Last edited: Jan 30, 2019
  9. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    159
    Hi,
    it has been a long time since I migrated from VMware to Proxmox...

    AFAIK the *-flat.vmdk files are raw images.
    What happens if you create a VM with raw disks and then copy the flat files over the VM disks?
    Like "cp win2008r2_3-flat.vmdk vm-111-disk-0.raw"
    After that, I think an online migration will work.
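    A rough sketch of that idea, assuming VMID 111 and a 250 GiB disk. Pre-creating the empty raw volume through the GUI works just as well; the pvesm alloc call and its size/format arguments below are only an assumption:

    # pre-create an empty raw volume of the same virtual size on the NFS storage
    pvesm alloc netapp03-DS1 111 vm-111-disk-0.raw 250G --format raw
    # overwrite it with the flat extent, which already is a raw image
    cp /mnt/pve/netapp03-DS1/images/111/win2008r2_3-flat.vmdk /mnt/pve/netapp03-DS1/images/111/vm-111-disk-0.raw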

    Udo
     
    Rainerle and stumbaumr like this.
  10. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    Hi Udo,

    cool suggestion! Tried it...

    So the filesystem is actually accessible within the VM - but the disk move now bails out with a different error:

    Online Migration:
    create full clone of drive scsi1 (netapp03-DS1:111/vm-111-disk-0.raw)
    drive mirror is starting for drive-scsi1
    drive-scsi1: transferred: 0 bytes remaining: 268435456000 bytes total: 268435456000 bytes progression: 0.00 % busy: 1 ready: 0
    drive-scsi1: Cancelling block job
    drive-scsi1: Done.
    Removing image: 100% complete...done.
    TASK ERROR: storage migration failed: mirroring error: drive-scsi1: mirroring has been cancelled
     
  11. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    159
    Hi,
    Perhaps an issue with the target? Have you run anything on the Ceph cluster already? Does a live migration of a fresh VM to the Ceph storage work?
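    A quick sketch of such a test (VMID, VM name and bridge are placeholders): create a small throwaway VM with its disk on the NFS storage, start it, then move the disk to Ceph while it is running:

    qm create 999 --name ceph-test --memory 512 --net0 virtio,bridge=vmbr0 --scsi0 netapp03-DS1:4
    qm start 999
    qm move_disk 999 scsi0 ceph-proxmox-VMs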

    Udo
     
    Rainerle and stumbaumr like this.
  12. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    By setting the cache mode on the hard disk from "Default (no cache)" to "Write through", the live migration works now... WTF?!?!
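    For reference, a sketch of the same change from the CLI (the volume name is taken from the earlier log; any other drive options already on that config line need to be repeated):

    qm set 111 --scsi1 netapp03-DS1:111/vm-111-disk-0.raw,cache=writethrough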

    So, cool!!! Thanks Udo for the excellent hint!

    I will now try yet another VM to check if I can reproduce the minimal-downtime approach.
     
  13. Rainerle

    Rainerle New Member

    Joined:
    Jan 29, 2019
    Messages:
    11
    Likes Received:
    1
    So the next conversion with another VM just went smoothly with only two reboots.

    Steps:
    1. Create a new VM on ProxMox and, on the NetApp NFS storage, create IDE raw disks matching those of the machine to be converted, plus one small SCSI qcow2 disk so that VirtIO SCSI can be installed properly later. Use Writeback for the cache (required for the online disk move). Enable the guest agent and use VirtIO for the network adapter as well.
    2. Adjust the DHCP service to the new MAC address of the VirtIO adapter
    3. Log on to the Windows VM running on VMware, install the qemu-guest-agent from https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-qemu-ga , run MergeIDE.zip and uninstall VMware Tools, then shut down
    4. On the NetApp NFS storage, move the flat.vmdk files over the IDE raw files created in step 1.
    5. Start the VM on ProxMox - this might take a while, since the OS has to fix itself up for booting from IDE
    6. Install the VirtIO drivers for the unknown devices in Computer Management
    7. Shut down the VM on ProxMox
    8. Modify /etc/pve/qemu-server/<VMID>.conf: replace ideX with scsiX for the converted disks, change the boot disk from ide0 to scsi0 and remove the small SCSI drive created in step 1 (see the config sketch after this list)
    9. Start the VM on ProxMox - it should boot via VirtIO SCSI now
    10. Move the disks from the NetApp NFS server to the Ceph storage backend - one by one - online
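    A sketch of the relevant lines in /etc/pve/qemu-server/<VMID>.conf before and after step 8; the disk names, sizes and the scsihw setting are examples, not taken from an actual VM:

    Before:
    bootdisk: ide0
    ide0: netapp03-DS1:111/vm-111-disk-0.raw,cache=writeback,size=250G
    scsi0: netapp03-DS1:111/vm-111-disk-1.qcow2,size=1G
    scsihw: virtio-scsi-pci

    After:
    bootdisk: scsi0
    scsi0: netapp03-DS1:111/vm-111-disk-0.raw,cache=writeback,size=250G
    scsihw: virtio-scsi-pci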
     
    Toni G likes this.