Some thoughts on making live storage migration better.

mcparlandj

Well-Known Member
Mar 1, 2017
We have 4 Proxmox host servers and two different NFS servers for VM storage. We usually distribute the VM storage load between them manually.
The two NFS storage servers are all-flash and have 10 gig connections to our Proxmox hosts.
A few times a year we need to update the firmware on our NFS storage back ends. In order not to have to shut down any of the VMs, our process looks like this:

Basically we have to migrate all the VMs from one NFS storage node to the other by moving their qcow2 files.
So move everything off NFS1 so all the VMs are running off NFS2.
Update the OS on NFS1.
Then migrate ALL the VM qcows to NFS1.
Then update the software on NFS2.
Then manually rebalance the storage load across both servers by moving some of the qcow2 files BACK to NFS2.
Then repeat this process 4-6 months later.

The process itself is fine. I can’t imagine that we are the only end user doing this sort of thing. I just wanted to share some ideas that I think could make this process a little easier on the end users.

Right now I have to click on EVERY VM and see where its storage is, highlight the disk, click the Move disk button, deal with the dialogs, then move on to the next VM. It's a lot of repetitive work when you're dealing with over 50 VMs.

So here's what I'm thinking. When I click on a storage node and then on "Content", I can see a list of all the VMs on that particular storage node. The best solution would be if I could actually DO something with that information. I'd love to have a button that says "Migrate All Storage to another storage device", maybe even checkboxes so I could tick off the VMs that I want to mass-move to another storage node.

This would act like the "Migrate All VMs to another host" button you have when looking at a VM host. It would be an ENORMOUS time saver for end users like myself.

If that's not possible, another issue we run into is moving disks the way we do now. We can only click the "Move Disk" button for about 3 VMs at a time on a particular host. Anything more than that and I get the following error:

“create full clone of drive scsi0 (San0-KVM-Storage:127/vm-127-disk-1.qcow2)
TASK ERROR: storage migration failed: error with cfs lock 'storage-SAN1-KVM-Storage': unable to create image: got lock timeout - aborting command”

So to deal with this, I need to move on to another host and move three of ITS VMs, then the next host, and so on. At some point I can go back to the first host and move 3 more VMs' disks. But hopefully I remember, or have written down, which VM I left off at.

It would be nice if the system would just put them in a queue. With a 10 gig connection and all-flash storage, I'm not sure why it can only handle 3 tasks at a time, but that's fine. It would just be nice if it queued the moves so I could set this stuff going and then go home for the night. As it stands right now, I have to babysit this process for a few hours.

Anyways, just some thoughts on how to make Proxmox even more awesome!
 
Hi,
as a workaround, you could create a batch script using "qm move_disk ...".

But yes, a task queue in the GUI would be great :)
Not sure about the storage lock; I remember having moved disks on 5-6 VMs in parallel. (But you can't move two disks of the same VM in parallel.)
 
Currently the whole storage is locked while such an operation runs; that's why you get the timeout error. There are plans to make the lock more granular (per VM) to make parallel operations more possible.
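
Until then, one way to stay under that limit from the CLI is to cap the parallelism yourself, for example with xargs. This is only a sketch: the VMIDs and the "new-storage" name are made up, and it assumes that roughly 3 concurrent moves stay below the lock timeout, as observed above.

Code:
# run at most 3 "qm move_disk" jobs at a time; {} is replaced by each VMID
printf '%s\n' 101 102 103 104 105 | xargs -P 3 -I{} qm move_disk {} scsi0 new-storage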

For the bulk/mass move there are no explicit plans for now, but it could be scripted relatively easily, something like:

Code:
#!/bin/bash

# move the "scsi0" disk of every VM with VMID 100-150 to the storage "new-storage"
for vmid in {100..150}; do
    qm move_disk "$vmid" scsi0 new-storage
done

This would move the "scsi0" disk of all VMs with VMID 100 to 150 on the node this is executed on to the storage "new-storage".
This is a relatively simple example and could be adapted to one's needs. For batch jobs nothing beats the shell (or other scripting), IMO :)
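
If only the VMs whose disk actually sits on the old storage should be touched, the loop could check "qm config" first. Again just a sketch; the storage names below are only taken from the error message quoted earlier and would need to match your setup.

Code:
#!/bin/bash

# hypothetical source/target storage names (taken from the error quoted above)
oldstorage="San0-KVM-Storage"
newstorage="SAN1-KVM-Storage"

# iterate over all VMIDs on this node (skip the header line of "qm list")
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    # only move scsi0 if it currently lives on the old storage
    if qm config "$vmid" | grep -q "^scsi0: ${oldstorage}:"; then
        qm move_disk "$vmid" scsi0 "$newstorage"
    fi
done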

A bit more "finished" script could be:
Code:
#!/bin/bash

# first argument: target storage, second argument: disk ID, rest: list of VMIDs
newstorage="$1"
shift
diskid="$1"
shift

for vmid in "$@"; do
    # the "echo" makes this a dry run that only prints the commands;
    # remove it to actually move the disks
    echo qm move_disk "$vmid" "$diskid" "$newstorage"
done

This takes the new storage, the disk ID to move (e.g., scsi0, sata1, ...) and then a list of VMIDs, for example:
Code:
 ./mass-move new-storage scsi0 1001 2003 234
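
To cover the "go home for the night" case from the first post, such a batch could also be left running detached from the terminal (with the "echo" in the loop removed so it really executes; the VMIDs here are just placeholders):

Code:
nohup ./mass-move new-storage scsi0 101 102 103 > mass-move.log 2>&1 &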
 
@ Proxmox-Team:
+1 vote for an option in the GUI.

@mcparlandj
When I read your description of the config you use, I had an idea:

Both storages have capacity for all VMs. Why don't you use GlusterFS for replication? IMHO you could then take a storage offline without the overhead of long migrations.
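
If one went that route, the replicated Gluster volume could be added as a Proxmox storage roughly like this. Just a sketch: the host names and volume name are made up, and the Gluster volume itself would have to be created and replicated across the two boxes first.

Code:
# hypothetical hosts/volume; adds a GlusterFS-backed storage for VM images
pvesm add glusterfs gluster-vmstore --server storage1.example.com --server2 storage2.example.com --volume vmstore --content images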

Please correct me if I am wrong.

Greetings from Hamburg ;)
 
Believe someone with years of experience with GlusterFS: it is dead slow, because the authors/company refuse to implement it as a kernel driver.
 
