Live QEMU storage motion seems to be rewriting the SOURCE disk?

xrobau

Member
Sep 4, 2022
TL;DR: When I use the 'Move Storage' function in Proxmox, the SOURCE disk appears to be rewritten far more aggressively than the destination, and I think that probably shouldn't happen!

Here's the (simplified) setup: three Proxmox nodes, two storage servers, one VM. The configuration of the VM or the disk appears to be irrelevant; discard, cache, etc. make no difference.

Both storage servers are ZFS servers, with the dataset set to sharenfs=async,rw,crossmnt,no_subtree_check,no_root_squash and sync=disabled. The storage is set up in the cluster as NFS mounts of /store1/datastore and /store2/datastore, with the default NFS mount version.
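For anyone following along, the setup was roughly this (the dataset names and server IP below are examples, not the real ones):

Code:
# On each ZFS storage server: export the dataset over NFS and disable sync
zfs set sharenfs="async,rw,crossmnt,no_subtree_check,no_root_squash" store1/datastore
zfs set sync=disabled store1/datastore

# On the Proxmox cluster: add the NFS export as a storage (default NFS version)
pvesm add nfs store1 --server 192.0.2.11 --export /store1/datastore --content images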

When I do a live 'move storage', QEMU for some reason goes NUTS and sends thousands and thousands of tiny writes to the SOURCE image (qcow2), which causes IOPS starvation, and the source host gets even MORE unhappy. The DESTINATION storage server is fine and runs at basically 0% disk utilization, because the source server is at 100% utilization writing to the source qcow2 image.
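You can watch this happening on the source storage server with the usual tools, e.g.:

Code:
# Per-vdev IOPS and bandwidth on the ZFS pool, refreshed every second
zpool iostat -v 1

# Device-level view - the %util column is where the source sits at 100%
iostat -x 1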

Before I go digging into this further, I'm wondering whether this is expected. My current hypothesis is that it's something to do with the internal snapshot and dirty-bitmap tracking, but I haven't looked into it very far yet - I am just wondering if I have made some fundamental error with the storage setup that is causing this.

This only happens with an ONLINE MOVE. When the VM is turned off, the move runs (as expected) at pretty much wire speed. I have been experimenting with a little VyOS VM with 32 GB of storage - which *does not write to disk* - and a live storage move takes 2 minutes. With the VM off, less than a second.
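I'm triggering the move from the GUI, but the CLI equivalent is something like this (VMID, disk and target storage are examples):

Code:
# Live move of scsi0 to the other storage, deleting the source copy afterwards
qm move_disk 100 scsi0 store2 --delete 1

# Offline move: stop the VM first, then the same command runs at wire speed
qm stop 100
qm move_disk 100 scsi0 store2 --delete 1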

Those numbers feel WILDLY wrong to me, which makes me think that I've done something silly, as I'm sure other people would have noticed this before now!

(Running the latest version of everything as of right now - that was the first thing I tried!)
 
Would you mind sharing the storage config of the two nodes (/etc/pve/storage.cfg) as well as the config of the VM in question?

Thanks!
 
> Would you mind sharing the storage config of the two nodes (/etc/pve/storage.cfg) as well as the config of the VM in question?

Since then I've torn down and rebuilt that cluster a few times, and I don't CURRENTLY have something that can reproduce it. I suspect you're saying that this is not expected! Nothing was changed from the defaults on the store; the VM was cpu=host,numa=1, and the disks were on virtio-scsi-pci with various different cache settings, which made no difference.
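From memory, the VM config was along these lines (VMID, volume name and sizes are placeholders, not the exact values):

Code:
# /etc/pve/qemu-server/100.conf (reconstructed from memory)
cores: 2
cpu: host
numa: 1
memory: 2048
scsihw: virtio-scsi-pci
scsi0: store1:100/vm-100-disk-0.qcow2,size=32G
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0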

I've since switched the NFS storage to NFSv3, which is implemented slightly differently inside the Linux kernel when used with ZFS filestores that have sync=disabled set.

With NFSv4 (to use a SQL analogy), every thread is basically 'read committed'. Until things are written from the v4 thread to the ZFS buffers, the other threads' caches aren't invalidated and they can't read the new data. Having sync=disabled means that threads can get out of sync (this is because NFSv4 is much smarter than NFSv3, and it pretty much needs to be that way!). Normally this is solved by having an NVMe ZIL, but as this is a test/dev setup, it didn't have one.

With NFSv3 and sync=disabled, as soon as a write arrives it is just handed off to the ZFS buffers, and the other threads don't have any cache to invalidate, so it just works. This subtle difference has been a known caveat in VMware land for quite a while, which is why it was one of the things I thought to try!
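Pinning the mount to NFSv3 is just a mount option on the storage definition; in /etc/pve/storage.cfg it looks something like this (storage name, server and paths are examples):

Code:
nfs: store1
        export /store1/datastore
        path /mnt/pve/store1
        server 192.0.2.11
        content images
        options vers=3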

Why does this cause a massive amount of writes to the SOURCE image? I have no idea, but I appear to have worked around it by switching to NFSv3. I'll try to build a test platform to duplicate this later, as it should be reasonably painless to validate if my idea is correct!
 
> I am just wondering if I have made some fundamental error with the storage setup that is causing this.

I'd guess not.

Your observation looks similar to one I made today:

https://forum.proxmox.com/threads/w...source-disks-when-moving-virtual-disk.123639/

> Why does this cause a massive amount of writes to the SOURCE image?

@xrobau, how do you see these writes?

Are you sure these are writes to the source image?

Apparently, on my system, it's very likely not writes to the file itself but file metadata (atime) updates.
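One quick way to check that theory is to look at (and, if set, turn off) atime on the source dataset (dataset name is an example):

Code:
# Is the source dataset updating access times on every read?
zfs get atime,relatime store1/datastore

# If so, turning atime off should make the metadata-only writes disappear
zfs set atime=off store1/datastore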
 