severe performance regression on virtual disk migration for qcow2 on ZFS with 5.15.39-2-pve

thanks for the heads up! :)
 
i have dug into this some more and found that the performance problem goes away when setting "relatime=on" or "atime=off" for the dataset.
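in case anyone wants to try the same, something along these lines on the zfs side should do it ("tank/vmstore" is just a placeholder dataset name):

zfs get atime,relatime tank/vmstore    # check the current settings
zfs set relatime=on tank/vmstore       # either this
zfs set atime=off tank/vmstore         # or this made the problem go away here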
[...]
ADDON (after finding out more, see below):
the performance problem also goes away when setting preallocation-policy to "off" on the datastore.
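if the datastore is managed by proxmox, the preallocation setting can also be changed from the CLI; roughly like this ("my-nfs-store" is a placeholder, and as far as i know the option only applies to dir/NFS/CIFS type storages):

pvesm set my-nfs-store --preallocation off    # possible values i know of: off, metadata, falloc, full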
Many, many thanks for that one!
I had been going mad for a week...
I made a mistake last week by doing two different upgrades at the same time: I upgraded PVE 7 to 8 and TrueNAS 12 to 13. Everything went OK (except a loss of connectivity with ifupdown2; perhaps I forgot to read some release notes, my bad).
But during the nightly backup job it was hell: in the morning some VMs were stuck with IO speeds from the 9600 baud modem era: for example, a VM that took 25 minutes to back up on PVE 7 took 1h48 to back up the day after on PVE 8!
I have two PVE clusters and 3 TrueNAS boxes sharing NFS for qcow2 files, and it was a real mystery...
For 2 days I tried many things without success: mitigations=off, iothread=1, VirtIO SCSI single, even MTU tweaking and cross-mounting between the 2 clusters and the 3 TrueNAS boxes, but none of the tests gave logical results.
iperf3 gives max bandwidth, and with dd on the PVE host or inside a VM the bandwidth is at the top, but moving a disk (offline) never ends.
I was moving a 64 GB disk when I found your post: in 5 hours it had moved 33%; I went onto the TrueNAS, set atime=off, and the remaining 67% moved in a few minutes!
I added the preallocation=off option on all my NFS datastores and launched a backup, which seems to go much faster than before.
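For reference, the relevant entry in /etc/pve/storage.cfg now looks roughly like this (storage name, server and paths are placeholders from my side):

nfs: truenas-nfs
        server 192.168.1.10
        export /mnt/tank/vmstore
        path /mnt/pve/truenas-nfs
        content images,backup
        preallocation off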
I am not sure that all the regressions are gone, but it is far better for the moment... I'll check carefully over the next few days.
 
what zfs version does your truenas have?
It is TrueNAS 13.0-U6.1:
zfs-2.1.14-1
zfs-kmod-v2023120100-zfs_f4871096b
But I did not activate the new features yet.
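In case it helps anyone checking the same thing, something like this on the TrueNAS shell shows whether the new pool features are still pending ("tank" is just a placeholder pool name):

zpool upgrade                          # lists pools that still have disabled features
zpool get all tank | grep feature@     # per-feature state (disabled/enabled/active)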

I checked the backups and there are still slowness issues compared to before, and some weird results:

| VMID | Node   | OS       | Hardware (qemu-kvm version) | Virtio | Guest Agent | Disk size | Cache              | Backup on PVE v7   | Backup on PVE v8   |
|------|--------|----------|-----------------------------|--------|-------------|-----------|--------------------|--------------------|--------------------|
| 117  | pve2-1 | Deb10    | i440fx                      | -      | 3.1.0       | 16 GB     | writethrough       | 00:10:54 (6.28 GB)  | 00:34:44 (6.21 GB)  |
| 207  | pve2-1 | Win2k19  | i440fx-5.2                  | 21500  | 102.10.0    | 512 GB    | writethrough       | 00:27:28 (35.50 GB) | 01:19:53 (36.11 GB) |
| 253  | pve2-1 | W10/22H2 | i440fx-7.2                  | 22900  | 105.00.2    | 64 GB     | writethrough       | 00:09:14 (16.49 GB) | 04:43:19 (16.26 GB) |
| 101  | pve2-2 | Deb12    | i440fx                      | -      | 5.2.0       | 32 GB     | writethrough       | 00:11:44 (4.23 GB)  | 00:06:15 (4.11 GB)  |
| 107  | pve2-2 | Deb9     | i440fx                      | -      | 3.1.0       | 32 GB     | writethrough       | 00:02:45 (0.82 GB)  | 01:45:32 (0.87 GB)  |
| 146  | pve2-2 | Win2k19  | i440fx-5.2                  | 22900  | 105.00.2    | 128 GB    | writeback          | 00:26:17 (21.26 GB) | 00:55:37 (21.55 GB) |
| 210  | pve2-2 | Deb10    | i440fx                      | -      | -           | 16 GB     | writethrough       | 00:02:16 (0.72 GB)  | 00:02:05 (0.72 GB)  |
| 231  | pve2-2 | Deb11    | i440fx                      | -      | -           | 16 GB     | writethrough       | 00:03:31 (1.05 GB)  | 01:09:50 (1.13 GB)  |
| 220  | pve2-3 | Win2k19  | i440fx-5.1                  | 21500  | 102.10.0    | 256 GB    | default (no cache) | 00:17:07 (19.86 GB) | 00:09:37 (20.22 GB) |
| 105  | pve1-1 | Deb10    | i440fx                      | -      | 3.1.0       | 32 GB     | writethrough       | 00:06:47 (2.30 GB)  | 02:23:23 (2.28 GB)  |
| 132  | pve1-4 | Win2k19  | i440fx-5.1                  | 21500  | 102.10.0    | 64 GB     | default (no cache) | 00:14:31 (12.03 GB) | 02:41:26 (12.59 GB) |
| 133  | pve1-4 | Win2k19  | i440fx-5.1                  | 21500  | 102.10.0    | 64 GB     | default (no cache) | 00:12:58 (12.14 GB) | 00:08:09 (12.67 GB) |
 
i have observed a similar performance issue on zfs shared via samba, but unfortunately i'm not yet able to reproduce it.

when live migrating a qcow2 virtual disk hosted on a zfs/samba share to a local ssd, i observed pathological slowness and saw lots of write IOPS on the source (!), whereas i would not expect any write IOPS for this. it did not go away by disabling atime like it did before.
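in case someone wants to watch for the same symptom, something like this on the zfs host shows per-vdev read/write ops live while a migration runs ("tank" is just a placeholder pool name):

zpool iostat -v tank 1    # refresh every second; write ops should stay near zero for a pure read workload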
i had seen similar slowness during backups when the vdisk was on the zfs/samba share. i'm using that share for clearing up local ssd space when a VM is obsolete or not used for a longer time.

the read performance during the live migration was well below 1 MB/s

i stopped the migration and tried to migrate the virtual disk offline, which performed surprisingly fast and worked without a problem.
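(for reference, the offline move can also be started from the CLI; roughly like this, with VMID, disk and target storage as placeholders:)

qm move-disk 100 scsi0 local-ssd    # move the disk of a stopped VM to another storage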

then i migrated the vdisk back to the samba share, which also performed well, and then tried again out of curiosity, but the problem was gone and isn't reproducible.

very very weird....
 
