I have been managing a PVE cluster at my company for about 5 years, and I have adapted settings and hardware configs as I have learned PVE best practices, but one major issue still complicates our ability to maintain operations: disk IO delay when deleting large VM disks on ZFS.
We host production Windows and Linux VMs to build and validate software within R&D, with software dating back to the 1980s, which creates a massive set of possible build requirements and artifacts for a generic build VM. Our VMs are configured with 3 TB virtio disks and may accumulate a few million files/folders over time, because when you have over 1000 VMs, the fastest network fetch is the one that never has to happen; a 2-minute compile shouldn't spend hours copying things from the network.
Whenever we need to recreate VMs, especially when pushing out a new image annually, ZFS consumes an enormous amount of disk IO shortly after the old VM is deleted. Even on our latest servers I see 20% IO delay, which in my experience is only the tip of the iceberg compared to the actual IO delay.
When we recreate VMs, we run commands like 'pvesm alloc' and 'qm set', and when the disks are busy and don't respond immediately, those commands error out complaining that the disks are not responding.
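For context, the recreate sequence looks roughly like the sketch below (storage name and VM ID are placeholders for illustration, not our actual values):

```
# Hypothetical storage/VMID values; adjust to your cluster.
STORAGE=local-zfs
VMID=4211

# Allocate a fresh 3 TB volume for the rebuilt VM
pvesm alloc "$STORAGE" "$VMID" "vm-${VMID}-disk-0" 3072G

# Attach it as the VM's virtio disk; this is the step that can
# time out while ZFS is still freeing the old zvol's blocks.
qm set "$VMID" --virtio0 "${STORAGE}:vm-${VMID}-disk-0"
```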
Our latest servers are Dell R7625s containing:
- 2x AMD EPYC 9374F (Zen 4, 64C total) @ ~4.1 GHz
- 1280 GB RAM
- ZFS RAID10 with 16x 8 TB drives (Samsung PM1743 or similar Kioxia Gen 5 NVMe SSDs, links trained at PCIe x4)
Ideally, I would like a way to tell ZFS to cap these deletion operations at around 30% of disk bandwidth and let them complete over time, or otherwise run them at a lower priority. Is there any way to tune ZFS for this?