[SOLVED] Windows VM I/O problems only with ZFS

RAIDz works but has terrible write amplification, padding overhead and low performance. There are tons of threads about this, e.g.:

https://forum.proxmox.com/threads/about-zfs-raidz1-and-disk-space.110478/post-475702
https://forum.proxmox.com/threads/raidz-out-of-space.112822/post-487183

It's just the way RAIDz + zvols work. It can be mitigated with a bigger volblocksize and more disks in the RAIDz, but IMHO it isn't worth it: just buy more/bigger drives and stick to RAID10, especially when the total space needed is low and you are using consumer drives, since adding more/bigger drives is cheap.
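
If you do stay on RAIDz and want to experiment with a bigger volblocksize, a minimal sketch of where that is usually set on PVE (the storage ID and zvol name below are placeholders; the change only affects newly created disks):

Code:
# Raise the default volblocksize used for new zvols on a ZFS storage
pvesm set local-zfs --blocksize 16k

# Check what an existing VM disk actually uses (not changed retroactively)
zfs get volblocksize rpool/data/vm-100-disk-0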

Even when using LZ4 compression?
I've read many posts arguing that with compression on ZFS, everything changes on this subject.
(Even the commonly suggested volblocksize tuning based on the ashift and the number of disks in the pool minus parity seems to lose its meaning with compression on.)

On these topics I have read a lot of information, tutorials, tuning guides and well-detailed posts on this forum... often in conflict with each other.

It's really difficult to understand which is actually the best choice, considering also that it depends a lot on the load, the type of I/O and the configuration of the VMs... which will certainly vary a lot within a farm, with different VMs for different purposes...
 
Not a real expert here, but AFAIK compression doesn't change the padding issue, as padding is applied after the data is compressed. Compression will potentially make the data to be written smaller, so ZFS applies padding to the amount of data you actually write. Even if that write needs a few extra sectors of padding that the original, uncompressed write would not have needed, you still use less disk space with compression because you need fewer sectors in total.

I've only had a couple of VMs in production for a few months using RAIDZ1 + compression, with 6 drives and a 16k volblocksize, and the theory proved right: the overhead was around 30%, comparing du -sh inside the VMs with the amount of space used in the zpool.
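
If anyone wants to reproduce that comparison on the host side, a minimal sketch of the properties I'd look at (pool/zvol names are placeholders):

Code:
# Logical data written by the guest vs. space actually allocated (including padding)
zfs get -p logicalused,used,compressratio tank/vm-100-disk-0

# Same view across the whole pool
zfs list -o name,used,referenced,compressratio -r tank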

You've probably found this already, but I found this explanation quite helpful:

https://www.reddit.com/r/zfs/comments/ujba3i/does_allocation_overhead_still_exist_with/
 
  • Like
Reactions: EdoFede
It's just a lab, but for some machines we are going this way because of customers' budget constraints.
(I think it's better than spinning SATA drives anyway.)

We are testing the "worst config" now

Thanks
ZFS cannot be tested without proper disks, even in a lab; the experience will be misleading.
ZFS requires datacenter SSDs (with PLP/capacitors and plenty of TBW endurance), or many HDDs and a lot of RAM.
With "budget constraints" you can't use ZFS and should go with the default ext4/LVM-Thin plus the fast daily backups that PBS provides.
 

I don't totally agree.

ZFS was created to work with spinning disks that are much slower than the SSDs used in my tests, and in the past I've used it many times even with 5.4k rpm drives without issues.
After reducing "zfs_dirty_data_max" the issue is gone and the system simply runs as fast as the disks permit, without trouble.
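
For reference, this is roughly the kind of check/change involved; a sketch, where the 256 MiB value is just an example, not a recommendation:

Code:
# Current limit in bytes
cat /sys/module/zfs/parameters/zfs_dirty_data_max

# Lower it for the running session only (example: 256 MiB)
echo 268435456 > /sys/module/zfs/parameters/zfs_dirty_data_max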

I can agree that on an enterprise-grade production VM system it is not a suitable solution, but saying that you cannot even experiment with normal consumer-grade SSDs in a test lab seems exaggerated to me.

For "experiment" I mean testing the PVE/PBS products, not doing some kind of comparison or performance benchmarks. (This was never the point of my thread).
 
Agree, but I'm curious too as I don't know if that would change anything in the original problem.

I've done some tests, reverting the server to the original config (no ZFS tuning parameters) and playing with the ZFS sync parameter.

Here are the results.

sync=standard (default setting)
ZFS Sync test - Sync standard.png
sync=always (all writes treated as sync writes)
ZFS Sync test - Sync always.png

sync=disabled (all writes treated as async writes)
ZFS Sync test - Sync disabled.png

With the last option, the process gets notified that the write is done even when it isn't (so it is very unsafe).
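
For anyone repeating the test, a sketch of how the sync property can be switched between runs (the dataset path is a placeholder; apply it to the zvol backing the test disk):

Code:
# Check the current setting
zfs get sync ZFS-Lab2/vm-104-disk-0

# Switch between the three modes between benchmark runs
zfs set sync=always ZFS-Lab2/vm-104-disk-0
zfs set sync=disabled ZFS-Lab2/vm-104-disk-0
zfs set sync=standard ZFS-Lab2/vm-104-disk-0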

Bye,
Edoardo
 
  • Like
Reactions: VictorSTS
You're welcome!

Yes, interesting, even if the numbers aren't exactly great in terms of performance.
Some testing with "fio" would probably give slightly more accurate results.
In any case, they should be taken as a comparative test within my setup, not as absolute results.
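
In case it helps, a minimal fio sketch that roughly mimics the sequential-write and 4K random-write cases on the host (the file path and sizes are just example placeholders):

Code:
# Sequential write, 1 MiB blocks
fio --name=seqwrite --filename=/ZFS-Lab2/fio-test --size=4G --rw=write \
    --bs=1M --ioengine=libaio --iodepth=8 --end_fsync=1

# Random 4K sync writes (closer to the worst case seen inside the VM)
fio --name=randwrite --filename=/ZFS-Lab2/fio-test --size=4G --rw=randwrite \
    --bs=4k --ioengine=libaio --iodepth=32 --sync=1 --runtime=60 --time_based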
 
I just wanted to thank the OP and others with the zfs_dirty_data_max suggestion.

I have been working on stabilizing Windows VMs on AMD "Zen 4" EPYC 9004 on:
- Dell R7625
- 2x Zen AMD EPYC 9554p (128 physical cores 3.1-3.75 GHz)
- 16x PCIe Gen 3 enterprise SSDs in ZFS soft RAID
- 20x128 GB DDR5
- Hosting 35-60 VMs with lots of data, mostly Windows and Linux build VMs that can merge TBs of qcow2 disks with Packer

We've had recurring freezes caused by a number of issues, and the last one seems to be correlated with disk I/O.

While I'm not sure whether zfs_dirty_data_max fixes it, it brought an enormous improvement in our disk throughput and responsiveness. We used to have AzDo agents disconnect, brick builds and reconnect under high load; I just ran 2x the Packer builds that would previously have brought down agents, without a hitch, and it built 2x faster despite running concurrently.

I don't know why; I guess ZFS must hit some sort of bottleneck when it decides to flush the dirty writes if the size is too large in some configurations.

Ours was defaulting to about 4 GB. I changed it to about 1 GB after the initially discussed 50 MB value worked but dramatically reduced disk I/O throughput.

Permanent:
Code:
# /etc/modprobe.d/zfs.conf
options zfs zfs_dirty_data_max=1073741824

Code:
update-initramfs -u -k all
reboot now

Current OS session only:
Code:
echo "1073741824" > /sys/module/zfs/parameters/zfs_dirty_data_max

I also had to disable ZFS compression to make this happen: qemu-img merging TBs of data with light ZFS compression consumes all of the host's CPU, yes, all 128 physical high-frequency latest-gen cores. So don't let people hand-wave that light ZFS compression has no major impact on performance; it is not true for all use cases.
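
For reference, a sketch of how compression can be checked and turned off on the relevant dataset (the dataset name is a placeholder; existing data stays compressed until rewritten):

Code:
# See what is currently in effect
zfs get compression,compressratio rpool/data

# Disable for new writes only (does not decompress existing blocks)
zfs set compression=off rpool/data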
 
  • Like
Reactions: _gabriel
Hi,
I'm a new Proxmox user.

I've built a small lab to evaluate PVE and PBS, with the intention of replacing our Hyper-V infrastructure (90 VMs across 3 sites, many in replica).

We got two Dell R640 servers for PVE and another one for PBS.
For this testing purpose we are using 4x WD Blue 1TB SSDs (model WDS100T2B0A) that were lying around.
Two per server (+ separate OS disks) in a ZFS mirror config, with ashift=12.
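
In case it's useful to anyone reproducing the setup, a sketch of how an equivalent mirror could be created from the CLI (the device IDs below are placeholders):

Code:
# Two-disk mirror with 4K sectors forced via ashift=12
zpool create -o ashift=12 ZFS-Lab2 mirror \
    /dev/disk/by-id/ata-WDC_WDS100T2B0A_EXAMPLE1 \
    /dev/disk/by-id/ata-WDC_WDS100T2B0A_EXAMPLE2

# Confirm the sector-size setting took effect
zpool get ashift ZFS-Lab2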

HW config (one server)
Dell R640
2x Intel Xeon Gold 5120
64GB RAM (planned to be expanded to 256GB)
Dell PERC controller in passthrough mode


The idea is to run two PVE nodes in an HA cluster with ZFS replication between nodes, a remote replica for disaster recovery purposes (for critical VMs), and a third local server with PBS for backups (also used as a QDevice for HA quorum).

I configured a network ring (with one 2x10G Ethernet card per server), with RSTP over Open vSwitch between the 2 PVE nodes and the PBS server.
Everything works as expected, to our great satisfaction.

But...
While testing with some VMs (for the most part they will be Windows servers) I ran into a major stability issue during high I/O.

Write I/O performance is very poor and the VM becomes very (very) slow to user interaction and other actions during a default CrystalDiskMark test.
The benchmark results also drop to 0.00 in one or more write tests (not always repeatable) at the end.

The CrystalDiskMark benchmark was run after noticing anomalous behaviour while duplicating a simple 1GB file inside the VM (the guest operating system froze a few seconds after starting the copy and remained unresponsive until it finished).

This behaviour happens only if I use ZFS as the storage engine, with any combination of storage parameters except the "Writeback (unsafe)" cache.
And only with the Windows write cache active and buffer flushing enabled (flag unchecked), which is the "standard" Windows configuration.

Tests I've made to figure out the problem:
- Every combination of Windows write-caching settings inside the VM (problems with the write cache active, as described)
- Every combination of VM cache settings for the virtual disks (impact on results, but same behaviour, EXCEPT for "Writeback (unsafe)")
- Separating the test disk from the OS disk inside the VM (no difference)
- Creating a separate disk for the paging file inside the VM (no difference)
- Playing with ZFS ashift, volblocksize and the VM's NTFS allocation size (very light impact on results, same behaviour)
- Enabling/disabling the ZFS cache on the zvol during the test (huge impact on read results, but same behaviour)
- Enabling/disabling ZFS compression (impact on results, but same behaviour)
- Changing the zpool from mirror to single disk (nearly the same behaviour)
- Changing the storage engine from ZFS to ext4 (PROBLEM SOLVED using ext4 instead of ZFS)

(Of course I reinstalled the VM for every ZFS-layer modification such as compression and ashift/volblocksize changes; a sketch of the kind of commands involved is shown below.)
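
For context, these are the kinds of host-side knobs those tests touched; a rough sketch using names from this setup as placeholders (most of them only affect newly written data or newly created zvols):

Code:
# Compression on/off for the dataset backing the VM disks
zfs set compression=lz4 ZFS-Lab2
zfs set compression=off ZFS-Lab2

# ZFS (ARC) caching of the zvol - "metadata" effectively disables data caching
zfs set primarycache=all ZFS-Lab2/vm-104-disk-0
zfs set primarycache=metadata ZFS-Lab2/vm-104-disk-0

# Default volblocksize for newly created VM disks on a storage
pvesm set Test2 --blocksize 16k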

It seems like a problem specific to ZFS in my setup.
I've searched around for days and found posts like this one (https://forum.proxmox.com/threads/p...ndows-server-2022-et-write-back-disks.127580/) with nearly the same issue, but found no practical info apart from suggestions to use enterprise SSDs.


I know that I'm using consumer-grade drives for this test, but since the issue is huge and only present with a certain combination of settings, I'm looking for help to figure out the real source of the problem.

Some results from the last tests I ran
Result for ZFS on single disk, with Win cache ON and buffer flush ON
View attachment 58006

Result for ZFS on single disk, with Win cache ON and buffer flush OFF (unsafe)
View attachment 58007

Result for ZFS on single disk, with Win cache OFF
View attachment 58008

Similar behaviour with the ZFS mirror on two disks.

Result using ext4 instead of ZFS on same hardware (and single disk)
No problem at all in this case
View attachment 58009


I hope someone can help me understand where the problem is and how to solve it.

Thanks in advance!
Edoardo





pveversion
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1


zpool status
Code:
pool: ZFS-Lab2
 state: ONLINE
  scan: scrub repaired 0B in 00:03:57 with 0 errors on Sun Nov 12 00:27:59 2023
config:


    NAME                                         STATE     READ WRITE CKSUM
    ZFS-Lab2                                     ONLINE       0     0     0
      mirror-0                                   ONLINE       0     0     0
        ata-WDC_WDS100T2B0A-00SM50_183602A01791  ONLINE       0     0     0
        ata-WDC_WDS100T2B0A_1849AC802510         ONLINE       0     0     0


errors: No known data errors


  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:03:17 with 0 errors on Sun Nov 12 00:27:20 2023
config:


    NAME                                        STATE     READ WRITE CKSUM
    rpool                                       ONLINE       0     0     0
      mirror-0                                  ONLINE       0     0     0
        ata-TOSHIBA_MQ01ABF050_863LT034T-part3  ONLINE       0     0     0
        ata-TOSHIBA_MQ01ABF050_27MDSVHVS-part3  ONLINE       0     0     0


errors: No known data errors


VM config
Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 8
cpu: x86-64-v4
efidisk0: Test:104/vm-104-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: pc-i440fx-8.0
memory: 8192
meta: creation-qemu=8.0.2,ctime=1699873712
name: Testzzz
net0: virtio=8E:46:68:39:1E:CA,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: Test2:vm-104-disk-0,discard=on,iothread=1,size=100G
scsihw: virtio-scsi-single
smbios1: uuid=b65697c3-3c86-4c72-86a4-92b05fc8f241
sockets: 1
tpmstate0: Test:104/vm-104-disk-2.raw,size=4M,version=v2.0
unused0: Test:104/vm-104-disk-1.raw
vmgenid: 0186fd26-8c5d-4ef0-a8bc-d0ff738bef43
Hi,

EdoFede

Use the "virtio" driver for disks, you have scsi. Do everything according to the instructions https://pve.proxmox.com/wiki/Windows_10_guest_best_practices
 
5/7 days of uptime without freezes with 1 GB dirty cache.


It has also resolved other problems:
  1. I no longer see our disk I/O alternate between max speed and 0 over and over again. I believe this is all related and is what the OP saw in his benchmarks.
  2. I no longer get bogus errors when using Ansible to recreate VMs, from source we've had for years. Under heavy load before, I'd get errors attaching an EFI or TPM disk. I believe that if these commands run on the host while disk I/O writes are seized up, they return these errors.
  3. Throughput increased by over 2x. A giant list of Packer templates we use to define our Windows build image, around 600 GB but in 20 pieces, used to take qemu-img rebase/commit 4.6 hours. Now it takes 2.2 hours (a rough sketch of those qemu-img steps is shown below).
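
For anyone unfamiliar with the workflow mentioned in point 3, a rough sketch of the kind of qemu-img operations involved (the file names are hypothetical):

Code:
# Re-parent an overlay onto a new backing image
qemu-img rebase -b new-base.qcow2 -F qcow2 overlay.qcow2

# Merge an overlay's changes back into its backing file
qemu-img commit overlay.qcow2
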
I have a ticket open with PVE support in which I'll recommend that they explore the default calculation of zfs_dirty_data_max and why 4 GB seems to cause this behavior. Maybe on some systems too large a buffer congests the PCIe lanes between multiple CPU sockets and causes the NUMA load balancer on the host to halt I/O, which can cause all of the issues above and make Windows NT kernel panic and freeze without a trace?

2 more days and I'll be confident.
 
It looks like this immunizes us against 99% of freezes without having to cap disk I/O per VM, which is huge. We did have 1 freeze, but it was under an absurd amount of I/O: essentially qemu-img was saturating bandwidth while 35 VMs were building in production and I was hammering the host by bootstrapping 10 large Windows VMs, which performs a lot of small writes and saturates IOPS.

Before, we'd have 3 freezes per day per host under light load. I just spent a week torture-testing our server with the ZFS dirty-write limit set to 1 GB and had 1 freeze. In normal production conditions we probably won't see it more than a few times per year. Previously I was only able to get over 10 days of uptime without freezes by severely restricting I/O to 150 MB/s read and write per VM. Now it's uncapped.

This is the mail thread of the NUMA load balancer bug that seems to be the real issue. High disk IO seems to be a catalyst in triggering it.
https://lists.proxmox.com/pipermail/pve-devel/2024-January/061399.html

While I made this edit on one host, I disabled the NUMA load balancer on another, without I/O caps or the zfs_dirty_data_max change, torture-tested it as well, and had 0 freezes. I do not recommend disabling the NUMA load balancer as a fix, because it degrades server performance severely, to the point that a lot of SSH and Ansible proxmox_kvm tasks error out as unreachable or time out. The web dashboard will also error a lot with "too many redirections", or simply take a long time to update. My understanding is that while disabling the NUMA load balancer bypasses the issue, it also causes the host to treat the multi-socket system as one node, so threads running on one socket end up using RAM channels belonging to the other physical socket, congesting the inter-socket and PCIe traffic.
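
For reference (and explicitly not as a recommendation, per the above), the kernel's automatic NUMA balancing is typically toggled like this; a sketch, assuming the standard Linux sysctl interface:

Code:
# Check whether automatic NUMA balancing is active (1 = on)
cat /proc/sys/kernel/numa_balancing

# Turn it off for the running system only
sysctl -w kernel.numa_balancing=0

# Re-enable it
sysctl -w kernel.numa_balancing=1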
 
Last edited:
  • Like
Reactions: EdoFede

I am late to this party!

For a ZFS-backed VM disk, select the following:
  • "VirtIO SCSI single" as the controller
  • SCSI as the bus/device
  • SSD emulation - yes (even if ZFS is on HDDs; this is mainly to disable the guest's disk-defragment scheduler, and you should still manually disable the defragment scheduler in the guest OS too)
  • Cache - No cache
  • Discard - yes (even if ZFS is on HDDs; this way ZFS knows which portions of the virtual disk are no longer in use. I am still a bit unsure about this and may update this part with a definite answer later, after tests.)
  • IO thread - yes
  • Backup - yes/no (as per your need)

---- now very important -----
  • Just try selecting "native" as Async IO during guest disk creation (see the command-line sketch below).
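
To make that concrete, a sketch of how those options map onto a VM disk from the CLI (the VM ID, storage and disk names are placeholders taken from earlier in this thread; adjust to your setup):

Code:
# Apply SSD emulation, discard, IO thread, no cache and native async I/O to an existing disk
qm set 104 --scsi0 Test2:vm-104-disk-0,ssd=1,discard=on,iothread=1,cache=none,aio=native

# Confirm the resulting disk line
qm config 104 | grep scsi0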
 
Last edited:

If you read here: https://forum.proxmox.com/threads/async-io-io_uring-native-or-threads.139849/
then, as far as I understand:
"native" should not be used on ZFS, as metadata writes will block.
 
