Windows guest: Overwriting fio test file is fast on NTFS+LVM, slow on NTFS+ZVOL

Taudris

Jul 10, 2023
I have a Windows guest with NTFS on LVM on an SSD for the OS drive, which works fine, and I have 3x14TB as a RAIDZ1 ZVOL attached for bulk storage, which does not work fine: for months I've been dealing with severe I/O delays and guest file system corruption. I've identified a strange behavior that could be a sign of a more basic problem.

I leave an SSH session with this command running on the host:

watch -n 0 zpool iostat -v -y 5 1

When I run this fio test on the host in another SSH session:

fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename=/storage/pve/fiotest --name=job

Everything is fine and I get consistent results between runs. But when I run this theoretically equivalent fio test on the guest:

fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename=fiotest --name=job

It's fast the first time and zpool iostat shows almost entirely writes:

Code:
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
storage                                       22.8T  15.4T      3    925  60.8K   333M
  raidz1-0                                    22.8T  15.4T      3    925  60.8K   333M
    ata-WDC_WD140EDGZ-11B1PA0_9MGWXR5J            -      -      0    320  17.6K   111M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KTNW2C-part2      -      -      1    293  30.4K   111M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KUYX8D-part2      -      -      0    311  12.8K   111M

But on subsequent runs, it's about half as fast and there are a lot of read operations:

Code:
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
storage                                       22.8T  15.4T    170    487  5.23M   146M
  raidz1-0                                    22.8T  15.4T    170    487  5.23M   146M
    ata-WDC_WD140EDGZ-11B1PA0_9MGWXR5J            -      -     60    169  1.87M  48.8M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KTNW2C-part2      -      -     49    156  1.52M  48.8M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KUYX8D-part2      -      -     60    161  1.84M  48.8M

What's weird is that if I delete the fiotest file between runs, it's fast again.

When I run this test on the OS drive, there is no need to delete the file between runs; it's always fast.
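
For completeness, this is the exact sequence I run inside the guest to see the difference (from a command prompt in a directory on the bulk NTFS volume; the job names are just labels):

Code:
rem fresh file: fast, zpool iostat shows almost only writes
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename=fiotest --name=fresh
rem overwrite of the existing file: about half the speed, lots of reads
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename=fiotest --name=overwrite
rem delete the file first and it's fast again
del fiotest
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename=fiotest --name=fresh_again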

volblocksize is 64k, and the file system is formatted with 64k clusters. (The one with 8k is going to get migrated.) I just updated to Proxmox 8 a few hours ago, and this behavior remains the same. I also tried downgrading the virtio-scsi drivers in the guest from 0.1.229 to 0.1.204, but again, no change.
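
In case it helps, the cluster size is easy to confirm from inside the guest (D: is just a placeholder for whatever drive letter the bulk volume has; "Bytes Per Cluster" should report 65536):

Code:
fsutil fsinfo ntfsinfo D: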

Code:
root@pve:~# pveversion
pve-manager/8.0.3/bbf3993334bfa916 (running kernel: 6.2.16-4-pve)

Code:
root@pve:~# qm config 103
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: local-lvm:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: local:iso/virtio-win-0.1.204.iso,media=cdrom,size=543272K
machine: pc-q35-6.1
memory: 8192
name: ws22-fs
net0: virtio=B6:25:CA:43:B8:DF,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: win11
scsi0: local-lvm:vm-103-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsi1: pve:vm-103-disk-0,backup=0,discard=on,iothread=1,replicate=0,size=6T,ssd=1
scsi2: pve:vm-103-disk-1,backup=0,discard=on,iothread=1,replicate=0,size=21012G,ssd=1
scsi3: pve:vm-103-disk-2,backup=0,discard=on,iothread=1,replicate=0,size=6T,ssd=1
scsi4: pve:vm-103-disk-3,backup=0,discard=on,iothread=1,replicate=0,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=34362e52-b2ec-4af7-ae8c-3d19c5555588
sockets: 1
tablet: 1
vga: qxl,memory=32
vmgenid: 45d62539-0ca5-47dd-8822-25a093a63e11

Code:
root@pve:~# zfs get volblocksize
NAME                       PROPERTY      VALUE     SOURCE
storage                    volblocksize  -         -
storage/pve                volblocksize  -         -
storage/pve/vm-103-disk-0  volblocksize  64K       -
storage/pve/vm-103-disk-1  volblocksize  8K        default
storage/pve/vm-103-disk-2  volblocksize  64K       -
storage/pve/vm-103-disk-3  volblocksize  64K       -
 
Maybe you have to tune the volblocksize and especially the recordsize of ZFS. Search the internet!
 
My question isn't about tuning. If the ZFS filesystem on the host with recordsize=64k and the ZFS volume mounted to the guest with volblocksize=64k performed the same, I would be happy.

As far as I understand, they should perform the same since they both work in 64k chunks, but my tests show that they are quite different.
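
Both sides can be checked in one go on the host (dataset and zvol names taken from my config above):

Code:
zfs get recordsize,volblocksize storage/pve storage/pve/vm-103-disk-2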

There are a couple more things I can try. I can disable thin provisioning and see if that improves rewrite performance, and I can redo the test using a ZFS volume mounted on the host (rough sketch below).
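
Rough sketch of the second test, for the record (zvol name, size, filesystem and mountpoint are made up; volblocksize matches the existing disks):

Code:
# create a throwaway zvol with the same 64k volblocksize and put a filesystem on it
zfs create -V 100G -o volblocksize=64k storage/fiotest-zvol
mkfs.ext4 /dev/zvol/storage/fiotest-zvol
mkdir -p /mnt/fiotest
mount /dev/zvol/storage/fiotest-zvol /mnt/fiotest
# same fio job twice: first pass writes a fresh file, second pass overwrites it
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename=/mnt/fiotest/fiotest --name=job
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename=/mnt/fiotest/fiotest --name=job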