Windows guest: Overwriting fio test file is fast on NTFS+LVM, slow on NTFS+ZVOL

Taudris
New Member · Jul 10, 2023
I have a Windows guest with NTFS on LVM on an SSD for the OS drive, which works fine, and a ZVOL on a RAIDZ1 pool of three 14TB drives attached for bulk storage, which does not work fine (months of I/O delays and guest file system corruption). I've identified a strange behavior that could be a sign of a more basic problem.

I leave an SSH session with this command running on the host:

watch -n 0 zpool iostat -v -y 5 1

When I run this fio test on the host in another SSH session:

fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename /storage/pve/fiotest --name=job

Everything is fine and I get consistent results between runs. But when I run this theoretically equivalent fio test on the guest:

fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename fiotest --name=job

It's fast the first time and zpool iostat shows almost entirely writes:

Code:
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
storage                                       22.8T  15.4T      3    925  60.8K   333M
  raidz1-0                                    22.8T  15.4T      3    925  60.8K   333M
    ata-WDC_WD140EDGZ-11B1PA0_9MGWXR5J            -      -      0    320  17.6K   111M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KTNW2C-part2      -      -      1    293  30.4K   111M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KUYX8D-part2      -      -      0    311  12.8K   111M

But on subsequent runs, it's about half as fast and there are lots of read operations:

Code:
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
storage                                       22.8T  15.4T    170    487  5.23M   146M
  raidz1-0                                    22.8T  15.4T    170    487  5.23M   146M
    ata-WDC_WD140EDGZ-11B1PA0_9MGWXR5J            -      -     60    169  1.87M  48.8M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KTNW2C-part2      -      -     49    156  1.52M  48.8M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KUYX8D-part2      -      -     60    161  1.84M  48.8M

What's weird is that if I delete the fiotest file between runs, it's fast again.

When I run this test on the OS drive, there is no need to delete the file between runs; it's always fast.
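
I haven't captured it yet, but a request-size histogram during one of the slow reruns should show whether those reads are smaller than the 64k volblocksize, which I assume would point at some kind of read-modify-write. Something like this on the host while the guest rerun is in progress:

Code:
# request-size histogram for the pool, sampled every 5 seconds
zpool iostat -r storage 5
# per-vdev latency breakdown, for comparison with the outputs above
zpool iostat -v -l -y storage 5 1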

volblocksize is 64k, and the file system is formatted with 64k clusters. (The disk with the default 8K volblocksize is going to get migrated.) I just updated to Proxmox 8 a few hours ago, and this behavior remains the same. I also tried downgrading the virtio-scsi drivers in the guest from 0.1.229 to 0.1.204, but again, no change.
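
For reference, the 64k cluster size can be confirmed inside the guest with something like this (D: is a placeholder for the bulk-storage drive letter):

Code:
# run inside the Windows guest from an elevated PowerShell prompt
fsutil fsinfo ntfsinfo D:
# "Bytes Per Cluster" should read 65536 for the 64k-cluster volumes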

Code:
root@pve:~# pveversion
pve-manager/8.0.3/bbf3993334bfa916 (running kernel: 6.2.16-4-pve)

Code:
root@pve:~# qm config 103
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: local-lvm:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: local:iso/virtio-win-0.1.204.iso,media=cdrom,size=543272K
machine: pc-q35-6.1
memory: 8192
name: ws22-fs
net0: virtio=B6:25:CA:43:B8:DF,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: win11
scsi0: local-lvm:vm-103-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsi1: pve:vm-103-disk-0,backup=0,discard=on,iothread=1,replicate=0,size=6T,ssd=1
scsi2: pve:vm-103-disk-1,backup=0,discard=on,iothread=1,replicate=0,size=21012G,ssd=1
scsi3: pve:vm-103-disk-2,backup=0,discard=on,iothread=1,replicate=0,size=6T,ssd=1
scsi4: pve:vm-103-disk-3,backup=0,discard=on,iothread=1,replicate=0,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=34362e52-b2ec-4af7-ae8c-3d19c5555588
sockets: 1
tablet: 1
vga: qxl,memory=32
vmgenid: 45d62539-0ca5-47dd-8822-25a093a63e11

Code:
root@pve:~# zfs get volblocksize
NAME                       PROPERTY      VALUE     SOURCE
storage                    volblocksize  -         -
storage/pve                volblocksize  -         -
storage/pve/vm-103-disk-0  volblocksize  64K       -
storage/pve/vm-103-disk-1  volblocksize  8K        default
storage/pve/vm-103-disk-2  volblocksize  64K       -
storage/pve/vm-103-disk-3  volblocksize  64K       -
 
Maybe you have to tune the volblocksize and especially the recordsize of ZFS. Search the internet!
 
My question isn't about tuning. If the ZFS filesystem on the host with recordsize=64k and the ZFS volume mounted to the guest with volblocksize=64k performed the same, I would be happy.

As far as I understand, they should perform the same since they both work in 64k chunks, but my tests show that they are quite different.
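
For completeness, these are the two settings I mean by "64k chunks" (assuming storage/pve is the dataset mounted at /storage/pve; ZVOL name taken from the output above):

Code:
# recordsize of the host dataset the fio test file lives on
zfs get recordsize storage/pve
# volblocksize of one of the ZVOLs attached to the guest
zfs get volblocksize storage/pve/vm-103-disk-2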

There are a couple more things I can try: I can disable thin provisioning and see if that improves rewrite performance, and I can redo the test against a ZFS volume on the host itself.
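
Roughly what I have in mind for the host-side test (names and sizes are placeholders; leaving out -s makes the volume fully reserved, which also covers the thin-provisioning question):

Code:
# non-sparse test ZVOL with the same 64k volblocksize
zfs create -V 16G -o volblocksize=64k storage/fiotest-zvol
# same overwrite test as above, directly against the ZVOL block device; run it twice in a row
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename=/dev/zvol/storage/fiotest-zvol --name=job
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename=/dev/zvol/storage/fiotest-zvol --name=job
# clean up
zfs destroy storage/fiotest-zvol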
 
