I have a Windows guest with NTFS on LVM on an SSD for the OS drive, which works fine, and I have 3x14TB as a RAIDZ1 ZVOL attached for bulk storage, which does not work fine (i.e. months-long I/O delays and guest file system corruption). I've identified a strange behavior that could be a sign of a more basic problem.
I leave an SSH session with this command running on the host:
Code:
watch -n 0 zpool iostat -v -y 5 1
When I run this fio test on the host in another SSH session:
Code:
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=libaio --size=4g --filename /storage/pve/fiotest --name=job
Everything is fine and I get consistent results between runs. But when I run this theoretically equivalent fio test on the guest:
Code:
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename fiotest --name=job
It's fast the first time, and zpool iostat shows almost entirely writes:
Code:
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
storage                                       22.8T  15.4T      3    925  60.8K   333M
  raidz1-0                                    22.8T  15.4T      3    925  60.8K   333M
    ata-WDC_WD140EDGZ-11B1PA0_9MGWXR5J            -      -      0    320  17.6K   111M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KTNW2C-part2      -      -      1    293  30.4K   111M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KUYX8D-part2      -      -      0    311  12.8K   111M
But on subsequent runs, it's about half as fast and there are a lot of read operations:
Code:
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
storage                                       22.8T  15.4T    170    487  5.23M   146M
  raidz1-0                                    22.8T  15.4T    170    487  5.23M   146M
    ata-WDC_WD140EDGZ-11B1PA0_9MGWXR5J            -      -     60    169  1.87M  48.8M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KTNW2C-part2      -      -     49    156  1.52M  48.8M
    ata-WDC_WD140EDGZ-11B1PA0_Y5KUYX8D-part2      -      -     60    161  1.84M  48.8M
What's weird is that if I delete the fiotest file between runs, it's fast again. When I run this test on the OS drive, there is no need to delete the file between runs; it's always fast.
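Concretely, the guest-side rerun pattern that stays fast looks roughly like this (cmd.exe, run from the data volume; the del is the part that matters):
Code:
rem deleting the target file first avoids the slow second run
del fiotest
rem same fio invocation as above
fio --rw=write --bs=1m --direct=0 --end_fsync=1 --ioengine=windowsaio --size=4g --filename fiotest --name=job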
volblocksize is 64k on the bulk-storage zvols, and the NTFS file systems are formatted with 64k clusters. (The one disk still at the 8k default is going to get migrated.) I just updated to Proxmox 8 a few hours ago, and this behavior remains the same. I also tried downgrading the virtio-scsi drivers in the guest from 0.1.229 to 0.1.204, but again, no change.
Code:
root@pve:~# pveversion
pve-manager/8.0.3/bbf3993334bfa916 (running kernel: 6.2.16-4-pve)
Code:
root@pve:~# qm config 103
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 8
cpu: host
efidisk0: local-lvm:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: local:iso/virtio-win-0.1.204.iso,media=cdrom,size=543272K
machine: pc-q35-6.1
memory: 8192
name: ws22-fs
net0: virtio=B6:25:CA:43:B8:DF,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: win11
scsi0: local-lvm:vm-103-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsi1: pve:vm-103-disk-0,backup=0,discard=on,iothread=1,replicate=0,size=6T,ssd=1
scsi2: pve:vm-103-disk-1,backup=0,discard=on,iothread=1,replicate=0,size=21012G,ssd=1
scsi3: pve:vm-103-disk-2,backup=0,discard=on,iothread=1,replicate=0,size=6T,ssd=1
scsi4: pve:vm-103-disk-3,backup=0,discard=on,iothread=1,replicate=0,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=34362e52-b2ec-4af7-ae8c-3d19c5555588
sockets: 1
tablet: 1
vga: qxl,memory=32
vmgenid: 45d62539-0ca5-47dd-8822-25a093a63e11
Code:
root@pve:~# zfs get volblocksize
NAME                       PROPERTY      VALUE  SOURCE
storage                    volblocksize  -      -
storage/pve                volblocksize  -      -
storage/pve/vm-103-disk-0  volblocksize  64K    -
storage/pve/vm-103-disk-1  volblocksize  8K     default
storage/pve/vm-103-disk-2  volblocksize  64K    -
storage/pve/vm-103-disk-3  volblocksize  64K    -
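For the guest side of the 64k claim, the cluster size can be double-checked with fsutil (D: is a placeholder for the data volume's drive letter; the "Bytes Per Cluster" line should read 65536 for 64k clusters):
Code:
rem run in an elevated prompt inside the guest; D: is a placeholder
fsutil fsinfo ntfsinfo D: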