Passed-through NVMe: poor performance on 4k random writes at higher queue depths

marcosscriven

Member
Mar 6, 2021
I have a Seagate FireCuda 530 1TB, which can reach ~660 MB/s on 4k random writes at a queue depth of 32.

However, I'm only seeing ~260 MB/s when the whole disk is passed through to a Windows VM. All other figures (including 4k random reads and sequential transfers) are fine.

The VM itself has 12 threads and plenty of memory.

I'm wondering if there are any QEMU/Proxmox settings I'm getting wrong here?
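
(For reference, a roughly equivalent 4k random-write test at QD32 can be run with fio; the device path below is an assumption, and note that writing to a raw device destroys its contents.)

Code:
fio --name=randwrite-qd32 --filename=/dev/nvme0n1 \
    --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting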
 
Ah, sorry, I'm confusing things. Actually I tested it three ways:

1) Windows host
2) Proxmox host, with the tested drive attached as a VirtIO SCSI drive backed by the NVMe I'm interested in (no caching)
3) Proxmox host, with the NVMe drive passed through via PCIe.

In cases 1 and 2, 4k random writes at a queue depth of 32 are good. In the third case they drop markedly.
 
Code:
qm config 104
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 12
cpu: host
efidisk0: local:104/vm-104-disk-1.raw,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:0b:00,pcie=1,x-vga=1
hostpci1: 0000:0d:00.3,rombar=0
hostpci2: 0000:01:00.0,rombar=0
machine: pc-q35-6.2
memory: 16000
meta: creation-qemu=6.2.0,ctime=1653231820
name: windowstest
net0: virtio=22:69:16:E1:CF:1A,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local:104/vm-104-disk-0.qcow2,iothread=1,size=64G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=0cb57991-9674-4e6b-8a1f-970a026fac77
sockets: 1
vmgenid: 79d10fd2-e9e2-4811-bca9-45172af45c01

The drive is hostpci2 - the others are the GPU and its audio function.
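
(To double-check which device sits at that address, lspci can be queried directly; a quick sketch:)

Code:
lspci -s 0000:01:00.0
# an NVMe drive should show up here as a "Non-Volatile memory controller"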
 
iothread requires the VirtIO SCSI single controller (scsihw: virtio-scsi-single). Please try it with that.
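
For example, switching the controller would look something like this (a sketch using the VM ID from the config above):

Code:
qm set 104 --scsihw virtio-scsi-single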
 
Yes, you're right.
 
Could you please run numastat and post the output?
 
Thanks for your help @LnxBil - sorry I had to focus on something else.
The output of numastat:

Code:
numastat
                           node0
numa_hit               122712713
numa_miss                      0
numa_foreign                   0
interleave_hit              3105
local_node             122712713
other_node                     0
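
(For what it's worth, that output shows a single NUMA node with zero numa_miss and numa_foreign events, so cross-node memory access doesn't look like the culprit here.)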