Hi, I am trialing Proxmox on a server with an AMD EPYC 7313P, 256 GB RAM and a bunch of enterprise SSDs. Proxmox is version 8.3. I have an Ubuntu 24.04 server VM set up with 16 cores and 128 GB RAM on a ZFS mirror and was looking at passing through specific NVMe disks to the VM.
I followed the PCIe passthrough guide in the Proxmox manual; all disks have their own IOMMU group and the devices appear to be passed through correctly. I have set up md RAID 6 as a test, however the performance I am getting when I benchmark it is roughly half of what I should be getting.
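For reference, the passthrough and array setup was roughly as follows (the VM ID, PCI addresses and device names below are placeholders, not my exact values):
Code:
# on the host: confirm each NVMe controller sits in its own IOMMU group
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU group %s: ' "$n"; lspci -nns "${d##*/}"
done

# pass the NVMe controllers through to the VM (VM ID 100 and the PCI addresses are examples)
qm set 100 -hostpci0 0000:41:00.0
qm set 100 -hostpci1 0000:42:00.0
qm set 100 -hostpci2 0000:43:00.0
qm set 100 -hostpci3 0000:44:00.0

# inside the VM: build the test md RAID 6 across the four passed-through drives
mdadm --create /dev/md5 --level=6 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1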
I am running the following command within the VM:
Code:
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
What I get is:
Code:
fio-3.36
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=1934: Wed Feb 26 17:42:49 2025
write: IOPS=27.2k, BW=106MiB/s (111MB/s)(3599MiB/33888msec); 0 zone resets
slat (nsec): min=290, max=127712, avg=3629.94, stdev=1047.45
clat (nsec): min=341, max=6047.3k, avg=27719.67, stdev=8519.02
lat (usec): min=13, max=6049, avg=31.35, stdev= 8.84
clat percentiles (usec):
| 1.00th=[ 20], 5.00th=[ 21], 10.00th=[ 22], 20.00th=[ 23],
| 30.00th=[ 27], 40.00th=[ 28], 50.00th=[ 29], 60.00th=[ 30],
| 70.00th=[ 30], 80.00th=[ 31], 90.00th=[ 32], 95.00th=[ 34],
| 99.00th=[ 39], 99.50th=[ 41], 99.90th=[ 48], 99.95th=[ 63],
| 99.99th=[ 130]
bw ( KiB/s): min=69600, max=166784, per=100.00%, avg=125015.84, stdev=17021.67, samples=58
iops : min=17400, max=41696, avg=31253.98, stdev=4255.39, samples=58
lat (nsec) : 500=0.01%
lat (usec) : 10=0.01%, 20=1.32%, 50=98.60%, 100=0.07%, 250=0.01%
lat (usec) : 500=0.01%, 750=0.01%
lat (msec) : 4=0.01%, 10=0.01%
cpu : usr=15.03%, sys=15.43%, ctx=921689, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,921327,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=106MiB/s (111MB/s), 106MiB/s-106MiB/s (111MB/s-111MB/s), io=3599MiB (3774MB), run=33888-33888msec
Disk stats (read/write):
md5: ios=0/71269, sectors=0/8718896, merge=0/0, ticks=0/5248645, in_queue=5248645, util=29.11%, aggrios=32059/111531, aggsectors=10018186/6578892, aggrmerge=1220321/710840, aggrticks=16767/26859, aggrin_queue=43626, aggrutil=24.38%
nvme0n1: ios=35775/107490, sectors=13224736/3499846, merge=1617502/329996, ticks=18434/22107, in_queue=40540, util=24.07%
nvme3n1: ios=21416/122231, sectors=272232/15801182, merge=12613/1852943, ticks=10950/38091, in_queue=49041, util=24.01%
nvme2n1: ios=35199/108476, sectors=13203920/3508550, merge=1615475/330098, ticks=18971/23929, in_queue=42901, util=24.37%
nvme1n1: ios=35849/107928, sectors=13371856/3505990, merge=1635697/330326, ticks=18713/23309, in_queue=42022, util=24.38%
I have also tried with RAIDZ2 and get slightly worse performance, but not too far off this.
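The RAIDZ2 pool for that comparison was created inside the VM roughly like this (pool and dataset names are placeholders):
Code:
# inside the VM: RAIDZ2 across the same four passed-through drives, then the same fio job against a dataset on it
zpool create -o ashift=12 testpool raidz2 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
zfs create testpool/bench
cd /testpool/bench
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1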
To rule out the disks, I tried the same test (with the same array assembled) on the host booted into a live Ubuntu 24.04 server and in the Proxmox shell itself.
On both I installed fio (and also mdadm on Proxmox) and reassembled the array, roughly as sketched below. On both of these I get roughly double the bandwidth; an example from the Proxmox shell follows the sketch.
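The host-side steps were roughly (the mount point is a placeholder):
Code:
# Proxmox shell / Ubuntu live session: install the tools (mdadm is not on Proxmox by default)
apt install fio mdadm

# re-assemble the array that was created inside the VM and run the same fio job against it
mdadm --assemble --scan
mount /dev/md5 /mnt
cd /mnt
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1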
Code:
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=30 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=24896: Wed Feb 26 16:34:00 2025
write: IOPS=59.3k, BW=232MiB/s (243MB/s)(8192MiB/35379msec); 0 zone resets
slat (nsec): min=790, max=132992, avg=1003.29, stdev=384.63
clat (usec): min=8, max=11558, avg=10.13, stdev=11.12
lat (usec): min=9, max=11558, avg=11.14, stdev=11.17
clat percentiles (usec):
| 1.00th=[ 9], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 10], 40.00th=[ 10], 50.00th=[ 10], 60.00th=[ 10],
| 70.00th=[ 11], 80.00th=[ 11], 90.00th=[ 13], 95.00th=[ 14],
| 99.00th=[ 15], 99.50th=[ 16], 99.90th=[ 19], 99.95th=[ 29],
| 99.99th=[ 108]
bw ( KiB/s): min=97112, max=384440, per=100.00%, avg=342364.55, stdev=53617.78, samples=49
iops : min=24278, max=96110, avg=85591.12, stdev=13404.44, samples=49
lat (usec) : 10=63.03%, 20=36.90%, 50=0.04%, 100=0.02%, 250=0.01%
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 10=0.01%, 20=0.01%
cpu : usr=8.43%, sys=32.14%, ctx=2148402, majf=3, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2097153,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=232MiB/s (243MB/s), 232MiB/s-232MiB/s (243MB/s-243MB/s), io=8192MiB (8590MB), run=35379-35379msec
Disk stats (read/write):
md5: ios=1/315395, merge=0/0, ticks=0/11423629, in_queue=11423629, util=46.37%, aggrios=120512/244907, aggrmerge=1437231/1073551, aggrticks=29420/39073, aggrin_queue=68493, aggrutil=42.82%
nvme0n1: ios=125186/240097, merge=1898156/641739, ticks=32633/35944, in_queue=68577, util=42.82%
nvme3n1: ios=105733/258734, merge=45666/2368092, ticks=20498/47307, in_queue=67805, util=39.22%
nvme2n1: ios=125330/240471, merge=1896070/642360, ticks=32357/36979, in_queue=69336, util=42.55%
nvme1n1: ios=125801/240327, merge=1909034/642013, ticks=32192/36063, in_queue=68254, util=42.55%
Is there anything I can do to get the performance out of these passthrough disks?
Edit: I did a couple more tests:
- created a RAIDZ pool in Proxmox
- gave the VM a SCSI disk on the RAIDZ
- gave the VM a block disk on the RAIDZ
- ran the benchmarks again, this time with the fio commands listed in the Ceph benchmark PDF but with randwrite instead of just write (the setup and fio line are sketched below the results)
Results were:
- md RAID in the VM: slow, around 30 MB/s
- virtual block device in the VM: slower, around 20 MB/s
- virtual SCSI drive in the VM: even slower, around 15 MB/s
- single passthrough drive in the VM: around 170 MB/s
- RAIDZ from the Proxmox shell: around 25 MB/s
- single NVMe drive from the Proxmox shell: around 270 MB/s
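The setup and fio line for these extra tests were roughly as follows (pool name, storage ID, VM ID, disk sizes and the target device are placeholders, and the fio flags are indicative rather than copied verbatim from the Ceph benchmark PDF, with --rw switched to randwrite):
Code:
# on the Proxmox host: RAIDZ pool across the four NVMe drives, registered as a ZFS storage
zpool create -o ashift=12 tank raidz /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
pvesm add zfspool tank-storage -pool tank

# virtual disks for the VM on that pool ("block disk" interpreted here as a VirtIO block disk)
qm set 100 -scsi1 tank-storage:32
qm set 100 -virtio1 tank-storage:32

# fio job in the style of the Ceph benchmark paper, switched to random writes
# (/dev/sdb is a placeholder for whichever device or file was being tested)
fio --ioengine=libaio --filename=/dev/sdb --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=randwrite-test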
So I would still like to know why the passthrough disk is slow.