Slow NVME Speeds on Xpenology VM?

ace308

Hardware
Ryzen 5 4650G
32 GB RAM @ 2400 MHz
ROG Strix 550i
NVMe: 1x 500 GB Samsung 960 EVO, 1x 500 GB Samsung 970 EVO
SSD: 4x Intel DC SSD 4500 480 GB
HDD: 4x Seagate EXOS 18 TB
LSI Broadcom SAS3008 SCSI controller flashed to IT mode (the Seagates are connected to this)

Hi all, I installed Proxmox with an Xpenology VM and followed instructions for PCIe passthrough of the two NVMe drives to use as read/write cache. When I benchmarked the NVMes in DSM (the Xpenology VM), the speeds were quite slow (on par with the results below). So I installed fio and tested the drives from the Proxmox shell, and found the following...

Basically, shouldn't the NVMe be around 2500 MB/s read and the SSD around 500 MB/s read? The SSD is almost the same speed as the 18 TB spinning drive.

Any guidance on how to get the full speed out of the drives is appreciated.

1x Seagate Exos 18TB drive
Code:
fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sdc
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=77.8MiB/s][r=19.9k IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=8969: Sat Feb 17 00:13:16 2024
  read: IOPS=21.0k, BW=81.9MiB/s (85.9MB/s)(4917MiB/60001msec)
    slat (nsec): min=3075, max=101129, avg=6723.79, stdev=508.95
    clat (usec): min=12, max=13789, avg=40.65, stdev=28.52
     lat (usec): min=37, max=13796, avg=47.38, stdev=28.56
    clat percentiles (usec):
     |  1.00th=[   39],  5.00th=[   40], 10.00th=[   40], 20.00th=[   40],
     | 30.00th=[   41], 40.00th=[   41], 50.00th=[   41], 60.00th=[   41],
     | 70.00th=[   41], 80.00th=[   41], 90.00th=[   41], 95.00th=[   42],
     | 99.00th=[   61], 99.50th=[   69], 99.90th=[   90], 99.95th=[  133],
     | 99.99th=[  262]
   bw (  KiB/s): min=71952, max=84752, per=100.00%, avg=84013.45, stdev=2343.49, samples=119
   iops        : min=17988, max=21188, avg=21003.36, stdev=585.87, samples=119
  lat (usec)   : 20=0.01%, 50=98.77%, 100=1.16%, 250=0.06%, 500=0.01%
  lat (usec)   : 750=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=2.06%, sys=14.10%, ctx=2517236, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1258634,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=81.9MiB/s (85.9MB/s), 81.9MiB/s-81.9MiB/s (85.9MB/s-85.9MB/s), io=4917MiB (5155MB), run=60001-60001msec

Disk stats (read/write):
  sdc: ios=1255944/0, merge=0/0, ticks=54788/0, in_queue=54788, util=99.87%





1x Samsung NVMe (both drives showed similar speeds)
Code:
root@pve:~#  fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/nvme1n1
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=249MiB/s][r=63.6k IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=9955: Sat Feb 17 00:20:01 2024
  read: IOPS=63.7k, BW=249MiB/s (261MB/s)(14.6GiB/60001msec)
    slat (nsec): min=2034, max=46598, avg=2156.34, stdev=128.90
    clat (nsec): min=561, max=6009.7k, avg=13256.82, stdev=3782.20
     lat (usec): min=12, max=6056, avg=15.41, stdev= 3.81
    clat percentiles (nsec):
     |  1.00th=[12480],  5.00th=[12736], 10.00th=[12864], 20.00th=[13120],
     | 30.00th=[13120], 40.00th=[13248], 50.00th=[13248], 60.00th=[13376],
     | 70.00th=[13376], 80.00th=[13504], 90.00th=[13504], 95.00th=[13632],
     | 99.00th=[13888], 99.50th=[14528], 99.90th=[17792], 99.95th=[21120],
     | 99.99th=[23936]
   bw (  KiB/s): min=253440, max=255384, per=100.00%, avg=254850.02, stdev=404.02, samples=119
   iops        : min=63360, max=63846, avg=63712.50, stdev=101.01, samples=119
  lat (nsec)   : 750=0.01%
  lat (usec)   : 4=0.01%, 10=0.01%, 20=99.91%, 50=0.08%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%
  cpu          : usr=7.65%, sys=29.71%, ctx=3821380, majf=0, minf=13
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3821360,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=249MiB/s (261MB/s), 249MiB/s-249MiB/s (261MB/s-261MB/s), io=14.6GiB (15.7GB), run=60001-60001msec

Disk stats (read/write):
  nvme1n1: ios=3812972/0, merge=0/0, ticks=40493/0, in_queue=40493, util=99.87%




Intel DC SSD 4500
Code:
root@pve:~#  fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/sdh
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=98.7MiB/s][r=25.3k IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=10538: Sat Feb 17 00:24:23 2024
  read: IOPS=28.3k, BW=110MiB/s (116MB/s)(6628MiB/60001msec)
    slat (nsec): min=3116, max=243376, avg=6996.83, stdev=563.96
    clat (nsec): min=611, max=539131, avg=28039.54, stdev=5294.93
     lat (usec): min=28, max=556, avg=35.04, stdev= 5.45
    clat percentiles (usec):
     |  1.00th=[   23],  5.00th=[   23], 10.00th=[   23], 20.00th=[   23],
     | 30.00th=[   24], 40.00th=[   24], 50.00th=[   31], 60.00th=[   32],
     | 70.00th=[   33], 80.00th=[   33], 90.00th=[   33], 95.00th=[   34],
     | 99.00th=[   35], 99.50th=[   35], 99.90th=[   39], 99.95th=[   46],
     | 99.99th=[  219]
   bw (  KiB/s): min=100528, max=133344, per=100.00%, avg=113255.08, stdev=15000.72, samples=119
   iops        : min=25132, max=33336, avg=28313.78, stdev=3750.20, samples=119
  lat (nsec)   : 750=0.01%
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.02%, 50=99.95%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=2.67%, sys=18.21%, ctx=3393464, majf=0, minf=13
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1696729,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=110MiB/s (116MB/s), 110MiB/s-110MiB/s (116MB/s-116MB/s), io=6628MiB (6950MB), run=60001-60001msec

Disk stats (read/write):
  sdh: ios=1693192/770, merge=0/1, ticks=52906/44, in_queue=52950, util=99.89%
 
You normally don't benchmark sequential reads with 4K; these numbers are NOT real-life use cases. Try at least a random 4K read (randread), or 4K random read/write (beware of the write!).
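For example, a 4K random read in the same style as your commands above would look roughly like this (the device path is just an example; a higher iodepth such as 32 shows what the drive can do with parallel requests):
Code:
fio --ioengine=libaio --direct=1 --rw=randread --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=rand_read --filename=/dev/nvme1n1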
 
2x NVMes for read/write cache
I would also bet that the Intel SATA SSDs have better write performance than your Samsung NVMe SSDs, at least when doing sync writes or after the SLC cache gets full. In that case it would be a bad idea to use them as a write cache.
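A rough way to compare sync write behaviour is a 4K random write with an fsync after every write. Beware: this overwrites the target, so only point it at a drive (or a test file) with no data you care about; the filename below is a placeholder:
Code:
fio --ioengine=libaio --direct=1 --fsync=1 --rw=randwrite --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=sync_write --filename=/dev/nvmeXn1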
 
Ok, I've been running various benchmarks through the Proxmox console but don't want to post all of them... The easiest commands to read/copy are here: https://docs.oracle.com/en-us/iaas/Content/Block/References/samplefiocommandslinux.htm - based on these speed tests, which test would you run for the NVMe? In one of the random read/write tests, it starts out fast (1600 MB/s read and write) but slows down to about 600 MB/s read/write after a minute. (The SSD was solid at about 150 MB/s read/write all the way through and did not slow down.)
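The random read/write throughput test I have been running is along these lines (parameters loosely follow the Oracle page linked above; block size, iodepth and numjobs are the values I have been varying, and the write portion overwrites the target device):
Code:
fio --ioengine=libaio --direct=1 --rw=randrw --bs=64K --iodepth=64 --numjobs=4 --runtime=120 --time_based --group_reporting --name=throughput_randrw --filename=/dev/nvme1n1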

I realize now I don't fully know what type of disk benchmark Synology DSM runs or with what parameters (4K block size?). I wonder if it does a read test on its own and then a separate write test, or I assume it's a 'throughput - random read & write' test. I ran the Synology disk benchmark again and it takes about 20 to 25 minutes, so I'm not sure what kind of test it's running. Does anyone know the parameters I can use to replicate it with fio on the Proxmox side?

In the Xpenology VM, using the manual SATA passthrough command, the Intel DC SSD 4500s are formatted/partitioned by the VM as a storage pool in an SHR1 configuration with 2 disks. I disabled one of the disks and ran the Synology DSM drive benchmark with the following results (which I actually find pretty good):
intelDCssd4500.JPG

The 2x NVMe drives use full PCIe passthrough of each NVMe controller (not SCSI or manual SATA passthrough) and serve as the read/write cache. I disabled one of the NVMes and ran the benchmark in cache mode, but that does not report throughput or write performance, so I removed it from the cache and ran the benchmark again.

nvme3.JPG


As you can see, the Intel DC 4500 SSD is on par with the Samsung 970 EVO NVMe, when the NVMe should have roughly 8 to 10 times the IOPS and at least double the read/write speed. So something is happening to the NVMe within the VM that slows it down. I assumed that when the drive is passed through directly to the VM, it would get better speeds than this.
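For reference, the passthrough is set up roughly like this (VM ID, PCI address and disk ID are placeholders, not the exact values from my system):
Code:
# Full PCIe passthrough of one NVMe controller (address from lspci -nnk; pcie=1 needs the q35 machine type)
qm set 100 -hostpci0 0000:01:00.0,pcie=1
# Manual SATA passthrough of one of the Intel SSDs
qm set 100 -sata1 /dev/disk/by-id/ata-INTEL_EXAMPLE_SERIAL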

I tried doubling the RAM on the VM and a few other things, but it didn't help. Any ideas on how to get better performance out of the NVMes with this setup? In one of the read tests it reached 600k IOPS!
 

Attachments: nvme.JPG, nvme2.JPG
Consumer SSDs are designed for short bursts of async writes. The advertised throughput of GBs per second is the performance when writing to cache. Once the DRAM cache is full, performance drops drastically, and it drops again once the SLC cache is full. You only see the real write throughput of an NVMe SSD to TLC/QLC NAND when doing continuous writes, and then it is more on the level of a good old SATA SSD.
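You can watch this happen with a long sustained sequential write and the live fio bandwidth readout; the number drops once the DRAM and SLC caches fill up. Beware: this overwrites the target, so only run it against an empty drive; the device path is a placeholder:
Code:
fio --ioengine=libaio --direct=1 --rw=write --bs=1M --iodepth=16 --numjobs=1 --runtime=300 --time_based --name=sustained_write --filename=/dev/nvmeXn1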
 
Thanks @Dunuin, I understood this concept but did not see it in action until running these benchmarks. As the test runs, the NVMe slows by about two thirds (1800 MB/s down to 500 MB/s). That is still quite fast, but I built this system to leverage the dual NVMe slots for read/write cache and reduce the load on my HDDs. The HDD array gets about 700 MB/s read, which is good, but I want to use the NAS for video and photo editing, as I have about 30 TB of media to sort, edit, export, etc., and I thought leveraging the NVMes for writing small file data while working would be better than working directly on the HDDs. These Intel DC series SSDs are actually quite awesome in terms of being able to run at 100% load (even though I am not going to do that). Maybe I should have built a system with more SATA ports instead of two NVMes, but I think these should work fine for my purpose, since I am not running them in an enterprise environment and they will mostly just see short bursts of activity (when opening or saving a project, etc.).

When I have time I will keep looking at how the Synology VM does its disk benchmark and compare it on the Proxmox side.

Thanks again for the info; as a Proxmox newbie, this forum has been an amazing help!
 
These Intel DC series SSDs are actually quite awesome in terms of being able to run at 100% load (even though I am not going to do that).
Because those Intels are enterprise SSDs designed for doing continuous writes without dropping too much in performance.

Maybe I should have built a system with more SATA ports instead of two NVMes
It would have been best to buy enterprise NVMes if you care about write performance. There you get the best of both worlds, something like a Micron 7400 MAX.
 
Thanks, can you recommend some others? I'm trying to find a list but only finding consumer stuff or NVMes that are irrelevant to my build (under 500 GB, PCIe slot cards, etc.).
 
