Very poor storage performance

enozzac (New Member)
Hi,
I updated Proxmox to the latest available version (8.0.4) and noticed that all my virtual machines and containers were slower than before.
I ran a benchmark on the storage (NVMe Patriot P300) and was shocked by the result:

Code:
fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/nvme0n1


seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=0): [f(1)][100.0%][r=4KiB/s][r=1 IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=382583: Fri Nov 10 09:05:12 2023
  read: IOPS=3298, BW=12.9MiB/s (13.5MB/s)(784MiB/60877msec)
    slat (nsec): min=1855, max=151069, avg=11383.41, stdev=21720.27
    clat (nsec): min=590, max=4985.0M, avg=285003.20, stdev=23908203.75
     lat (usec): min=16, max=4985.0k, avg=296.39, stdev=23908.29
    clat percentiles (nsec):
     |  1.00th=[      700],  5.00th=[    14400], 10.00th=[    14528],
     | 20.00th=[    15168], 30.00th=[    15808], 40.00th=[    18560],
     | 50.00th=[    19072], 60.00th=[    56064], 70.00th=[    67072],
     | 80.00th=[    91648], 90.00th=[    96768], 95.00th=[    98816],
     | 99.00th=[   104960], 99.50th=[   113152], 99.90th=[   321536],
     | 99.95th=[  1302528], 99.99th=[926941184]
   bw (  KiB/s): min=    8, max=139216, per=100.00%, avg=28687.00, stdev=33395.65, samples=56
   iops        : min=    2, max=34804, avg=7171.75, stdev=8348.91, samples=56
  lat (nsec)   : 750=1.55%, 1000=0.18%
  lat (usec)   : 2=0.01%, 4=0.07%, 10=0.01%, 20=52.37%, 50=2.80%
  lat (usec)   : 100=39.30%, 250=3.61%, 500=0.03%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%, 2000=0.01%
  lat (msec)   : >=2000=0.01%
  cpu          : usr=5.18%, sys=10.00%, ctx=191387, majf=0, minf=14
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=200810,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=12.9MiB/s (13.5MB/s), 12.9MiB/s-12.9MiB/s (13.5MB/s-13.5MB/s), io=784MiB (823MB), run=60877-60877msec

Disk stats (read/write):
  nvme0n1: ios=200834/318, merge=0/168, ticks=87250/409899, in_queue=504852, util=94.09%

While I was running the benchmark, the I/O delay went up to 99% after about 10 seconds and stayed there until the end of the run.
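
(As far as I can tell, the I/O delay shown in the GUI is CPU iowait, so it can also be watched live from a shell while the benchmark runs, e.g. with standard tools like these:)

Code:
# watch iowait and per-device load while fio is running
vmstat 1          # the "wa" column = % of CPU time spent waiting on I/O
iostat -x 1       # %util and await per block device (from the sysstat package)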

This is a pveperf run:

Code:
CPU BOGOMIPS:      15974.40
REGEX/SECOND:      2738740
HD SIZE:           93.93 GB (/dev/mapper/pve-root)
BUFFERED READS:    68.45 MB/sec
AVERAGE SEEK TIME: 83.71 ms
FSYNCS/SECOND:     0.03
DNS EXT:           16595.37 ms
DNS INT:           1002.20 ms (local)

This PC has always had some storage problems; even when it was freshly installed I had a constant 3% I/O delay, but performance was acceptable. Now, as soon as you do anything, it literally grinds to a halt.

Any advice?
Where should I start to figure out what happened?

My config:
CPU: N5105
RAM: 16 GB
Storage: NVMe Patriot P300 512 GB (also tried another model, with comparable results)
 
Hi,

BUFFERED READS: 68.45 MB/sec
AVERAGE SEEK TIME: 83.71 ms
FSYNCS/SECOND: 0.03
That's really bad, all three stats.

The disk is a (super-)cheap consumer part; the vendor does not even seem to specify the NAND type (QLC, TLC, ...), but my guess is QLC.
These disks can become absurdly slow over time, especially with virtualization workloads, to the point where even spinning rust can be considerably faster.

So it is probably best to invest in a proper datacenter SSD; PLP (power loss protection) is a good keyword here.
The disk will probably fail soon anyway, so better to switch early.
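
If you want to check whether the disk is already wearing out, the SMART/NVMe health counters are worth a look (assuming smartmontools and nvme-cli are installed, and that the disk is /dev/nvme0n1 as in your fio run), for example:

Code:
smartctl -a /dev/nvme0n1       # overall health, percentage used, media errors
nvme smart-log /dev/nvme0n1    # NVMe-specific wear and error counters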
 
Same disk, another machine with similar hardware:

Code:
fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=/dev/nvme0n1
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=237MiB/s][r=60.6k IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=110182: Fri Nov 10 10:36:53 2023
  read: IOPS=61.0k, BW=238MiB/s (250MB/s)(14.0GiB/60001msec)
    slat (nsec): min=2214, max=717236, avg=2482.73, stdev=954.75
    clat (nsec): min=691, max=12728k, avg=13320.44, stdev=35786.37
     lat (usec): min=12, max=12746, avg=15.80, stdev=35.81
    clat percentiles (usec):
     |  1.00th=[   13],  5.00th=[   13], 10.00th=[   13], 20.00th=[   13],
     | 30.00th=[   13], 40.00th=[   13], 50.00th=[   13], 60.00th=[   13],
     | 70.00th=[   13], 80.00th=[   13], 90.00th=[   13], 95.00th=[   13],
     | 99.00th=[   17], 99.50th=[   20], 99.90th=[   28], 99.95th=[  231],
     | 99.99th=[ 1614]
   bw (  KiB/s): min=181480, max=259080, per=100.00%, avg=244024.67, stdev=16392.25, samples=119
   iops        : min=45370, max=64770, avg=61006.17, stdev=4098.08, samples=119
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=99.64%, 50=0.28%
  lat (usec)   : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=10.86%, sys=30.49%, ctx=3658843, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3658709,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=238MiB/s (250MB/s), 238MiB/s-238MiB/s (250MB/s-250MB/s), io=14.0GiB (15.0GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=3653291/1285, merge=0/1056, ticks=38383/1974, in_queue=40390, util=99.88%

Not super fast, but usable.

I think something is wrong with this particular configuration, be it some BIOS or Proxmox setting (it only started after the update to 8.0.x).
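
To narrow down whether it is a kernel/BIOS issue rather than the disk itself, these are the kinds of things worth comparing between the two machines (just a sketch; it assumes the disk shows up as /dev/nvme0n1 on both):

Code:
uname -r                      # kernel version (differs between PVE 7.x and 8.x)
dmesg | grep -i nvme          # look for controller resets, timeouts or errors
nvme error-log /dev/nvme0n1   # NVMe error log entries (needs nvme-cli)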
 
Hi,


That's really bad, all three stats.

The disk is a (super-)cheap consumer part; the vendor does not even seem to specify the NAND type (QLC, TLC, ...), but my guess is QLC.
These disks can become absurdly slow over time, especially with virtualization workloads, to the point where even spinning rust can be considerably faster.

So it is probably best to invest in a proper datacenter SSD; PLP (power loss protection) is a good keyword here.
The disk will probably fail soon anyway, so better to switch early.
I know it's not a suitable disk model, but if I move it to a machine with a similar configuration it gives the results I posted above (not great, but acceptable for my use).

pveperf with the same disk on the other machine:
Code:
CPU BOGOMIPS:      15974.40
REGEX/SECOND:      2663319
HD SIZE:           58.02 GB (/dev/mapper/pve-root)
BUFFERED READS:    529.21 MB/sec
AVERAGE SEEK TIME: 0.07 ms
FSYNCS/SECOND:     812.21
DNS EXT:           74.45 ms
DNS INT:           70.12 ms (xxx.local)
 
On fio usage in general:
4K sequential reads are NOT the best benchmark. Try 4K or 8K random reads, and also sequential reads with a bigger block size. Write tests are also important, but those are VERY invasive.
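
For example, read-only variants of the command used above (the parameters are only a suggestion; write tests are deliberately left out here, since against the raw device they would destroy data):

Code:
# 4K random reads
fio --ioengine=libaio --direct=1 --rw=randread --bs=4K --numjobs=1 --iodepth=32 --runtime=60 --time_based --name=rand_read --filename=/dev/nvme0n1

# 1M sequential reads
fio --ioengine=libaio --direct=1 --rw=read --bs=1M --numjobs=1 --iodepth=8 --runtime=60 --time_based --name=seq_read_1m --filename=/dev/nvme0n1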
 
