pveperf disk performance slower than standard RAID

zeref

New Member
Oct 9, 2019
Hey,

I have tried almost everything out there to get this working, but I still have major issues getting Proxmox up to speed:

8x Samsung PM883 SSDs in H730P RAID10 on a bare-metal Windows Server 2016 installation = 5 GB/s reads and good speeds in general
The same setup on a bare-metal Proxmox installation = 1 GB/s
Whether under ZFS RAID0 or ext4, pveperf always gives me 1 GB/s

In HBA mode with ZFS RAID10 the disk numbers are also bad.

What am I missing here?

Thanks in advance
 
What am I missing here?

Maybe that your non-bare-metal installation probably (partially) reads from some Windows disk cache (i.e., memory) and not from the disks themselves? :)

It would be better to use a known-good tool like fio for the benchmarks; that one does not lie (although it only tests what it is asked for).
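
For example, a sequential read test straight against the block device could look like this (just a sketch; /dev/sdX is a placeholder for the RAID volume or disk you want to test, and a pure read test like this does not modify any data):

Bash:
# sequential 1M reads with O_DIRECT, bypassing the page cache
fio --name=seqread --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=read --bs=1M --iodepth=32 --numjobs=1 --runtime=60 --time_based \
    --group_reporting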
 
First of all, thank you for the quick response.

I still do not know why I am getting such low numbers :confused:

Here are some recent fio benchmarks:

[Attached screenshots of the fio benchmarks: 1.PNG, 2.PNG, 3.PNG, 4.PNG]
 
In general, enterprise SSDs will perform much, much better if you use a higher I/O depth and many more concurrent threads than just one. Single-thread sequential read performance on a SATA SSD is similar to a hard disk; everything else is much faster, however. Could you please try again, first with a higher I/O depth and then with more threads?
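
For instance, a random read test along those lines might look like this (only a sketch; /dev/sdX, the queue depth of 32 and the four jobs are assumptions to adapt to your volume):

Bash:
# 4k random reads with a deeper queue and several workers
fio --name=randread-deep --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based \
    --group_reporting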
 
Thank you.

Bash:

root@pve:~#  fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=0 --size=512M --numjobs=4 --runtime=240 --group_reporting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.12
Starting 4 processes
randread: Laying out IO file (1 file / 512MiB)
randread: Laying out IO file (1 file / 512MiB)
randread: Laying out IO file (1 file / 512MiB)
randread: Laying out IO file (1 file / 512MiB)
Jobs: 3 (f=1): [f(2),r(1),_(1)][100.0%][r=282MiB/s][r=72.1k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=4): err= 0: pid=30776: Fri Nov  1 12:24:37 2019
  read: IOPS=75.8k, BW=296MiB/s (311MB/s)(2048MiB/6915msec)
    slat (usec): min=3, max=880, avg=40.79, stdev=26.56
    clat (usec): min=2, max=2043, avg=646.07, stdev=287.40
     lat (usec): min=6, max=2142, avg=687.01, stdev=305.63
    clat percentiles (usec):
     |  1.00th=[  111],  5.00th=[  169], 10.00th=[  215], 20.00th=[  306],
     | 30.00th=[  502], 40.00th=[  660], 50.00th=[  717], 60.00th=[  758],
     | 70.00th=[  807], 80.00th=[  873], 90.00th=[  979], 95.00th=[ 1074],
     | 99.00th=[ 1237], 99.50th=[ 1270], 99.90th=[ 1516], 99.95th=[ 1631],
     | 99.99th=[ 1844]
   bw (  KiB/s): min=51336, max=261556, per=27.55%, avg=83542.49, stdev=46815.53, samples=43
   iops        : min=12834, max=65389, avg=20885.40, stdev=11703.94, samples=43
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.20%
  lat (usec)   : 250=14.82%, 500=14.91%, 750=27.69%, 1000=33.99%
  lat (msec)   : 2=8.39%, 4=0.01%
  cpu          : usr=5.40%, sys=94.57%, ctx=68, majf=11, minf=300
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=524288,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=296MiB/s (311MB/s), 296MiB/s-296MiB/s (311MB/s-311MB/s), io=2048MiB (2147MB), run=6915-6915msec
root@pve:~#  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [m(1)][97.8%][r=76.1MiB/s,w=24.7MiB/s][r=19.5k,w=6325 IOPS][eta 00m:02s]
test: (groupid=0, jobs=1): err= 0: pid=31970: Fri Nov  1 12:26:20 2019
  read: IOPS=8857, BW=34.6MiB/s (36.3MB/s)(3070MiB/88731msec)
   bw (  KiB/s): min= 4064, max=109088, per=99.39%, avg=35213.98, stdev=18311.70, samples=177
   iops        : min= 1016, max=27272, avg=8803.48, stdev=4577.90, samples=177
  write: IOPS=2960, BW=11.6MiB/s (12.1MB/s)(1026MiB/88731msec); 0 zone resets
   bw (  KiB/s): min= 1504, max=35160, per=99.38%, avg=11767.02, stdev=6084.55, samples=177
   iops        : min=  376, max= 8790, avg=2941.75, stdev=1521.14, samples=177
  cpu          : usr=4.07%, sys=71.52%, ctx=77471, majf=0, minf=1421
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=34.6MiB/s (36.3MB/s), 34.6MiB/s-34.6MiB/s (36.3MB/s-36.3MB/s), io=3070MiB (3219MB), run=88731-88731msec
  WRITE: bw=11.6MiB/s (12.1MB/s), 11.6MiB/s-11.6MiB/s (12.1MB/s-12.1MB/s), io=1026MiB (1076MB), run=88731-88731msec
root@pve:~#  fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12

Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=30.4MiB/s][w=7778 IOPS][eta 00m:00s]
journal-test: (groupid=0, jobs=1): err= 0: pid=29295: Fri Nov  1 12:28:31 2019
  write: IOPS=7774, BW=30.4MiB/s (31.8MB/s)(1822MiB/60001msec); 0 zone resets
    clat (usec): min=121, max=1561, avg=127.27, stdev=14.78
     lat (usec): min=121, max=1561, avg=127.51, stdev=14.88
    clat percentiles (usec):
     |  1.00th=[  124],  5.00th=[  125], 10.00th=[  125], 20.00th=[  126],
     | 30.00th=[  126], 40.00th=[  127], 50.00th=[  127], 60.00th=[  127],
     | 70.00th=[  128], 80.00th=[  128], 90.00th=[  129], 95.00th=[  130],
     | 99.00th=[  139], 99.50th=[  145], 99.90th=[  174], 99.95th=[  251],
     | 99.99th=[  922]
   bw (  KiB/s): min=30240, max=31496, per=100.00%, avg=31097.67, stdev=258.87, samples=119
   iops        : min= 7560, max= 7874, avg=7774.40, stdev=64.74, samples=119
  lat (usec)   : 250=99.95%, 500=0.01%, 750=0.01%, 1000=0.03%
  lat (msec)   : 2=0.01%
  cpu          : usr=2.89%, sys=12.35%, ctx=1399412, majf=1, minf=26
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,466489,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=30.4MiB/s (31.8MB/s), 30.4MiB/s-30.4MiB/s (31.8MB/s-31.8MB/s), io=1822MiB (1911MB), run=60001-60001msec

Disk stats (read/write):
  sda: ios=86/931158, merge=0/2, ticks=21/53082, in_queue=0, util=100.00%
root@pve:~#
root@pve:~#  fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=4 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
journal-test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.12
Starting 4 processes
Jobs: 4 (f=4): [W(4)][100.0%][w=52.0MiB/s][w=13.6k IOPS][eta 00m:00s]
journal-test: (groupid=0, jobs=4): err= 0: pid=721: Fri Nov  1 12:31:44 2019
  write: IOPS=13.8k, BW=53.8MiB/s (56.4MB/s)(3226MiB/60001msec); 0 zone resets
    clat (usec): min=130, max=1922, avg=288.68, stdev=75.92
     lat (usec): min=130, max=1922, avg=288.99, stdev=75.96
    clat percentiles (usec):
     |  1.00th=[  159],  5.00th=[  200], 10.00th=[  206], 20.00th=[  219],
     | 30.00th=[  235], 40.00th=[  265], 50.00th=[  277], 60.00th=[  297],
     | 70.00th=[  314], 80.00th=[  351], 90.00th=[  392], 95.00th=[  420],
     | 99.00th=[  494], 99.50th=[  523], 99.90th=[  611], 99.95th=[ 1037],
     | 99.99th=[ 1156]
   bw (  KiB/s): min=12696, max=14664, per=24.99%, avg=13763.15, stdev=591.95, samples=476
   iops        : min= 3174, max= 3666, avg=3440.76, stdev=147.97, samples=476
  lat (usec)   : 250=32.94%, 500=66.19%, 750=0.79%, 1000=0.02%
  lat (msec)   : 2=0.06%
  cpu          : usr=1.81%, sys=7.46%, ctx=1816542, majf=0, minf=52
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,825978,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=53.8MiB/s (56.4MB/s), 53.8MiB/s-53.8MiB/s (56.4MB/s-56.4MB/s), io=3226MiB (3383MB), run=60001-60001msec

Disk stats (read/write):
  sda: ios=124/1648535, merge=0/6, ticks=32/219140, in_queue=0, util=100.00%
 
Hi,
which drive are you testing in your last post? Is it a hardware RAID? A single disk? A ZFS RAID?
Which disk model? SATA or NVMe?

For your tests, always use --ioengine=libaio and --direct=1 (to avoid cached reads).
Try to write to the device directly and not through the filesystem (see the sketch below).

Also, if you use iodepth=1 and numjobs=1 you will always get the performance of a single disk.
PM883s deliver around 10-20k 4k IOPS for read/write at iodepth=1, so your results don't seem so bad.

https://www.anandtech.com/show/13704/enterprise-ssd-roundup-intel-samsung-memblaze/5
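
A raw-device random write test along those lines might look like this (a sketch only; /dev/sdX is a placeholder, and be aware that writing to a raw device destroys any filesystem or pool on it, so only run it against a disk you can wipe):

Bash:
# DANGER: writes to the raw device and destroys its contents
fio --name=raw-randwrite --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=16 --numjobs=4 --runtime=60 --time_based \
    --group_reporting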
 
First of all, thank you very much for your response.

Which drive are you testing in your last post?
I think it was ZFS RAID10 in HBA mode.

Is it a hardware RAID?
Tried ZFS RAID10 through HBA mode and RAID10 on the hardware controller, then ZFS RAID0 and ext4 on Proxmox.

A single disk? A ZFS RAID?
I tried both. On the Proxmox host I get the full ZFS RAID10 performance, and in the Windows VM guest I get about single-disk IOPS performance.

Which disk model? SATA or NVMe?
Only SATA :(

For your tests, always use --ioengine=libaio and --direct=1 (to avoid cached reads). Try to write to the device directly and not through the filesystem.
Already tried.

Also, if you use iodepth=1 and numjobs=1 you will always get the performance of a single disk. PM883s deliver around 10-20k 4k IOPS for read/write at iodepth=1, so your results don't seem so bad.
Yeah, I guess that's the maximum.

https://www.anandtech.com/show/13704/enterprise-ssd-roundup-intel-samsung-memblaze/5

I guess I will try it with these numbers. I am just worried about MS SQL performance under Windows Server 2016 :(
 
I mean, I don't know which setup was behind each of your different logs.
Also, it's better to always use the same options, so you can compare the results.

I guess I will try it with these numbers. I am just worried about MS SQL performance under Windows Server 2016 :(

If you don't need Analysis Services, only the database engine, you can install it on Linux too ;) (I have migrated some big databases; it's faster than on Windows, and I can do updates with almost no impact.)
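
For reference, the packaged setup on a Debian/Ubuntu-based machine roughly looks like this (a sketch under the assumption of Ubuntu 18.04 and SQL Server 2019; the repo paths are placeholders, check Microsoft's documentation for your distro and version):

Bash:
# add Microsoft's signing key and the SQL Server repo (paths assume Ubuntu 18.04 / SQL Server 2019)
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/18.04/mssql-server-2019.list)"
sudo apt-get update && sudo apt-get install -y mssql-server
# interactive setup: choose the edition, accept the EULA, set the SA password
sudo /opt/mssql/bin/mssql-conf setup
systemctl status mssql-server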

The low iodepth could impact the journal of the database. (If you have multiple databases with multiple journals, it will scale.)
I have found one big optimisation recently for this problem:

https://sqlperformance.com/2014/04/io-subsystem/delayed-durability-in-sql-server-2014
ALTER DATABASE … SET DELAYED_DURABILITY = FORCED

Basically, it works like MySQL: it flushes the journal once per second, grouping transactions/small blocks into bigger blocks of 64k (so fewer, bigger IOs).
(But you can lose the last second of transactions in case of a failure, so it depends on how critical your application is.)
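
To see on the storage side why fewer, bigger flushes help, you could compare small and large synchronous writes with fio (a sketch; test.file is a placeholder on the datastore in question, and the block sizes just mimic the 4k-versus-64k journal behaviour described above):

Bash:
# 4k synchronous writes, roughly what a per-transaction log flush looks like
fio --name=log-4k --filename=test.file --size=1G --ioengine=libaio --direct=1 --sync=1 \
    --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=30 --time_based --group_reporting

# 64k synchronous writes, roughly the grouped flushes done with delayed durability
fio --name=log-64k --filename=test.file --size=1G --ioengine=libaio --direct=1 --sync=1 \
    --rw=write --bs=64k --iodepth=1 --numjobs=1 --runtime=30 --time_based --group_reporting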
 
:D I am coming from Gentoo; I am doing this for someone else and he insists on MS SQL.
 