Proxmox Ceph: low write IOPS but good read IOPS. Why?

bsinha

Member
May 5, 2022
Hi,

We are running a 5-node Proxmox Ceph cluster. Three of the nodes have SSD drives, which form a pool called ceph-ssd-pool1. The configuration is as follows:

Ceph Network: 10G
SSD drives: Kingston SEDC500M/1920G (marketed as datacenter-grade SSDs, with claimed performance of around 98K read and 70K write IOPS)

My rados benchmark shows write IOPS around 3K, whereas read IOPS is around 20K.
[Screenshots: rados bench write and read results]
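For reference, numbers like these come from rados bench runs along the following lines (a sketch only; the pool name is from above, but the block size and thread count are assumptions, not necessarily the exact parameters used):

Code:
# 4K writes against the SSD pool, keeping the objects for the read test
rados bench -p ceph-ssd-pool1 60 write -b 4096 -t 16 --no-cleanup
# random reads against the same objects
rados bench -p ceph-ssd-pool1 60 rand -t 16
# remove the benchmark objects afterwards
rados -p ceph-ssd-pool1 cleanup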


During the benchmark, CPU usage on the servers does not go significantly high, and the 10G Ceph network does not carry more than 4 Gbit/s of traffic.


Why is there such a difference between the write IOPS and read IOPS? I would be more than happy to get any suggestions on reaching 10K write IOPS.

Thanks in advance.
 
Most likely the cause is latency.
For reads, Ceph can in theory serve the data from any of the replicas, while a write only completes once all replicas have been written.

If you take a look at the `Average Latency` lines in both screenshots, you can see that writes have almost 10x the latency. This also matches up with the average IOPS (almost 10x for reads).
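As a rough illustration (the numbers here are made up for the example, not taken from the screenshots): with a fixed number of operations in flight, IOPS is roughly the number of in-flight operations divided by the average latency. With the rados bench default of 16 concurrent operations, an average write latency of ~5 ms gives about 16 / 0.005 ≈ 3,200 IOPS, while an average read latency of ~0.8 ms gives about 16 / 0.0008 ≈ 20,000 IOPS. That is how a ~10x latency gap turns directly into a ~10x IOPS gap.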

How is the latency between the nodes?
Do you have one of those SSDs you could test with fio? (This is destructive, so you can't just use one of the OSDs)
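For the latency between the nodes, a quick check could look like this (a sketch; the address should be another node's Ceph network IP):

Code:
# many small pings over the Ceph network; look at the avg/max round-trip times
ping -c 100 -i 0.01 172.32.0.5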
 
Thank you for your reply.

I shall move forward taking the latency into account. However, can you please guide me on how we can take care of it? Just for your information, we are running a managed switch and have configured the ports to handle jumbo frames (9000 MTU).

What would be the fio command to test with, and what portion of the result should I look at?
 
fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=<path/to/device> for 4K writes.
fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4m --filename=<path/to/device> for 4M writes.

The first one benchmarks the IOPS while the second one benchmarks the bandwidth of the disk.
And run the same command lines just with `--rw=read` and `--name read_<BS>` for read benchmarks as well.
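Spelled out, the read variants would be (same placeholder path as above):

fio --ioengine=psync --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name read_4k --filename=<path/to/device>
fio --ioengine=psync --direct=1 --sync=1 --rw=read --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name read_4m --filename=<path/to/device>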

Please provide the complete output including the command line used to invoke it.
 
Do you see `errors` in the `ip -statistics a` output?
Any dropped or faulty packets can increase the latency of the network.
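Since jumbo frames (9000 MTU) are configured, it is also worth confirming that large frames actually pass end-to-end between the nodes; an MTU mismatch somewhere in the path typically shows up as drops or fragmentation. A quick check (sketch; use another node's Ceph IP):

Code:
# 8972 = 9000 MTU minus 20 bytes IP header and 8 bytes ICMP header; -M do forbids fragmentation
ping -M do -s 8972 -c 4 172.32.0.5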

You can also benchmark the performance with `iperf3`. This will also show the congestion window (`Cwnd`), which can hint at network issues if it gets lowered (a lot).
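A typical run looks like this (a sketch; server on one node, client on another, over the Ceph network):

Code:
# on one node
iperf3 -s
# on another node, pointed at the first node's Ceph IP
iperf3 -c 172.32.0.5 -t 30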
 
Thanks once again for your response. Please find the `ip -statistics a` output below for all of the nodes, showing only the 10G Ceph port.

[Screenshot: Node 1 `ip -statistics a` output]
[Screenshot: Node 2 `ip -statistics a` output]
[Screenshot: Node 3 `ip -statistics a` output]
[Screenshot: Node 4 `ip -statistics a` output]
[Screenshot: Node 5 `ip -statistics a` output]

Following is the iperf test report:
[Screenshot: iperf results]

I don't find anything unusual within the network. However, the strange thing is that the latency only shows up during write operations. What could be the possible reason, if you have any suggestion?


About the fio command that you suggested: we are getting a new server with the same SSD drives within the next 7 days. We can run the command on that drive then and share the report with you, since the command is data destructive as you said.


In the meantime, if you have any other suggestions, please let us know.

Thanks.
 
Did you see the `Cwnd` lowering from time to time during the iperf tests?
 
What do you mean by 'Cwnd'?

Following is the output we are getting:


iperf client
Code:
root@host4:~# iperf -c 172.32.0.5 -w 2m -t 30s -i 1
------------------------------------------------------------
Client connecting to 172.32.0.5, TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------
[  3] local 172.32.0.4 port 51906 connected with 172.32.0.5 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3] 0.0000-1.0000 sec  1.01 GBytes  8.64 Gbits/sec
[  3] 1.0000-2.0000 sec  1.00 GBytes  8.62 Gbits/sec
[  3] 2.0000-3.0000 sec  1.01 GBytes  8.71 Gbits/sec
[  3] 3.0000-4.0000 sec  1.00 GBytes  8.61 Gbits/sec
[  3] 4.0000-5.0000 sec  1.01 GBytes  8.70 Gbits/sec
[  3] 5.0000-6.0000 sec  1.01 GBytes  8.65 Gbits/sec
[  3] 6.0000-7.0000 sec  1.04 GBytes  8.94 Gbits/sec
[  3] 7.0000-8.0000 sec  1.05 GBytes  8.98 Gbits/sec
[  3] 8.0000-9.0000 sec  1.02 GBytes  8.80 Gbits/sec
[  3] 9.0000-10.0000 sec  1.02 GBytes  8.77 Gbits/sec
[  3] 10.0000-11.0000 sec  1.04 GBytes  8.95 Gbits/sec
[  3] 11.0000-12.0000 sec  1.02 GBytes  8.79 Gbits/sec
[  3] 12.0000-13.0000 sec  1.05 GBytes  8.98 Gbits/sec
[  3] 13.0000-14.0000 sec  1.04 GBytes  8.90 Gbits/sec
[  3] 14.0000-15.0000 sec  1.03 GBytes  8.83 Gbits/sec
[  3] 15.0000-16.0000 sec  1.03 GBytes  8.84 Gbits/sec
[  3] 16.0000-17.0000 sec  1.04 GBytes  8.89 Gbits/sec
[  3] 17.0000-18.0000 sec  1.03 GBytes  8.86 Gbits/sec
[  3] 18.0000-19.0000 sec  1.00 GBytes  8.61 Gbits/sec
[  3] 19.0000-20.0000 sec  1.01 GBytes  8.70 Gbits/sec
[  3] 20.0000-21.0000 sec  1013 MBytes  8.50 Gbits/sec
[  3] 21.0000-22.0000 sec  1.02 GBytes  8.80 Gbits/sec
[  3] 22.0000-23.0000 sec  1.04 GBytes  8.92 Gbits/sec
[  3] 23.0000-24.0000 sec  1.03 GBytes  8.82 Gbits/sec
[  3] 24.0000-25.0000 sec  1.04 GBytes  8.92 Gbits/sec
[  3] 25.0000-26.0000 sec  1.03 GBytes  8.88 Gbits/sec
[  3] 26.0000-27.0000 sec  1.03 GBytes  8.85 Gbits/sec
[  3] 27.0000-28.0000 sec  1.02 GBytes  8.77 Gbits/sec
[  3] 28.0000-29.0000 sec  1.04 GBytes  8.91 Gbits/sec
[  3] 29.0000-30.0000 sec  1.04 GBytes  8.90 Gbits/sec
[  3] 30.0000-30.0000 sec   256 KBytes  91.2 Gbits/sec
[  3] 0.0000-30.0000 sec  30.7 GBytes  8.80 Gbits/sec
root@host4:~#


iperf server end
Code:
root@host5:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 172.32.0.5 port 5001 connected with 172.32.0.4 port 51906
[ ID] Interval       Transfer     Bandwidth
[  4] 0.0000-29.9993 sec  30.7 GBytes  8.80 Gbits/sec
 
Please use `iperf3` instead of `iperf`. This provides the `Cwnd` output.
 
I am getting the following output:

Code:
root@host5:~# iperf3 -c 172.32.0.4
Connecting to host 172.32.0.4, port 5201
[  5] local 172.32.0.5 port 60858 connected to 172.32.0.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   915 MBytes  7.68 Gbits/sec   41   1.63 MBytes       
[  5]   1.00-2.00   sec  1.02 GBytes  8.78 Gbits/sec  111   1.82 MBytes       
[  5]   2.00-3.00   sec  1.11 GBytes  9.55 Gbits/sec   10   1.82 MBytes       
[  5]   3.00-4.00   sec   910 MBytes  7.63 Gbits/sec   11   1.84 MBytes       
[  5]   4.00-5.00   sec  1.10 GBytes  9.44 Gbits/sec   16   1.89 MBytes       
[  5]   5.00-6.00   sec  1.08 GBytes  9.29 Gbits/sec   13   1.91 MBytes       
[  5]   6.00-7.00   sec  1.10 GBytes  9.48 Gbits/sec    9   1.93 MBytes       
[  5]   7.00-8.00   sec  1.10 GBytes  9.47 Gbits/sec    0   1.93 MBytes       
[  5]   8.00-9.00   sec  1.11 GBytes  9.56 Gbits/sec    0   1.93 MBytes       
[  5]   9.00-10.00  sec   876 MBytes  7.35 Gbits/sec    8   1.95 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.3 GBytes  8.82 Gbits/sec  219             sender
[  5]   0.00-10.00  sec  10.3 GBytes  8.82 Gbits/sec                  receiver


Does it look good?


Thanks,
Biswajit Sinha
 
fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=<path/to/device> for 4K writes.
fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4m --filename=<path/to/device> for 4M writes.

The first one benchmarks the IOPS while the second one benchmarks the bandwidth of the disk.
And run the same command lines just with `--rw=read` and `--name read_<BS>` for read benchmarks as well.

Please provide the complete output including the command line used to invoke it.
Hi,

I have been able to run the commands you mentioned. Please find the information below:

Code:
fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=/dev/sdb


root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=/dev/sdb
write_4k: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][8.0%][w=26.9MiB/s][w=6875 IOPS][eta 09m:12s]


fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4m --filename=/dev/sdb

root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4m --filename=/dev/sdb
write_4m: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][4.0%][w=404MiB/s][w=101 IOPS][eta 09m:36s]
 
Please run both of those tests again with `--rw=read` so we also have the read performance data.
 
Sorry for the late reply. Please find the results below


Read IOPS while block size = 4k
Code:
root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=/dev/sdb
write_4k: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
^Cbs: 1 (f=1): [R(1)][6.5%][r=27.4MiB/s][r=7015 IOPS][eta 09m:21s]
fio: terminating on signal 2



Read IOPS while block size = 4M
Code:
root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=read --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name read_4m --filename=/dev/sdb
read_4m: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
^Cbs: 1 (f=1): [R(1)][3.2%][r=500MiB/s][r=125 IOPS][eta 09m:41s]
fio: terminating on signal 2


Are these the IOPS we are supposed to get from this Kingston DC500M 2TB SSD?
 
Based on the output, you canceled it after 40 and 20 seconds?
That way we can't be sure about the real performance, since the cache could have played a big role during those few seconds of the benchmark.

You should let it run the whole 600 seconds and then provide the complete output (including the command).
 
Thanks for your reply. Please find the full outputs below, for both read and write (4K and 4M block sizes).

#############
READ 4M block size
#############
Code:
root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=read --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name read_4m --filename=/dev/sdb
read_4m: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=517MiB/s][r=129 IOPS][eta 00m:00s]
read_4m: (groupid=0, jobs=1): err= 0: pid=1564700: Wed Jan 25 17:06:18 2023
  read: IOPS=128, BW=515MiB/s (540MB/s)(302GiB/600003msec)
    clat (usec): min=7624, max=14794, avg=7767.40, stdev=129.48
     lat (usec): min=7624, max=14795, avg=7767.89, stdev=129.52
    clat percentiles (usec):
     |  1.00th=[ 7635],  5.00th=[ 7635], 10.00th=[ 7701], 20.00th=[ 7701],
     | 30.00th=[ 7701], 40.00th=[ 7701], 50.00th=[ 7701], 60.00th=[ 7767],
     | 70.00th=[ 7767], 80.00th=[ 7832], 90.00th=[ 8029], 95.00th=[ 8029],
     | 99.00th=[ 8160], 99.50th=[ 8160], 99.90th=[ 8455], 99.95th=[ 8586],
     | 99.99th=[ 8979]
   bw (  KiB/s): min=499712, max=541755, per=100.00%, avg=527446.19, stdev=7814.28, samples=1199
   iops        : min=  122, max=  132, avg=128.68, stdev= 1.91, samples=1199
  lat (msec)   : 10=99.99%, 20=0.01%
  cpu          : usr=0.13%, sys=4.29%, ctx=77499, majf=8, minf=6184
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=77194,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=515MiB/s (540MB/s), 515MiB/s-515MiB/s (540MB/s-540MB/s), io=302GiB (324GB), run=600003-600003msec

Disk stats (read/write):
  sdb: ios=588703/0, merge=8426/0, ticks=2525561/0, in_queue=2525561, util=100.00%
root@host4:~#


##############
READ 4k block size
##############

Code:
root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=/dev/sdb
write_4k: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=27.1MiB/s][r=6942 IOPS][eta 00m:00s]
write_4k: (groupid=0, jobs=1): err= 0: pid=2419880: Wed Jan 25 17:20:00 2023
  read: IOPS=7046, BW=27.5MiB/s (28.9MB/s)(16.1GiB/600001msec)
    clat (usec): min=42, max=4221, avg=139.93, stdev=30.25
     lat (usec): min=42, max=4221, avg=140.12, stdev=30.29
    clat percentiles (usec):
     |  1.00th=[   63],  5.00th=[  110], 10.00th=[  113], 20.00th=[  116],
     | 30.00th=[  125], 40.00th=[  130], 50.00th=[  133], 60.00th=[  143],
     | 70.00th=[  159], 80.00th=[  163], 90.00th=[  172], 95.00th=[  182],
     | 99.00th=[  204], 99.50th=[  215], 99.90th=[  258], 99.95th=[  330],
     | 99.99th=[  930]
   bw (  KiB/s): min=21632, max=71488, per=100.00%, avg=28208.44, stdev=3219.51, samples=1199
   iops        : min= 5408, max=17872, avg=7052.01, stdev=804.89, samples=1199
  lat (usec)   : 50=0.20%, 100=1.02%, 250=98.66%, 500=0.08%, 750=0.01%
  lat (usec)   : 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=3.20%, sys=7.01%, ctx=4228363, majf=0, minf=34
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4227796,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=27.5MiB/s (28.9MB/s), 27.5MiB/s-27.5MiB/s (28.9MB/s-28.9MB/s), io=16.1GiB (17.3GB), run=600001-600001msec

Disk stats (read/write):
  sdb: ios=4226931/0, merge=0/0, ticks=550800/0, in_queue=550800, util=100.00%



#####################
Write 4k block size
#####################

Code:
root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4k --filename=/dev/sdb
write_4k: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=26.7MiB/s][w=6838 IOPS][eta 00m:00s]
write_4k: (groupid=0, jobs=1): err= 0: pid=3680988: Wed Jan 25 17:32:13 2023
  write: IOPS=6683, BW=26.1MiB/s (27.4MB/s)(15.3GiB/600001msec); 0 zone resets
    clat (usec): min=118, max=250060, avg=147.60, stdev=203.10
     lat (usec): min=118, max=250060, avg=147.89, stdev=203.11
    clat percentiles (usec):
     |  1.00th=[  123],  5.00th=[  127], 10.00th=[  131], 20.00th=[  135],
     | 30.00th=[  137], 40.00th=[  139], 50.00th=[  141], 60.00th=[  143],
     | 70.00th=[  147], 80.00th=[  153], 90.00th=[  169], 95.00th=[  184],
     | 99.00th=[  243], 99.50th=[  273], 99.90th=[  420], 99.95th=[  906],
     | 99.99th=[ 3130]
   bw (  KiB/s): min= 5072, max=31424, per=100.00%, avg=26752.92, stdev=2078.57, samples=1199
   iops        : min= 1268, max= 7856, avg=6688.11, stdev=519.65, samples=1199
  lat (usec)   : 250=99.15%, 500=0.76%, 750=0.02%, 1000=0.03%
  lat (msec)   : 2=0.01%, 4=0.03%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 250=0.01%, 500=0.01%
  cpu          : usr=3.05%, sys=11.84%, ctx=11967441, majf=0, minf=60
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4009872,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=15.3GiB (16.4GB), run=600001-600001msec

Disk stats (read/write):
  sdb: ios=51/8018166, merge=0/0, ticks=7/534457, in_queue=901957, util=100.00%


################
Write 4M block size
#################

Code:
root@host4:~# fio --ioengine=psync --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1 --runtime=600 --time_based --name write_4m --filename=/dev/sdb
write_4m: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=408MiB/s][w=102 IOPS][eta 00m:00s]
write_4m: (groupid=0, jobs=1): err= 0: pid=495821: Wed Jan 25 17:44:44 2023
  write: IOPS=101, BW=407MiB/s (426MB/s)(238GiB/600004msec); 0 zone resets
    clat (usec): min=9343, max=48130, avg=9610.79, stdev=1366.01
     lat (usec): min=9412, max=48476, avg=9831.66, stdev=1371.43
    clat percentiles (usec):
     |  1.00th=[ 9372],  5.00th=[ 9372], 10.00th=[ 9503], 20.00th=[ 9503],
     | 30.00th=[ 9503], 40.00th=[ 9503], 50.00th=[ 9503], 60.00th=[ 9503],
     | 70.00th=[ 9503], 80.00th=[ 9634], 90.00th=[ 9765], 95.00th=[ 9765],
     | 99.00th=[10159], 99.50th=[10421], 99.90th=[42730], 99.95th=[45351],
     | 99.99th=[47449]
   bw (  KiB/s): min=114688, max=434176, per=100.00%, avg=416691.45, stdev=25116.59, samples=1199
   iops        : min=   28, max=  106, avg=101.66, stdev= 6.13, samples=1199
  lat (msec)   : 10=98.39%, 20=1.41%, 50=0.20%
  cpu          : usr=2.36%, sys=1.82%, ctx=357653, majf=0, minf=5158
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,60986,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=407MiB/s (426MB/s), 407MiB/s-407MiB/s (426MB/s-426MB/s), io=238GiB (256GB), run=600004-600004msec

Disk stats (read/write):
  sdb: ios=56/314606, merge=0/0, ticks=149/1673538, in_queue=1680054, util=100.00%
 
The 4K performance is kind of bad. But the official IOPS numbers are most likely obtained with caching.
If you look at the Samsung PM893 SSDs, they actually reach the advertised 30K write IOPS in the same fio 4K benchmark.
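To illustrate the gap (a sketch only, and just as destructive as the tests above, so only on a disk without data): datasheet figures are usually taken at a deep queue depth and without forcing every single write to be synced, so something along these lines will typically land much closer to the advertised number than the sync test:

Code:
fio --ioengine=libaio --direct=1 --rw=randwrite --bs=4K --numjobs=4 --iodepth=32 --runtime=60 --time_based --name write_4k_qd32 --filename=/dev/sdb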
 
Thank you Mira.

We shall open a ticket with Kingston and see if we can get help from them.
 
The 4K performance is kind of bad. But the official IOPS numbers are most likely obtained with caching.
If you look at the Samsung PM893 SSDs, they actually reach the advertised 30K write IOPS in the same fio 4K benchmark.

Just to let you know, we are running Dell PowerEdge R730xd servers, and the disk controller we are using is a PERC H730 Mini with HBA mode enabled for the Kingston drives. This hardware setup should be good for getting higher IOPS, correct? So it is just the disks that we are facing the problem with.
 
If you can attach the disks directly to the mainboard you could test it without the HBA.
Usually an HBA should not be an issue if it provides the required speeds.
 
Definitely test with a real HBA instead of a RAID controller tricked into acting as an HBA. I've read reports elsewhere of issues like this caused by a very reduced queue depth when in IT mode. IT-mode firmware is usually less optimized/tested than that of a real HBA.
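One quick thing to check (a sketch; sdb as in the earlier tests) is the queue depth the controller actually exposes for the disk:

Code:
cat /sys/block/sdb/device/queue_depth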
 
