Another CEPH Performance Problem

MasterTH

Hi,

I know there are already many threads about Ceph performance problems here, and I thought I had ours sorted out, but the numbers tell a different story...

Here is our setup:
9 PVE nodes, each with at least 128 GB RAM and a 12-core processor
2x HDDs or SSDs (newer servers have SSDs) in a ZFS RAID1 for the system itself. Nothing else is stored there, just the system
8 of the servers are used for Ceph; each of them has 8x 4 TB SSDs, all combined into one big pool
The Ceph network has a 2x 10 Gbit/s backbone for fast recovery and fast data delivery

I thought we should not face any performance problems with this setup, but since the recent upgrade we are struggling with performance inside the VMs.

I ran fio with direct=1 and a file size of 20 MB:
Code:
fio --rw=write --name=test --size=20M --direct=1
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 20MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=80KiB/s][w=20 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=18002: Wed May  5 12:07:20 2021
  write: IOPS=18, BW=75.8KiB/s (77.7kB/s)(20.0MiB/270024msec); 0 zone resets
    clat (msec): min=6, max=1130, avg=52.73, stdev=128.65
     lat (msec): min=6, max=1130, avg=52.73, stdev=128.65
    clat percentiles (msec):
     |  1.00th=[    7],  5.00th=[    7], 10.00th=[    8], 20.00th=[    9],
     | 30.00th=[   11], 40.00th=[   12], 50.00th=[   14], 60.00th=[   16],
     | 70.00th=[   18], 80.00th=[   23], 90.00th=[   55], 95.00th=[  405],
     | 99.00th=[  584], 99.50th=[  718], 99.90th=[  986], 99.95th=[ 1083],
     | 99.99th=[ 1133]
   bw (  KiB/s): min=    7, max=  552, per=100.00%, avg=80.54, stdev=120.46, samples=508
   iops        : min=    1, max=  138, avg=20.09, stdev=30.12, samples=508
  lat (msec)   : 10=29.49%, 20=44.63%, 50=15.76%, 100=0.72%, 250=1.19%
  lat (msec)   : 500=6.37%, 750=1.41%, 1000=0.33%
  cpu          : usr=0.01%, sys=0.10%, ctx=5487, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=75.8KiB/s (77.7kB/s), 75.8KiB/s-75.8KiB/s (77.7kB/s-77.7kB/s), io=20.0MiB (20.0MB), run=270024-270024msec

Disk stats (read/write):
  vda: ios=0/7090, merge=0/1204, ticks=0/731652, in_queue=450144, util=55.69%


An awful 18 IOPS for a Ceph cluster backed by 59 SSDs, isn't it? I cannot find the issue or the config setting that I have to change.
If you need any further information, please let me know and I'll provide it as fast as I can.


thanks
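
A quick sanity check on the fio numbers above, as a rough sketch rather than a diagnosis. With iodepth=1 only one write is outstanding at a time, so the IOPS should be about the inverse of the mean latency. The second part estimates how much of the total wait time comes from the slow tail of the "lat (msec)" histogram. The bucket lower edges and the midpoint approximation are assumptions of this sketch, and the function names are made up for illustration.

```python
# Mean completion latency (ms) reported by the first fio run above.
AVG_LATENCY_MS = 52.73

# "lat (msec)" histogram from the same run: bucket upper bound (ms) -> percent of writes.
LAT_BUCKETS = {10: 29.49, 20: 44.63, 50: 15.76, 100: 0.72,
               250: 1.19, 500: 6.37, 750: 1.41, 1000: 0.33}

# Assumed lower bound (ms) of each bucket, based on fio's fixed millisecond bucket edges.
LOWER_EDGE = {10: 4, 20: 10, 50: 20, 100: 50, 250: 100, 500: 250, 750: 500, 1000: 750}


def iops_at_depth_one(avg_latency_ms):
    """With one outstanding write at a time, IOPS is the inverse of the mean latency."""
    return 1000.0 / avg_latency_ms


def count_share_of_slow_writes(threshold_ms):
    """Fraction of writes (by count) in buckets starting at or above threshold_ms."""
    slow = sum(pct for upper, pct in LAT_BUCKETS.items() if LOWER_EDGE[upper] >= threshold_ms)
    return slow / sum(LAT_BUCKETS.values())


def time_share_of_slow_writes(threshold_ms):
    """Estimated fraction of total wait time spent in buckets at or above threshold_ms.

    Assumption: every write in a bucket took that bucket's midpoint latency.
    """
    weighted = {upper: pct * (LOWER_EDGE[upper] + upper) / 2
                for upper, pct in LAT_BUCKETS.items()}
    slow = sum(w for upper, w in weighted.items() if LOWER_EDGE[upper] >= threshold_ms)
    return slow / sum(weighted.values())


if __name__ == "__main__":
    print(f"expected IOPS at iodepth=1: {iops_at_depth_one(AVG_LATENCY_MS):.1f}")
    print(f"writes slower than 250 ms: {count_share_of_slow_writes(250):.1%} of all writes")
    print(f"estimated share of wait time: {time_share_of_slow_writes(250):.1%}")
```

Under these assumptions about 8% of the writes account for roughly two thirds of the wait time, so the long tail (the 250 ms and slower buckets) looks more relevant than the median of 14 ms.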
 
Hmm, do you see one or more monitors using up a lot of CPU time?
 
Hi aaron,

thanks for your reply.

3 mons, each with a CPU usage of about 10%.
Load average on all three servers is about: 1.25, 1.69, 1.72
 
Okay. I am not sure yet what the cause could be. There is an ongoing thread on the Ceph user mailing list describing the same symptoms.
 
and we're at 15.2.11
that is true... ;)

Hmm. What performance do you get if you run a rados benchmark? That takes the virtualization layer out of the picture and helps narrow down where the problem is.
E.g. rados bench 600 write -b 4M -t 16 -p <pool>
 
Code:
rados bench 600 write -b 4M -t 16 -p test
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 600 seconds or 0 objects
Object prefix: benchmark_data_pve5-1_589524
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        58        42    167.99       168   0.0791799    0.289669
    2      16       103        87   173.987       180     0.10247    0.296207
    3      16       131       115    153.32       112     1.00319    0.339364
    4      16       184       168   167.985       212   0.0375354    0.354619
    5      16       224       208   166.386       160    0.078034    0.348956
    6      16       241       225   149.987        68    0.418496    0.342787
    7      16       263       247    141.13        88   0.0783612     0.33109
    8      16       311       295   147.486       192    0.119154    0.398319
    9      16       357       341   151.542       184    0.155564    0.384011
   10      16       369       353   141.187        48    0.791041    0.391886
   11      16       402       386    140.35       132    0.100519    0.448899
   12      16       444       428   142.653       168    0.127638    0.426751
   13      16       451       435   133.834        28    0.351264    0.425493
   14      16       481       465   132.845       120   0.0669084    0.465422
   15      16       517       501   133.587       144    0.106377    0.461657
   16      16       551       535   133.737       136   0.0543049    0.452371
   17      16       590       574   135.046       156   0.0850017    0.454999
   18      16       624       608   135.098       136   0.0471699    0.444734
   19      16       654       638   134.303       120   0.0562947    0.469835
2021-05-05T14:31:04.727632+0200 min lat: 0.033723 max lat: 4.35553 avg lat: 0.467587
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       671       655   130.987        68      1.5871    0.467587
   21      16       676       660   125.702        20     2.08203    0.469724
   22      16       710       694   126.169       136   0.0719349    0.495984
   23      16       742       726   126.248       128    0.215642     0.48888
   24      16       778       762   126.987       144   0.0462942    0.488285
   25      16       792       776   124.147        56      0.5013     0.49603
   26      16       822       806   123.987       120   0.0439495    0.498529
   27      16       857       841    124.58       140   0.0995914    0.504528
   28      16       890       874   124.845       132    0.399928    0.496412
   29      16       916       900   124.125       104   0.0893293     0.49808
   30      16       943       927   123.588       108   0.0639874    0.493555
   31      16       981       965   124.503       152   0.0976412    0.511521
   32      16      1018      1002   125.237       148   0.0611531    0.499672
   33      16      1053      1037   125.684       140   0.0954047     0.49421
   34      16      1073      1057    124.34        80    0.698673     0.50061
   35      16      1112      1096   125.244       156   0.0456987    0.498683
   36      16      1155      1139   126.543       172    0.785304    0.494532
   37      16      1184      1168   126.257       116   0.0946556    0.492801
   38      16      1210      1194   125.671       104   0.0599735    0.505017
   39      16      1246      1230   126.141       144    0.108777    0.495729
2021-05-05T14:31:24.729831+0200 min lat: 0.0335074 max lat: 4.35553 avg lat: 0.496134
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      1265      1249   124.887        76   0.0570307    0.496134
   41      16      1292      1276   124.475       108    0.178778    0.501959
   42      16      1319      1303   124.083       108   0.0946214    0.501263
   43      16      1350      1334    124.08       124     1.96131    0.510216
   44      16      1380      1364   123.987       120    0.951497    0.510573
   45      16      1408      1392   123.721       112   0.0501954    0.512396
   46      16      1430      1414   122.944        88    0.232482    0.509369
   47      16      1481      1465   124.668       204    0.403624    0.505455
   48      16      1498      1482   123.487        68    0.576048    0.502592

I aborted it after 48 seconds.

If I could get this inside our VMs, that would be awesome ;)
 
With these hardware specs I set the pool up with 1024 PGs. Could that be the problem?
Should not affect the performance like that.

If you check the output of ceph osd df tree, you should have around 100 PGs per OSD as a rule of thumb.

Were the VMs kept running during the upgrade (live migration) or did they do a clean start with the Ceph updates?

Can you show the config of one such VM? qm config <vmid> (anonymize what you need)
Is the RBD storage using KRBD or not?
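
To put the 100 PGs per OSD rule of thumb into numbers, here is a minimal sketch. It assumes a single replicated pool with 3 replicas (the pool size is not stated in this thread) and 64 OSDs (8 servers with 8 SSDs each, as described in the first post). The function names are hypothetical helpers, not Ceph tooling.

```python
import math


def pgs_per_osd(pg_num, replica_size, osd_count):
    """Average number of PG replicas each OSD holds for one replicated pool."""
    return pg_num * replica_size / osd_count


def nearest_power_of_two(value):
    """Round a positive number to the closest power of two."""
    lower = 2 ** math.floor(math.log2(value))
    upper = lower * 2
    return lower if value - lower <= upper - value else upper


def pg_num_for_target(target_per_osd, replica_size, osd_count):
    """pg_num (as a power of two) that comes closest to the target PGs per OSD."""
    return nearest_power_of_two(target_per_osd * osd_count / replica_size)


if __name__ == "__main__":
    # Assumed values: replica size 3, 64 OSDs (8 servers x 8 SSDs).
    print(pgs_per_osd(1024, 3, 64))
    print(pg_num_for_target(100, 3, 64))
```

With these assumed values, 1024 PGs work out to an average of 48 PG replicas per OSD, and 2048 would be the power of two closest to the 100 per OSD target. The real per-OSD counts are in the PGS column of ceph osd df tree.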
 
I see 64 as the maximum number of PGs per OSD.

I live-migrated most of the VMs, but this particular one needed some extra memory, so I shut it down, moved it to another, already upgraded server and started it there.

It is RBD.

Code:
bootdisk: virtio0
cores: 8
description: BLABLABLABLA
ide2: none,media=cdrom
memory: 16384
name: nextcloud
net0: virtio=42:ED:67:B7:99:94,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
parent: a1
scsihw: virtio-scsi-pci
smbios1: uuid=2e9bf1a1-0767-4eba-8637-b37613585d66
sockets: 1
virtio0: pve5-ceph:vm-128-disk-0,size=100G
vmgenid: 4d974242-2f05-4e6f-ba0d-90d20d2a82d8

I just saw that the VM had a snapshot. I removed it and reran fio (just to test whether the issue comes from the snapshot):
Code:
fio --rw=write --name=test --size=20M --direct=1
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=52KiB/s][w=13 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=29955: Wed May  5 14:51:47 2021
  write: IOPS=25, BW=103KiB/s (106kB/s)(20.0MiB/198346msec); 0 zone resets
    clat (msec): min=5, max=724, avg=38.73, stdev=79.26
     lat (msec): min=5, max=724, avg=38.73, stdev=79.26
    clat percentiles (msec):
     |  1.00th=[    7],  5.00th=[    9], 10.00th=[    9], 20.00th=[   11],
     | 30.00th=[   12], 40.00th=[   13], 50.00th=[   15], 60.00th=[   16],
     | 70.00th=[   19], 80.00th=[   25], 90.00th=[   75], 95.00th=[  209],
     | 99.00th=[  430], 99.50th=[  493], 99.90th=[  609], 99.95th=[  642],
     | 99.99th=[  726]
   bw (  KiB/s): min=    7, max=  448, per=100.00%, avg=103.03, stdev=115.41, samples=396
   iops        : min=    1, max=  112, avg=25.71, stdev=28.84, samples=396
  lat (msec)   : 10=17.46%, 20=55.88%, 50=16.11%, 100=0.98%, 250=6.17%
  lat (msec)   : 500=3.01%, 750=0.39%
  cpu          : usr=0.02%, sys=0.10%, ctx=6081, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=103KiB/s (106kB/s), 103KiB/s-103KiB/s (106kB/s-106kB/s), io=20.0MiB (20.0MB), run=198346-198346msec

Disk stats (read/write):
  vda: ios=0/9124, merge=0/2538, ticks=0/497261, in_queue=365372, util=69.97%


The performance is like this on all of the other VMs too.
 
I read somewhere that it may have something to do with the virtio driver I'm using.
I added a scsi0 device and retested:
Code:
fio --rw=write --name=test --size=20M --direct=1
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 20MiB)
Jobs: 1 (f=1): [W(1)][99.6%][w=320KiB/s][w=80 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=31129: Wed May  5 14:59:01 2021
  write: IOPS=22, BW=91.3KiB/s (93.5kB/s)(20.0MiB/224307msec); 0 zone resets
    clat (msec): min=5, max=703, avg=43.80, stdev=75.48
     lat (msec): min=5, max=703, avg=43.80, stdev=75.48
    clat percentiles (msec):
     |  1.00th=[    7],  5.00th=[    8], 10.00th=[    9], 20.00th=[   11],
     | 30.00th=[   12], 40.00th=[   13], 50.00th=[   15], 60.00th=[   16],
     | 70.00th=[   18], 80.00th=[   25], 90.00th=[  197], 95.00th=[  211],
     | 99.00th=[  313], 99.50th=[  330], 99.90th=[  502], 99.95th=[  609],
     | 99.99th=[  701]
   bw (  KiB/s): min=    7, max=  472, per=100.00%, avg=91.15, stdev=117.54, samples=447
   iops        : min=    1, max=  118, avg=22.74, stdev=29.39, samples=447
  lat (msec)   : 10=19.16%, 20=56.68%, 50=8.85%, 100=0.98%, 250=12.15%
  lat (msec)   : 500=2.07%, 750=0.12%
  cpu          : usr=0.03%, sys=0.12%, ctx=5170, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5120,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=91.3KiB/s (93.5kB/s), 91.3KiB/s-91.3KiB/s (93.5kB/s-93.5kB/s), io=20.0MiB (20.0MB), run=224307-224307msec

Disk stats (read/write):
  sda: ios=0/5278, merge=0/63, ticks=0/247054, in_queue=170320, util=67.53%
 
What if you run FIO the following way?

Code:
fio --ioengine=psync --filename=/path/to/file --size=9G --time_based --name=fio --group_reporting --runtime=600 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=4 --iodepth=32
Taken from the Ceph Benchmark paper.

Is the KRBD option enabled or disabled for that storage?
 
krbd disabled

Code:
fio --ioengine=psync --filename=/root/test --size=1G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=4 --iodepth=32
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=32
...
fio-3.12
Starting 4 processes
fio: Laying out IO file (1 file / 1024MiB)
Jobs: 4 (f=4): [W(4)][18.8%][w=4096KiB/s][w=1 IOPS][eta 04m:24s]
fio: (groupid=0, jobs=4): err= 0: pid=13955: Wed May  5 15:40:51 2021
  write: IOPS=3, BW=12.9MiB/s (13.6MB/s)(792MiB/61282msec); 0 zone resets
    clat (msec): min=88, max=3285, avg=1228.15, stdev=638.21
     lat (msec): min=88, max=3285, avg=1228.33, stdev=638.21
    clat percentiles (msec):
     |  1.00th=[  131],  5.00th=[  372], 10.00th=[  439], 20.00th=[  584],
     | 30.00th=[  877], 40.00th=[  995], 50.00th=[ 1099], 60.00th=[ 1284],
     | 70.00th=[ 1485], 80.00th=[ 1770], 90.00th=[ 2198], 95.00th=[ 2467],
     | 99.00th=[ 2802], 99.50th=[ 3272], 99.90th=[ 3272], 99.95th=[ 3272],
     | 99.99th=[ 3272]
   bw (  KiB/s): min= 8175, max=16384, per=64.21%, avg=8497.20, stdev=1558.55, samples=187
   iops        : min=    1, max=    4, avg= 1.99, stdev= 0.46, samples=187
  lat (msec)   : 100=0.51%, 250=0.51%, 500=12.63%, 750=10.61%, 1000=16.16%
  cpu          : usr=0.01%, sys=0.02%, ctx=869, majf=0, minf=43
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,198,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=12.9MiB/s (13.6MB/s), 12.9MiB/s-12.9MiB/s (13.6MB/s-13.6MB/s), io=792MiB (830MB), run=61282-61282msec

Disk stats (read/write):
  vda: ios=3/1396, merge=0/645, ticks=3/259030, in_queue=161124, util=61.68%
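
One detail in the output above: the "IO depths" line reports 1=100.0% even though --iodepth=32 was given. fio's synchronous engines such as psync keep only one I/O in flight per job, so the effective concurrency of this run is the 4 jobs. The sketch below estimates the bandwidth from that concurrency and the measured mean latency. The engine list is not exhaustive and the function names are made up for illustration.

```python
# fio engines that issue one blocking I/O at a time (not an exhaustive list).
SYNC_ENGINES = {"sync", "psync", "vsync", "pvsync", "pvsync2"}


def effective_concurrency(ioengine, numjobs, iodepth):
    """Number of I/Os that can actually be in flight at once."""
    per_job = 1 if ioengine in SYNC_ENGINES else iodepth
    return numjobs * per_job


def expected_bandwidth_mib_s(concurrency, block_size_mib, avg_latency_ms):
    """Throughput estimate: in-flight I/Os times block size, divided by mean latency."""
    return concurrency * block_size_mib / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    depth = effective_concurrency("psync", numjobs=4, iodepth=32)
    # 1228.33 ms is the mean "lat" from the run above, block size is 4 MiB.
    print(depth, round(expected_bandwidth_mib_s(depth, 4, 1228.33), 1))
```

The estimate of about 13 MiB/s is close to the 12.9 MiB/s that fio reported, which suggests the result is bounded by per-write latency of more than a second rather than by the queue depth setting.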
 
Also really not great.

Can you try it with KRBD? You will need to shut down and cold boot the VM after you enable KRBD for the storage.
 
Do I have to change anything in the config of the VM?


Result after enabling KRBD and stopping and starting the VM:
Code:
fio --ioengine=psync --filename=/root/test --size=1G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=4 --iodepth=32
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=32
...
fio-3.12
Starting 4 processes
Jobs: 4 (f=4): [W(4)][100.0%][w=16.0MiB/s][w=4 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=4): err= 0: pid=938: Wed May  5 16:19:37 2021
  write: IOPS=3, BW=14.9MiB/s (15.6MB/s)(904MiB/60799msec); 0 zone resets
    clat (msec): min=141, max=3340, avg=1072.27, stdev=641.70
     lat (msec): min=141, max=3340, avg=1072.43, stdev=641.70
    clat percentiles (msec):
     |  1.00th=[  215],  5.00th=[  266], 10.00th=[  305], 20.00th=[  558],
     | 30.00th=[  793], 40.00th=[  818], 50.00th=[  894], 60.00th=[  978],
     | 70.00th=[ 1167], 80.00th=[ 1636], 90.00th=[ 1989], 95.00th=[ 2433],
     | 99.00th=[ 2836], 99.50th=[ 3171], 99.90th=[ 3339], 99.95th=[ 3339],
     | 99.99th=[ 3339]
   bw (  KiB/s): min= 8175, max=16384, per=57.98%, avg=8827.21, stdev=2196.45, samples=206
   iops        : min=    1, max=    4, avg= 2.11, stdev= 0.54, samples=206
  lat (msec)   : 250=3.10%, 500=15.49%, 750=5.31%, 1000=39.38%
  cpu          : usr=0.02%, sys=0.02%, ctx=969, majf=0, minf=42
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,226,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=14.9MiB/s (15.6MB/s), 14.9MiB/s-14.9MiB/s (15.6MB/s-15.6MB/s), io=904MiB (948MB), run=60799-60799msec

Disk stats (read/write):
  vda: ios=65/1248, merge=0/189, ticks=159/195862, in_queue=127200, util=63.52%
 
We are having a similar problem with the 3.8 TB Microns in our cluster: they have slowed down on writes, and that is reflected in the VMs. We also have 1.9 TB Intels in there and they are fine. CPU, memory etc. are all good and the client load is very low.

We have done a lot of tests and moved some of the 3.8 TB drives to our lab, and they work fine there.

We tested in production with
ceph tell osd.0 bench and found that the IOPS drop to 30 and the bandwidth to 120 MB/s. In the test system we get 125 IOPS and 495 MB/s. The Intels are showing this and they are all over 50 percent full.

Next we added a new 3.8 TB drive, which tested fine. Then we reweighted it so that it filled up, and the test was still fine. The old ones were still bad. Then we removed an old one, re-added and reweighted it, and it is back to normal.

So we don't know the answer. We can only assume that, since the drives had data written and deleted over time, the controller in the SSD has trouble finding free 4 MB blocks, and that after we removed and re-added the drive the data was written in a more orderly fashion.

Try ceph tell osd.0 bench and see if the same thing is happening.
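
The IOPS and bandwidth figures quoted above are two views of the same measurement if the OSD bench writes 4 MiB blocks, which is its default block size. A small sketch of that arithmetic, with a hypothetical helper name and the block size as a stated assumption:

```python
MIB = 1024 * 1024


def bandwidth_mib_s(iops, block_size_bytes=4 * MIB):
    """Bandwidth implied by an IOPS figure, assuming 4 MiB writes."""
    return iops * block_size_bytes / MIB


if __name__ == "__main__":
    print(bandwidth_mib_s(30))    # slow production OSD
    print(bandwidth_mib_s(125))   # same drive model in the lab
    print(round(125 / 30, 1))     # slowdown factor between the two
```

30 IOPS corresponds to 120 MiB/s and 125 IOPS to 500 MiB/s, in line with the reported 120 MB/s and 495 MB/s, so the affected drives are roughly four times slower on large sequential writes.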
 
Hi, do you really need --sync=1?
With it every write is synced before it completes, so each write pays the full latency (network latency + client CPU time + Ceph server CPU time).

On my 60 NVMe cluster (datacenter):

Here are some tests with qemu + librbd + cache=writeback.

seq 4M write with sync (300 MB/s)
Code:
fio --ioengine=libaio --filename=/root/test --size=1G --time_based --name=fio --runtime=60 --direct=1 --sync=1 --rw=write --bs=4M --iodepth=32
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
^Cbs: 1 (f=1): [W(1)][18.3%][w=324MiB/s][w=81 IOPS][eta 00m:49s]
fio: terminating on signal 2

fio: (groupid=0, jobs=1): err= 0: pid=11673: Thu May  6 03:55:35 2021
  write: IOPS=79, BW=317MiB/s (332MB/s)(3548MiB/11194msec); 0 zone resets
    slat (usec): min=194, max=302240, avg=12584.07, stdev=12354.13
    clat (msec): min=15, max=655, avg=384.61, stdev=63.06
     lat (msec): min=24, max=664, avg=397.20, stdev=63.45
    clat percentiles (msec):
     |  1.00th=[  122],  5.00th=[  347], 10.00th=[  359], 20.00th=[  363],
     | 30.00th=[  368], 40.00th=[  376], 50.00th=[  380], 60.00th=[  384],
     | 70.00th=[  393], 80.00th=[  397], 90.00th=[  409], 95.00th=[  506],
     | 99.00th=[  592], 99.50th=[  609], 99.90th=[  659], 99.95th=[  659],
     | 99.99th=[  659]
   bw (  KiB/s): min=16384, max=360448, per=96.48%, avg=313127.32, stdev=73996.59, samples=22

seq 4M write without sync (800 MB/s, Proxmox node client network saturated)
Code:
fio --ioengine=libaio --filename=/root/test --size=1G --time_based --name=fio --runtime=60 --direct=1 --rw=write --bs=4M --iodepth=32
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
^Cbs: 1 (f=1): [W(1)][16.7%][w=813MiB/s][w=203 IOPS][eta 00m:50s]
fio: terminating on signal 2

fio: (groupid=0, jobs=1): err= 0: pid=11934: Thu May  6 03:56:42 2021
  write: IOPS=195, BW=781MiB/s (819MB/s)(8060MiB/10325msec); 0 zone resets
    slat (usec): min=182, max=77174, avg=4461.77, stdev=6604.12
    clat (msec): min=9, max=282, avg=159.37, stdev=35.90
     lat (msec): min=9, max=285, avg=163.84, stdev=36.33
    clat percentiles (msec):
     |  1.00th=[   88],  5.00th=[  116], 10.00th=[  124], 20.00th=[  133],
     | 30.00th=[  138], 40.00th=[  144], 50.00th=[  153], 60.00th=[  161],
     | 70.00th=[  171], 80.00th=[  186], 90.00th=[  211], 95.00th=[  232],
     | 99.00th=[  264], 99.50th=[  268], 99.90th=[  279], 99.95th=[  279],
     | 99.99th=[  284]

randwrite 4k with sync (6000 iops)

Code:
fio --ioengine=libaio --filename=/root/test --size=1G --time_based --name=fio --runtime=60 --direct=1 --sync=1 --rw=randwrite --bs=4k --iodepth=32
fio: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
^Cbs: 1 (f=1): [w(1)][15.0%][w=25.0MiB/s][w=6645 IOPS][eta 00m:51s]
fio: terminating on signal 2

fio: (groupid=0, jobs=1): err= 0: pid=12388: Thu May  6 03:58:56 2021
  write: IOPS=6843, BW=26.7MiB/s (28.0MB/s)(254MiB/9487msec); 0 zone resets
    slat (usec): min=3, max=9830, avg=21.31, stdev=181.87
    clat (usec): min=817, max=34379, avg=4649.85, stdev=2114.25
     lat (usec): min=836, max=34402, avg=4672.24, stdev=2123.58
    clat percentiles (usec):
     |  1.00th=[ 1958],  5.00th=[ 2474], 10.00th=[ 2769], 20.00th=[ 3163],
     | 30.00th=[ 3490], 40.00th=[ 3818], 50.00th=[ 4146], 60.00th=[ 4555],
     | 70.00th=[ 5145], 80.00th=[ 5866], 90.00th=[ 6980], 95.00th=[ 8225],
     | 99.00th=[12518], 99.50th=[14222], 99.90th=[18744], 99.95th=[23725],
     | 99.99th=[34341]

randwrite 4k without sync (48k iops)

Code:
fio --ioengine=libaio --filename=/root/test --size=1G --time_based --name=fio --runtime=60 --direct=1 --rw=randwrite --bs=4k --iodepth=32
fio: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 1 process
^Cbs: 1 (f=1): [w(1)][15.0%][w=184MiB/s][w=47.1k IOPS][eta 00m:51s]
fio: terminating on signal 2

fio: (groupid=0, jobs=1): err= 0: pid=12549: Thu May  6 03:59:45 2021
  write: IOPS=43.0k, BW=172MiB/s (180MB/s)(1631MiB/9490msec); 0 zone resets
    slat (usec): min=3, max=9530, avg= 9.72, stdev=25.75
    clat (usec): min=67, max=17537, avg=713.43, stdev=507.71
     lat (usec): min=72, max=17543, avg=724.13, stdev=509.03
    clat percentiles (usec):
     |  1.00th=[  249],  5.00th=[  310], 10.00th=[  347], 20.00th=[  424],
     | 30.00th=[  494], 40.00th=[  553], 50.00th=[  611], 60.00th=[  693],
     | 70.00th=[  799], 80.00th=[  930], 90.00th=[ 1139], 95.00th=[ 1352],
     | 99.00th=[ 2024], 99.50th=[ 2540], 99.90th=[ 7898], 99.95th=[10159],
     | 99.99th=[13173]
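
As a cross-check on the two 4k randwrite runs above, throughput multiplied by mean latency gives the average number of requests in flight (Little's law). This is only a sketch with a made-up function name. It uses the IOPS and mean "lat" values from the fio output.

```python
def in_flight_requests(iops, avg_latency_us):
    """Little's law: average outstanding requests = throughput x mean latency."""
    return iops * avg_latency_us / 1_000_000


if __name__ == "__main__":
    print(round(in_flight_requests(6843, 4672.24), 1))   # randwrite 4k with sync
    print(round(in_flight_requests(43000, 724.13), 1))   # randwrite 4k without sync
```

Both runs come out at about 32 and 31 requests in flight, so with libaio the configured iodepth of 32 is really used, and the difference between 6.8k and 43k IOPS comes from the per-write latency (about 4.7 ms with sync versus about 0.7 ms without).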
 
Hi,

@spirit
Yes. To test the real performance of the underlying storage you have to set sync=1, otherwise it will use memory as a cache and the result will not represent the real performance of the storage. I'm not expecting high 4k IOPS, but all of our VMs are slow.

A Windows update took 3 hours to complete, and a Linux apt-get upgrade took me one hour yesterday with just 49 packages to update. That's not normal.


@Craig St George
Interesting. I had some SSDs with high latency some time ago. After I replaced them, things went back to normal.
I will check that and come back.