High latency on proxmox ceph cluster

freaknils

Member
Oct 6, 2021
2
1
23
46
Hi there,

we are running an Proxmoy PVE Ceph cluster. Current configuration is:
  • 3 Nodes
  • Each 2 x 10GBit/s LACP to 4 switches (only for ceph)
  • 2 x Intel Xeon E5-2640 2,6GHzs
  • 192GB RAM
  • Each 5 Crucial SATA SSD CT2000MX500SSD1 on HBA
But currently we have high apply latency and low IOPs:

Code:
~# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
  5                   3                  3
 27                  99                 99
  1                  85                 85
 30                  85                 85
 25                   4                  4
  2                 114                114
 31                 100                100
 26                   4                  4
 28                   4                  4
  4                   3                  3
 33                 124                124
  0                  83                 83
 29                  81                 81
  3                   3                  3
 32                   4                  4

Code:
~#ceph config dump  
WHO     MASK  LEVEL     OPTION                                 VALUE         RO
mon           advanced  auth_allow_insecure_global_id_reclaim  false          
mgr           advanced  mgr/prometheus/rbd_stats_pools         *             *
mgr           advanced  mgr/prometheus/server_addr             0.0.0.0       *
mgr           advanced  mgr/prometheus/server_port             9283            
osd.0         basic     osd_mclock_max_capacity_iops_ssd       14103.653049    
osd.1         basic     osd_mclock_max_capacity_iops_ssd       29488.059277    
osd.2         basic     osd_mclock_max_capacity_iops_ssd       28081.425876    
osd.25        basic     osd_mclock_max_capacity_iops_ssd       33508.744911    
osd.26        basic     osd_mclock_max_capacity_iops_ssd       29059.514671    
osd.27        basic     osd_mclock_max_capacity_iops_ssd       28531.477981    
osd.28        basic     osd_mclock_max_capacity_iops_ssd       27336.937265    
osd.29        basic     osd_mclock_max_capacity_iops_ssd       28923.222275    
osd.3         basic     osd_mclock_max_capacity_iops_ssd       28222.454163    
osd.30        basic     osd_mclock_max_capacity_iops_ssd       24904.106530    
osd.31        basic     osd_mclock_max_capacity_iops_ssd       33027.761220    
osd.32        basic     osd_mclock_max_capacity_iops_ssd       30125.968423    
osd.33        basic     osd_mclock_max_capacity_iops_ssd       29039.451886    
osd.4         basic     osd_mclock_max_capacity_iops_ssd       13437.845129    
osd.5         basic     osd_mclock_max_capacity_iops_ssd       29437.110346

In the network I can't see anything wrong. Max utiilisation about 2 Gbit/s.

Maybe there is somethin with the blocksize wrong?
Code:
~ # ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 4.4707894909999997,
    "bytes_per_sec": 240168280.38124242,
    "iops": 57.260580153761488
}

~ #  ceph tell osd.1 bench 12288000 4096
{
    "bytes_written": 12288000,
    "blocksize": 4096,
    "elapsed_sec": 0.22974307799999999,
    "bytes_per_sec": 53485833.423020475,
    "iops": 13058.064800542108
}

But rados bench seems to work fine:
Code:
 ~ # rados bench -p vmdata-test 60 write --object_size=16MB
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 16777216 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_dekrzfpve172.m350d4.local_2947169
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        64        48   191.966       192    0.159245    0.226872
    2      16       130       114   227.964       264    0.129076    0.263465
    3      16       188       172   229.296       232    0.190153    0.268754
    4      16       244       228   227.964       224    0.149965    0.269745
    5      16       313       297   237.563       276    0.153686      0.2619
    6      16       408       392   261.292       380    0.147233    0.242496
    7      16       452       436   249.103       176    0.161968    0.239273
    8      16       521       505   252.459       276    0.131404    0.246513
    9      16       609       593   263.514       352    0.158967    0.239661
   10      16       675       659   263.559       264     0.15734    0.240216
   11      16       740       724   263.232       260    0.186306    0.240057
   12      16       816       800   266.626       304    0.190017    0.237839
   13      16       908       892    274.42       368    0.145102    0.232077
   14      16       993       977   279.101       340    0.137637    0.227651
   15      16      1060      1044   278.359       268    0.180164    0.228952
   16      16      1142      1126   281.458       328    0.187289    0.225321
   17      16      1223      1207   283.958       324    0.142261    0.224037
   18      16      1300      1284   285.292       308    0.164915    0.223076
   19      16      1384      1368   287.958       336     0.27237    0.221085
2026-02-24T08:08:19.821984+0100 min lat: 0.0867595 max lat: 0.896916 avg lat: 0.219436
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16      1468      1452   290.359       336    0.218903    0.219436
   21      16      1552      1536   292.529       336    0.279236    0.217773
   22      16      1603      1587   288.504       204    0.134028    0.221397
   23      16      1684      1668   290.046       324    0.272487    0.219583
   24      16      1764      1748   291.292       320    0.178104    0.217464
   25      16      1857      1841   294.517       372    0.193856    0.216869
   26      16      1945      1929   296.726       352    0.109291    0.215212
   27      16      2036      2020   299.215       364    0.165189    0.213234
   28      16      2118      2102   300.242       328    0.251923    0.212245
   29      16      2213      2197    302.99       380    0.143125    0.210969
   30      16      2244      2228   297.023       124    0.211594    0.210369
   31      16      2324      2308   297.763       320    0.141827    0.214227
   32      16      2416      2400   299.956       368    0.137237    0.212851
   33      16      2492      2476   300.077       304    0.234285    0.212307
   34      16      2575      2559   301.015       332    0.123825     0.21169
   35      16      2662      2646   302.356       348    0.223174    0.211221
   36      16      2744      2728   303.068       328    0.215484     0.20999
   37      16      2806      2790   301.579       248    0.136269    0.209555
   38      16      2894      2878   302.904       352    0.234224    0.210683
   39      16      2970      2954   302.931       304    0.149764    0.209496
2026-02-24T08:08:39.824871+0100 min lat: 0.0712509 max lat: 0.911975 avg lat: 0.209701
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      3062      3046   304.556       368    0.131103    0.209701
   41      16      3141      3125   304.834       316    0.163803    0.209185
   42      16      3223      3207   305.384       328     0.14628    0.208901
   43      16      3308      3292   306.188       340    0.156289    0.208613
   44      16      3396      3380   307.228       352    0.170044    0.207929
   45      16      3481      3465   307.955       340    0.168001     0.20722
   46      16      3554      3538   307.607       292    0.152485     0.20772
   47      16      3628      3612   307.359       296    0.156532    0.208084
   48      16      3704      3688   307.288       304    0.145656    0.207882
   49      16      3780      3764    307.22       304    0.179062    0.208009
   50      16      3862      3846   307.635       328     0.14893    0.207506
   51      16      3953      3937   308.739       364    0.228471    0.206915
   52      16      4033      4017   308.955       320    0.206297    0.206616
   53      16      4102      4086   308.333       276    0.171784    0.206904
   54      16      4197      4181   309.659       380    0.131003    0.206404
   55      16      4258      4242   308.464       244   0.0744125    0.206071
   56      16      4344      4328   309.098       344    0.106985    0.206624
   57      16      4436      4420    310.13       368    0.175133    0.206044
   58      16      4520      4504   310.576       336    0.167728    0.205802
   59      16      4584      4568    309.65       256    0.150262    0.206083
2026-02-24T08:08:59.827798+0100 min lat: 0.058507 max lat: 0.911975 avg lat: 0.206958
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      4653      4637   309.089       276    0.225592    0.206958
Total time run:         60.1962
Total writes made:      4653
Write size:             4194304
Object size:            16777216
Bandwidth (MB/sec):     309.189
Stddev Bandwidth:       53.9634
Max bandwidth (MB/sec): 380
Min bandwidth (MB/sec): 124
Average IOPS:           77
Stddev IOPS:            13.4909
Max IOPS:               95
Min IOPS:               31
Average Latency(s):     0.206948
Stddev Latency(s):      0.110647
Max latency(s):         0.911975
Min latency(s):         0.058507
Cleaning up (deleting benchmark objects)
Removed 1164 objects
Clean up completed and total clean up time :6.84856
rados bench -p vmdata-test 60 write --object_size=16MB  2,96s user 6,57s system 13% cpu 1:08,13 total

I have tried everything I know, but without any success.

Can you please help me, maybe there is a simple solution?

Thanks ans best wisches!
 

Attachments

  • Bildschirmfoto_20260224_074444.png
    Bildschirmfoto_20260224_074444.png
    130 KB · Views: 6
Use consumer SSDs at your peril. They fail and fail spectacularly.

Only real solution is enterprise SSDs with PLP (power-loss protection) and for endurance.
 
  • Like
Reactions: Johannes S
Try to turn off write cache.
Thanks, that helped for me right now.

I just learned that the SSDs are not for enterprise use. Since we just have got a new bunch of servers with enterprise SSDs and more NICs, I will migrate to the new ones ASAP.

Thanks for your reply and help!
 
  • Like
Reactions: gurubert