Ceph DB/WAL SSD Configuration Validation

Jul 27, 2022
Hello,

I need some guidance on configuring my cluster for RBD performance. I've followed the instructions and gone through the documentation multiple times, but I can't get the disk performance as high as I expect it to be.

My current setup is as follows:

3 similarly configured nodes, each with:
  • 6x 8TB SATA HDDs (leaving room for an additional 6x 8TB drives in the future)
  • 2x 960GB Intel Optane 905P
  • Ceph network on a 10GbE full-mesh setup (to be converted to a 10GbE switch when the 4th node is added) with jumbo frames enabled
Since I plan to add more drives, I have sized the DB/WAL at 148.984375 GiB per OSD on the SSDs.
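As a quick sanity check on that odd-looking figure (a sketch, assuming the `bluefs_db_size` value reported by `ceph osd metadata` below): 148.984375 GiB is simply one 960 GB SSD split six ways, one slice per planned OSD.

```shell
# Verify the DB/WAL sizing: bluefs_db_size (bytes, as reported by
# `ceph osd metadata`) should equal one 960 GB Optane divided into
# 6 partitions for the 6 OSDs planned per SSD.
db_bytes=159970754560                                     # reported bluefs_db_size
echo "$(( db_bytes / 1024 / 1024 / 1024 )) GiB per OSD"   # 148 GiB (148.984375 exactly)
echo "$(( db_bytes * 6 )) bytes per SSD"                  # 959824527360, i.e. ~960 GB
```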

I'm not sure if I missed something in the configuration, but I can't get the disks to perform above 200 MB/s on writes, and on Linux VMs it performs even worse. I have writeback caching enabled on the VMs' hard disks.

Here's my ceph configuration:

Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.xx.xx.21/24
     fsid = xxxxxxx
     mon_allow_pool_delete = true
     mon_host = 10.xx.xx.21 10.xx.xx.22 10.xx.xx.23
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.xx.xx.21/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.xx-xxx-01]
     host = xx-xxx-01
     mds_standby_for_name = pve

[mds.xx-xxx-02]
     host = xx-xxx-02
     mds_standby_for_name = pve

[mds.xx-xxx-03]
     host = xx-xxx-03
     mds_standby_for_name = pve

[mon.xx-xxx-01]
     public_addr = 10.xx.xx.21

[mon.xx-xxx-02]
     public_addr = 10.xx.xx.22

[mon.xx-xxx-03]
     public_addr = 10.xx.xx.23
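(For completeness: public_network and cluster_network above point at the same subnet, so client and replication traffic share the mesh. If a second NIC were added later, Ceph allows splitting them; a hypothetical fragment, where 10.yy.yy.0/24 is a made-up second subnet:)

```ini
[global]
     public_network  = 10.xx.xx.0/24   ; client and monitor traffic
     cluster_network = 10.yy.yy.0/24   ; OSD replication/heartbeat traffic
```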


Here's the crush map

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host xx-xxx-01 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 44.537
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 7.423
    item osd.1 weight 7.423
    item osd.2 weight 7.423
    item osd.3 weight 7.423
    item osd.4 weight 7.423
    item osd.5 weight 7.423
}
host xx-xxx-02 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 44.537
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 7.423
    item osd.7 weight 7.423
    item osd.8 weight 7.423
    item osd.12 weight 7.423
    item osd.13 weight 7.423
    item osd.14 weight 7.423
}
host xx-xxx-03 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 44.537
    alg straw2
    hash 0    # rjenkins1
    item osd.9 weight 7.423
    item osd.10 weight 7.423
    item osd.11 weight 7.423
    item osd.15 weight 7.423
    item osd.16 weight 7.423
    item osd.17 weight 7.423
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 133.612
    alg straw2
    hash 0    # rjenkins1
    item xx-xxx-01 weight 44.537
    item xx-xxx-02 weight 44.537
    item xx-xxx-03 weight 44.537
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

Ceph osd metadata output (ceph osd metadata |grep "id")

Code:
        "id": 0,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sdc=ST8000NM000A-2KE101_xxxx",
        "id": 1,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sdd=ST8000NM000A-2KE101_xxxx",
        "id": 2,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sde=ST8000NM000A-2KE101_xxxx",
        "id": 3,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdf=ST8000NM000A-2KE101_xxxx",
        "id": 4,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdg=ST8000NM000A-2KE101_xxxx",
        "id": 5,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdh=ST8000NM000A-2KE101_xxxx",
        "id": 6,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sdc=ST8000NM000A-2KE101_xxxx",
        "id": 7,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sdd=ST8000NM000A-2KE101_xxxx",
        "id": 8,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sde=ST8000NM000A-2KE101_xxxx",
        "id": 9,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sdc=ST8000NM000A-2KE101_xxxx",
        "id": 10,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sdd=ST8000NM000A-2KE101_xxxx",
        "id": 11,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme0n1=INTEL_SSDPE21D960GA_xxxx,sde=ST8000NM000A-2KE101_xxxx",
        "id": 12,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdf=ST8000NM000A-2KE101_xxxx",
        "id": 13,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdg=ST8000NM000A-2KE101_xxxx",
        "id": 14,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdh=ST8000NM000A-2KE101_xxxx",
        "id": 15,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdf=ST8000NM000A-2KE101_xxxx",
        "id": 16,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdg=ST8000NM000A-2KE101_xxxx",
        "id": 17,
        "bluefs_db_block_size": "4096",
        "bluefs_db_size": "159970754560",
        "bluestore_bdev_block_size": "4096",
        "bluestore_bdev_size": "8001545043968",
        "device_ids": "nvme1n1=INTEL_SSDPE21D960GA_xxxx,sdh=ST8000NM000A-2KE101_xxxx",

Any insights on this would be appreciated.

Thank you.
 
Correction on the ceph metadata output above; the command to produce it is:

ceph osd metadata |grep -E "(id|size)"


rados bench -p test 60 write -b 4M -t 16 output:

Code:
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_xx-xxx-01_905661
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        37        21   83.9889        84     0.19964     0.20504
    2      16        40        24   47.9943        12     1.94094     0.34626
    3      16        68        52   69.3258       112     2.48321    0.751476
    4      16        97        81   80.9916       116    0.366607    0.705151
    5      16       139       123     98.39       168     0.74078    0.611355
    6      16       174       158   105.323       140    0.887255    0.562367
    7      16       205       189   107.989       124     0.38199    0.559181
    8      16       225       209    104.49        80    0.643509    0.565878
    9      16       264       248   110.211       156    0.534972    0.556602
   10      16       303       287   114.789       156     0.36318    0.528203
   11      16       335       319   115.989       128     0.47593    0.519724
   12      16       363       347   115.655       112   0.0560862    0.524809
   13      16       399       383   117.834       144   0.0730216    0.529192
   14      16       433       417   119.131       136    0.124672    0.521306
   15      16       456       440   117.322        92    0.225564    0.525463
   16      16       477       461   115.239        84    0.453517    0.531722
   17      16       494       478    112.46        68     0.88527    0.529442
   18      16       510       494   109.767        64    0.386105    0.531471
   19      16       540       524   110.305       120   0.0940041    0.543789
2022-07-27T06:51:14.038949+0000 min lat: 0.0502757 max lat: 4.08902 avg lat: 0.534382
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       564       548   109.589        96   0.0694755    0.534382
   21      16       585       569    108.37        84    0.621442    0.527679
   22      16       617       601   109.262       128     6.21496    0.549156
   23      16       658       642   111.641       164    0.205618    0.542424
   24      16       689       673   112.156       124   0.0580955    0.549827
   25      16       718       702   112.309       116    0.118794    0.547419
   26      16       752       736    113.22       136    0.128664    0.545346
   27      16       777       761    112.73       100   0.0910793    0.554291
   28      16       799       783   111.846        88     0.21639    0.556789
   29      16       805       789   108.817        24    0.136167     0.55552
   30      16       824       808   107.723        76    0.528511    0.577724
   31      16       841       825   106.441        68   0.0913758    0.584158
   32      16       876       860   107.489       140    0.233297    0.589759
   33      16       905       889   107.747       116   0.0903622    0.586692
   34      16       948       932   109.636       172    0.072689    0.577914
   35      16       982       966   110.389       136    0.636981    0.571422
   36      16      1022      1006   111.766       160    0.637322    0.564038
   37      16      1064      1048   113.286       168    0.267454    0.558712
   38      16      1105      1089    114.62       164   0.0728187    0.551899
   39      16      1135      1119   114.758       120    0.212986    0.548582
2022-07-27T06:51:34.041040+0000 min lat: 0.0502757 max lat: 6.47346 avg lat: 0.54478
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      1170      1154   115.388       140    0.189646     0.54478
   41      16      1195      1179   115.013       100     1.34291    0.541675
   42      16      1213      1197   113.988        72   0.0666719    0.538399
   43      16      1225      1209   112.454        48     5.64552    0.545195
   44      16      1253      1237   112.443       112    0.151087    0.539901
   45      16      1273      1257   111.722        80    0.491675    0.538507
   46      16      1285      1269   110.337        48    0.160091    0.537734
   47      16      1314      1298   110.457       116    0.101436    0.540119
   48      16      1355      1339   111.572       164   0.0540538    0.537693
   49      16      1384      1368   111.662       116    0.128213    0.532039
   50      16      1420      1404   112.309       144   0.0921608    0.532381
   51      16      1450      1434   112.459       120    0.189336    0.528751
   52      16      1468      1452   111.681        72    0.558567    0.525449
   53      16      1484      1468   110.781        64     2.50514    0.533742
   54      16      1504      1488   110.211        80    0.484445    0.532824
   55      16      1517      1501   109.152        52     1.06387    0.543966
   56      16      1528      1512   107.989        44     15.6189    0.555508
   57      16      1549      1533   107.568        84     1.85888    0.556184
   58      16      1568      1552   107.023        76   0.0611302    0.554405
   59      16      1601      1585   107.446       132      0.8161    0.547601
2022-07-27T06:51:54.043367+0000 min lat: 0.0501154 max lat: 16.716 avg lat: 0.551337
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      1635      1619   107.922       136    0.104789    0.551337
   61       4      1635      1631   106.939        48     9.68641    0.580969
Total time run:         61.197
Total writes made:      1635
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     106.868
Stddev Bandwidth:       38.9523
Max bandwidth (MB/sec): 172
Min bandwidth (MB/sec): 12
Average IOPS:           26
Stddev IOPS:            9.73807
Max IOPS:               43
Min IOPS:               3
Average Latency(s):     0.592094
Stddev Latency(s):      1.26726
Max latency(s):         16.716
Min latency(s):         0.0501154
Cleaning up (deleting benchmark objects)
Removed 1635 objects
Clean up completed and total clean up time :9.87714
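For context on the ~107 MB/s average: that is roughly what a single HDD can sustain on its own. A back-of-envelope ceiling for aggregate writes (a rough sketch; the 180 MB/s per-HDD sequential rate is an assumed figure for 8 TB SATA drives, not a measurement):

```shell
# With size=3, every client write lands on 3 OSDs, so the usable
# aggregate write bandwidth is roughly (OSDs x per-disk rate) / replicas.
hdds=18; mb_per_hdd=180; replicas=3
echo "$(( hdds * mb_per_hdd / replicas )) MB/s theoretical ceiling"
```

Even allowing generous overhead for seeks and DB/WAL traffic, the measured 107 MB/s sits far below that ceiling, which hints the limit is per-request latency from a single 16-thread client rather than raw disk bandwidth.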
 
