Ceph Rebalance Speed Sanity Check

Thor192 · Apr 20, 2024

I know there's a lot of people talking about this stuff, I just want to understand better so anyone that reads this thank you.

I have a 8 node cluster with 16 OSD all 10g network, connected to the same switch, MTU 9000. I also disabled scrub.

I added 4 disks to the cluster is what stared all this.

Is the speed of of anywhere between 20MiB/s to 30MiB/s feel right?

All HDD are HGST Ultrastar He8 8TB 512e 7200RPM SATA 6Gb/s 3.5

I have run this command, no help.

ceph tell 'osd.*' injectargs --osd-max-backfills=3 --osd-recovery-max-active=9

Here is the output of ceph -s

Code:

  cluster:
    id:     d8fe0a66-21dd-45a1-93cd-8b7de0ded4fb
    health: HEALTH_WARN
            Module 'dashboard' has failed dependency: PyO3 modules may only be initialized once per interpreter process
            noscrub flag(s) set
 
  services:
    mon: 8 daemons, quorum prox45,prox46,prox47,prox48,prox70,prox71,prox72,prox73 (age 74m)
    mgr: prox48(active, since 67m), standbys: prox45, prox46, prox47, prox70, prox71, prox72, prox73
    osd: 16 osds: 16 up (since 74m), 16 in (since 74m); 90 remapped pgs
         flags noscrub
 
  data:
    pools:   2 pools, 129 pgs
    objects: 2.67M objects, 10 TiB
    usage:   31 TiB used, 86 TiB / 116 TiB avail
    pgs:     2365225/7999611 objects misplaced (29.567%)
             88 active+remapped+backfill_wait
             39 active+clean
             2  active+remapped+backfilling
 
  io:
    client:   117 KiB/s rd, 979 KiB/s wr, 9 op/s rd, 19 op/s wr
    recovery: 25 MiB/s, 6 objects/s

MY MAP

Code:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host prox45 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 21.83217
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 7.27739
    item osd.1 weight 7.27739
    item osd.2 weight 7.27739
}
host prox46 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 21.83217
    alg straw2
    hash 0    # rjenkins1
    item osd.3 weight 7.27739
    item osd.4 weight 7.27739
    item osd.11 weight 7.27739
}
host prox47 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 21.83217
    alg straw2
    hash 0    # rjenkins1
    item osd.5 weight 7.27739
    item osd.6 weight 7.27739
    item osd.10 weight 7.27739
}
host prox48 {
    id -9        # do not change unnecessarily
    id -10 class hdd        # do not change unnecessarily
    # weight 21.83217
    alg straw2
    hash 0    # rjenkins1
    item osd.7 weight 7.27739
    item osd.8 weight 7.27739
    item osd.9 weight 7.27739
}
host prox70 {
    id -11        # do not change unnecessarily
    id -12 class hdd        # do not change unnecessarily
    # weight 7.27739
    alg straw2
    hash 0    # rjenkins1
    item osd.12 weight 7.27739
}
host prox71 {
    id -13        # do not change unnecessarily
    id -14 class hdd        # do not change unnecessarily
    # weight 7.27739
    alg straw2
    hash 0    # rjenkins1
    item osd.13 weight 7.27739
}
host prox72 {
    id -15        # do not change unnecessarily
    id -16 class hdd        # do not change unnecessarily
    # weight 7.27739
    alg straw2
    hash 0    # rjenkins1
    item osd.14 weight 7.27739
}
host prox73 {
    id -17        # do not change unnecessarily
    id -18 class hdd        # do not change unnecessarily
    # weight 7.27739
    alg straw2
    hash 0    # rjenkins1
    item osd.15 weight 7.27739
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 116.43823
    alg straw2
    hash 0    # rjenkins1
    item prox45 weight 21.83217
    item prox46 weight 21.83217
    item prox47 weight 21.83217
    item prox48 weight 21.83217
    item prox70 weight 7.27739
    item prox71 weight 7.27739
    item prox72 weight 7.27739
    item prox73 weight 7.27739
}

# rules
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

gurubert · Apr 21, 2024

Yes, looks fairly reasonable as the RocksDB on HDD will drastically reduce the performance.

If you want decent speed the RocksDB and WAL of the OSD should reside on SSD devices.

Thor192 · Apr 22, 2024

Thank you so much

Thor192 · Apr 22, 2024

Maybe I should just Google this but if I did end up with some ssds how would I go about incorporating those in?

Search

Search

Ceph Rebalance Speed Sanity Check

Thor192

New Member

ceph tell 'osd.*' injectargs --osd-max-backfills=3 --osd-recovery-max-active=9

Here is the output of ceph -s

gurubert

Famous Member

Thor192

New Member

Thor192

New Member

Ceph Rebalance Speed Sanity Check

Thor192

New Member

ceph tell 'osd.*' injectargs --osd-max-backfills=3 --osd-recovery-max-active=9​

​

Here is the output of ceph -s​

gurubert

Famous Member

Thor192

New Member

Thor192

New Member

ceph tell 'osd.*' injectargs --osd-max-backfills=3 --osd-recovery-max-active=9

Here is the output of ceph -s