Ceph OSD disk replacement: Prevent multiple rebalances of data

7thSon

Hi

We are running a Ceph cluster with 3 nodes and 2 OSDs per node on SSD drives.
One of the nodes has recently been causing problems: both OSDs on this node sporadically report "log_latency_fn slow operation observed for _txc_committed_kv" in the log, and write performance is sometimes poor. I can reproduce this by repeatedly creating 100 MB files via dd, for example. This is usually fairly fast (200-300 MB/s), but sometimes I get less than 10 MB/s, and exactly at those moments the message above is logged.
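For example, something like the following (the target path is just a placeholder for a directory backed by the affected pool):

# write a 100 MB file with direct I/O so the page cache doesn't hide OSD latency
dd if=/dev/zero of=/mnt/cephtest/testfile bs=1M count=100 oflag=direct

# watch per-OSD commit/apply latency while the test runs
ceph osd perf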

I would now like to replace the two affected OSDs / their SSDs.
What is the best way to avoid rebalancing the data multiple times (due to the CRUSH map changes when removing and adding OSDs)?
I had the following procedure in mind:

1. Set the norebalance flag
2. Stop the OSDs on the faulty disks and mark them out
3. Install the new disks and create new OSDs
4. Clear the norebalance flag
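In commands, that would be roughly the following (OSD IDs 4 and 5 and the device path are placeholders for our setup):

# 1. prevent data movement while the OSDs are out
ceph osd set norebalance

# 2. stop the affected OSDs and mark them out
systemctl stop ceph-osd@4 ceph-osd@5
ceph osd out 4
ceph osd out 5

# 3. after swapping the SSDs, create the new OSDs (Proxmox CLI)
pveceph osd create /dev/sdX

# 4. allow data movement again
ceph osd unset norebalance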

Does this make sense or does anyone have a better suggestion?
 
Assuming this is 3x replication (it isn't explicitly stated):

1. Set the norebalance and nobackfill flags
2. Stop the OSDs on the faulty disks, mark them out, and then destroy them (ceph osd destroy keeps the OSD IDs and their positions in the CRUSH map, so the replacement disks can reuse them)
3. Install the new disks and create the new OSDs
4. Clear the global OSD flags and let the rebalance begin
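A rough command sequence for one OSD (ID 4 and the device path are placeholders; repeat for the second OSD before clearing the flags):

# freeze data movement
ceph osd set norebalance
ceph osd set nobackfill

# stop, mark out, and destroy the OSD; destroy preserves the ID for reuse
systemctl stop ceph-osd@4
ceph osd out 4
ceph osd destroy 4 --yes-i-really-mean-it

# after installing the new SSD, recreate the OSD with the same ID
ceph-volume lvm create --osd-id 4 --data /dev/sdX

# once both replacements are in, let backfill/rebalance start
ceph osd unset nobackfill
ceph osd unset norebalance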
 
