How to slow down CEPH Rebalance

merkkg · Sep 19, 2022

Hi,

I have a 3-node cluster that is working perfectly for my requirements.

Each node is identical, with the setup below:
CPU: Dual Socket Intel Xeon 4214 CPU @ 2.20GHz
Memory: 256 GB RAM
Network Card: 2 x 100G ports
Disk Controller: MegaRAID® SAS 9380-8e RAID Controller (JBOD Mode) connected to Supermicro CSE-216BE1C-R741JBOD
Disks: 16 x Samsung MZ7LH3T8HMLT PM883 3.84 TB

The problem is that when a host goes down, or rebalancing needs to happen for some other reason, it runs too fast. All VMs on the cluster then have issues because the apply/commit latency on the OSDs goes above 500 ms.

How can I slow down the rebalancing so that it does not affect real usage, or prioritize real usage over rebalancing?

The rebalancing is only running at 4 GB/s; I would expect the cluster to handle a lot more.

Another question concerns CPU usage: during rebalancing it goes very high, mainly from the ceph-osd processes, at times reaching 90% on each host. The main pool with the most data (60T) is a CephFS pool with 2048 placement groups, size 2/1.
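
For context, here is a rough back-of-envelope calculation from the numbers above. It is only a sketch and rests on assumptions not stated in the post: one OSD per disk, a host-level failure domain, and only the 2048-PG CephFS pool counted (other pools would add to the per-OSD figures).

```python
# Back-of-envelope figures from the hardware and pool numbers in the post.
# Assumptions (not confirmed in the post): one OSD per disk, host-level
# failure domain, and only the 2048-PG CephFS pool is counted.
NODES = 3
DISKS_PER_NODE = 16
DISK_TB = 3.84
PG_COUNT = 2048
REPLICA_SIZE = 2  # "size 2/1" read as size=2, min_size=1

osds = NODES * DISKS_PER_NODE
raw_tb = osds * DISK_TB

# Each PG is stored REPLICA_SIZE times, spread over all OSDs.
pg_replicas_per_osd = PG_COUNT * REPLICA_SIZE / osds

# With one host down, the same PG replicas land on the surviving OSDs.
surviving_osds = osds - DISKS_PER_NODE
pg_replicas_after_host_loss = PG_COUNT * REPLICA_SIZE / surviving_osds

print(f"OSDs: {osds}")
print(f"Raw capacity: {raw_tb:.2f} TB")
print(f"PG replicas per OSD (healthy): {pg_replicas_per_osd:.1f}")
print(f"PG replicas per OSD (one host down): {pg_replicas_after_host_loss:.1f}")
```

Under these assumptions, losing one host means every surviving OSD has to take on roughly half again as many PG replicas as before, which would be consistent with the recovery load showing up on all remaining OSDs at once.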
