ceph recovery slow and won't change number of backfills

ewaldo

Member
Jan 1, 2020
I have a 3-node cluster and I replaced 3x 8TB drives. The recovery/rebalance is going to take 20+ days and maxes out at about 10 MiB/s, which seems exceptionally slow. I've changed all the relevant parameters, but setting them at the global level has no effect; the values only change when I set them at the OSD level. Even then, it still won't perform more than 3 backfills at a time. The cluster is on a 100GbE network, so network speed is definitely not the limit.
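To illustrate what I mean by global vs. per-OSD (example commands only, not my exact shell history; osd.5 is just one of the OSDs), the global route goes through the monitor config database:

ceph config set osd osd_max_backfills 10
ceph config set osd osd_recovery_max_active_hdd 10

versus setting the same options directly on a running daemon:

ceph daemon osd.5 config set osd_max_backfills 10
ceph daemon osd.5 config set osd_recovery_max_active_hdd 10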

root@host3:~# ceph-conf --show-config | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills|osd_mclock_override_recovery_settings"

osd_max_backfills = 1
osd_mclock_override_recovery_settings = false
osd_recovery_max_active = 0
osd_recovery_max_active_hdd = 3
osd_recovery_max_active_ssd = 10
osd_recovery_op_priority = 3


All OSDs are set the same:
root@host3:~# ceph daemon osd.5 config get osd_recovery_max_active
{
"osd_recovery_max_active": "10"
}
root@host3:~# ceph daemon osd.5 config get osd_recovery_op_priority
{
"osd_recovery_op_priority": "3"
}
root@host3:~# ceph daemon osd.5 config get osd_max_backfills
{
"osd_max_backfills": "10"
}
root@host3:~# ceph daemon osd.5 config get osd_mclock_override_recovery_settings
{
"osd_mclock_override_recovery_settings": "true"
}
root@host3:~# ceph daemon osd.5 config get osd_recovery_max_active_hdd
{
"osd_recovery_max_active_hdd": "10"
}
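For reference, the same options can also be cross-checked against the monitor config database and the daemon's reported running config (example commands, using osd.5 again):

ceph config get osd osd_max_backfills
ceph config show osd.5 | egrep "osd_max_backfills|osd_recovery_max_active_hdd|osd_mclock_override_recovery_settings"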
 
