[SOLVED] Ceph: osd_max_backfills being overridden after changing it.

Aug 2, 2022
Hi,
I've added a new node to my cluster, but the backfilling to the new node is killing the cluster.
osd_max_backfills is set to 1000, so no wonder it kills it. I've tried setting osd_max_backfills on all nodes with:
Code:
ceph tell 'osd.*' injectargs --osd-max-backfills=1
But it doesn't change anything, even though the command appears to succeed.

I then tried changing it for one specific OSD.
Code:
# ceph daemon osd.20 config set osd_max_backfills 1
{
    "success": "osd_max_backfills = '1' "
}
But if I then fetch the value afterwards:
Code:
# ceph daemon osd.20 config get osd_max_backfills
{
    "osd_max_backfills": "1000"
}

I then tried setting it in ceph.conf and restarting the OSD. Right after the OSD came up the value was correct, but the next time I fetched the value, it was back at 1000.

Code:
root@phcpve03:~# ceph daemon osd.67 config get osd_max_backfills
{
    "osd_max_backfills": "1"
}
root@phcpve03:~# ceph daemon osd.67 config get osd_max_backfills
{
    "osd_max_backfills": "1000"
}

I vaguely remember messing with the parameter while setting up the cluster, but haven't been able to find anything in the config files.
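
For anyone hitting the same thing: a value that keeps coming back is not necessarily set in a config file. A rough sketch of where else to look, assuming a node with an admin keyring. The function name show_option_sources is made up for illustration; the ceph subcommands themselves are standard.

```shell
# Hypothetical helper: show where a running OSD may be getting an option from.
# Usage: show_option_sources <osd-id> <option>
show_option_sources() {
    osd_id="$1"
    option="$2"
    # Values stored in the monitors' central config database (not in ceph.conf).
    ceph config dump | grep -F "$option"
    # Value the running daemon currently reports to the cluster.
    ceph config show "osd.${osd_id}" "$option"
    # Which op queue scheduler the OSD is using.
    ceph config show "osd.${osd_id}" osd_op_queue
}
```

Called as show_option_sources 67 osd_max_backfills, this prints any matching entries from the central config, then the running value, then the scheduler in use.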

Cheers,
René
 
Hi Wolfgang.

You are correct. Proxmox support sorted it out for me.

With the 17.2 release the old parameters no longer work, because the default scheduler has been changed to mClock.

The weight parameters didn't do it for me, as recovery still maxed out the 10 Gbps link between the nodes.

Instead I used a combination of:
osd_mclock_scheduler_background_recovery_res=1
osd_mclock_scheduler_background_recovery_lim=100

The limit can be adjusted; in our environment, these values resulted in ~700 MB/s recovery.
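
A minimal sketch of how these two options could be applied cluster-wide. This assumes they are set through the central config with ceph config set, and that the mClock profile has to be switched to "custom" first so the built-in profile does not manage them; neither detail is stated above. The function name set_recovery_limits is hypothetical.

```shell
# Hypothetical helper: apply mClock recovery reservation and limit to all OSDs.
# Usage: set_recovery_limits <reservation> <limit> [osd-id-to-check]
set_recovery_limits() {
    res="$1"
    lim="$2"
    osd_id="${3:-0}"   # OSD used for the read-back check; defaults to osd.0
    # Assumption: built-in mClock profiles control these options themselves,
    # so switch to the "custom" profile before changing them.
    ceph config set osd osd_mclock_profile custom || return 1
    ceph config set osd osd_mclock_scheduler_background_recovery_res "$res" || return 1
    ceph config set osd osd_mclock_scheduler_background_recovery_lim "$lim" || return 1
    # Read the limit back from one running OSD to confirm it took effect.
    ceph config show "osd.${osd_id}" osd_mclock_scheduler_background_recovery_lim
}
```

With the values from this post that would be set_recovery_limits 1 100. If a value is rejected, the function stops at the failing command instead of continuing.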

Cheers,
René
 
Hi René,

I can confirm that the weight parameters don't seem to have any effect.
I tried your parameters and values. They worked very well -> Thanks!

Cheers,
Wolfgang
 
Instead I used a combination of:
osd_mclock_scheduler_background_recovery_res=1
osd_mclock_scheduler_background_recovery_lim=100
How can you set osd_mclock_scheduler_background_recovery_lim to 100? The documentation says the maximum value for osd_mclock_scheduler_background_recovery_lim is 1, and I get an error if I try it.

Code:
# ceph tell 'osd.0' injectargs --osd_mclock_scheduler_background_recovery_lim=100.0
Error EINVAL: Parse error setting osd_mclock_scheduler_background_recovery_lim to '100.0' using injectargs (Value '100.000000' exceeds maximum 1.000000).

https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/
osd_mclock_scheduler_background_recovery_lim
IO limit for background recovery over reservation.
type float
default 0.0
allowed range [0, 1.0]
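
Since a value of 100 was accepted earlier in this thread but is rejected here, the accepted range may differ between releases. A small sketch for checking what the installed release actually allows, rather than relying on the online docs. The function name describe_option is made up; ceph versions and ceph config help are standard subcommands.

```shell
# Hypothetical helper: print the running release(s) and the option's own help text,
# which includes its type, default and allowed range for the installed version.
# Usage: describe_option <option>
describe_option() {
    option="$1"
    ceph versions
    ceph config help "$option"
}
```

For example: describe_option osd_mclock_scheduler_background_recovery_lim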