We are running Ceph on three nodes (10 GbE). There is one HDD pool (3 OSDs per node) and one NVMe pool. The NVMe drives also hold the WALs and DBs of the HDD OSDs. For each OSD, `osd_max_scrubs` is set to 1.
During the deep-scrubbing phases (which I have limited to a few hours during the night) the cluster is terribly slow. I do not have benchmark results, but even booting a VM takes forever. I am trying to better understand how deep scrubbing works so that I can improve the settings.
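
To make the nightly window concrete, here is a minimal sketch of how a begin/end hour pair that wraps around midnight can be evaluated. It assumes the window is configured through the `osd_scrub_begin_hour` and `osd_scrub_end_hour` options (the post does not say which mechanism is used) and that equal begin and end values mean "no restriction". The function name `scrub_allowed` is made up for illustration, and the sketch only models when a new scrub may start, not what happens to scrubs that are already running when the window closes.

```python
def scrub_allowed(hour, begin_hour, end_hour):
    """Return True if a scrub may start at `hour` (0-23) for the given window.

    Assumed semantics: begin is inclusive, end is exclusive, and a window
    with begin > end wraps around midnight.
    """
    if begin_hour == end_hour:
        # Assumed: identical values mean scrubbing is allowed all day.
        return True
    if begin_hour < end_hour:
        # Window inside a single day, e.g. 1 -> 5.
        return begin_hour <= hour < end_hour
    # Window wraps around midnight, e.g. 22 -> 6.
    return hour >= begin_hour or hour < end_hour


# Hypothetical night window from 22:00 to 06:00.
for h in (21, 22, 3, 6):
    print(h, scrub_allowed(h, 22, 6))
```

A longer window spreads the same amount of deep-scrub work over more hours, which is one of the trade-offs to weigh against a short, intense window.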
As I understand it, a deep scrub causes the PG to be read on all nodes and is "started" on the primary OSD for that PG. I do not know whether the "client scrubbing" of the replica PGs is counted against the OSD scrub maximum and shown in the `ceph -s` numbers (I guess not, as I sometimes see just one PG being scrubbed, which does not seem to make sense).
I wonder whether three deep-scrub operations can be active on the same OSD at the same time (which would be terrible for performance): one initiated on this OSD and two initiated on other OSDs that hold a replica of their PG on this OSD. In that case it might improve performance a lot to set the `nodeep-scrub` flag on all OSDs of two nodes (per scrubbing period).
Is my understanding correct?
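
One way to check this on a live cluster instead of guessing is to count, per OSD, how many PGs that are currently deep scrubbing have that OSD in their acting set, split into "as primary" and "as replica". The sketch below is only an illustration: it assumes PG records shaped like the entries of `ceph pg dump pgs_brief --format json` (keys `state`, `acting`, `acting_primary`), the surrounding JSON layout differs between releases, and the function name and the sample data are invented.

```python
from collections import Counter


def deep_scrubs_per_osd(pg_records):
    """Count deep-scrubbing PGs per OSD, split by primary and replica role."""
    as_primary = Counter()
    as_replica = Counter()
    for pg in pg_records:
        flags = pg["state"].split("+")
        # Only PGs whose state contains both "scrubbing" and "deep" count.
        if "scrubbing" not in flags or "deep" not in flags:
            continue
        for osd in pg["acting"]:
            if osd == pg["acting_primary"]:
                as_primary[osd] += 1
            else:
                as_replica[osd] += 1
    return as_primary, as_replica


# Invented sample records, not output from a real cluster.
sample = [
    {"pgid": "2.1a", "state": "active+clean+scrubbing+deep",
     "acting": [0, 4, 7], "acting_primary": 0},
    {"pgid": "2.3f", "state": "active+clean+scrubbing+deep",
     "acting": [4, 1, 8], "acting_primary": 4},
    {"pgid": "2.07", "state": "active+clean",
     "acting": [7, 0, 4], "acting_primary": 7},
    {"pgid": "2.10", "state": "active+clean+scrubbing",
     "acting": [8, 0, 3], "acting_primary": 8},
]

primary, replica = deep_scrubs_per_osd(sample)
for osd in sorted(set(primary) | set(replica)):
    print(f"osd.{osd}: primary={primary[osd]} replica={replica[osd]}")
```

In the invented sample, osd.4 is primary for one deep-scrubbing PG and replica for another. If a real cluster with `osd_max_scrubs` set to 1 never shows an OSD with a combined count above 1, that would suggest the replica side is counted against the limit; seeing higher counts would suggest the opposite.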