CEPH Expected Scrubbing, Remap and Backfilling Speeds Post Node Shutdown/Restart

Askey307

Apr 23, 2024
Good Morning

While doing upgrades to our cluster (upgrading the memory in each of the 3 identical nodes from 256 to 512 GB), one node at a time with all VMs removed from HA and switched off, we noticed that after a node comes back online it takes approximately 20-30 minutes for the remap/scrub/clean process to finish (adding about 1.5 hours to the maintenance window), with recovery speeds not exceeding 160 MiB/s. Everything completes fine; we're just concerned about the speeds and wondered what the expected results should be.
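For reference, the recovery rate above is what the cluster status reports while backfill is running; watching it live from any node looks roughly like this (standard Ceph CLI):

# one-shot cluster status, including recovery I/O in MiB/s and objects/s
ceph -s

# or stream status/health updates continuously while the node rejoins
ceph -w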

Cluster breakdown:
PVE version 8.4.1
Ceph version 19.2.1
2x LACP bond (routed setup with fallback, per the wiki)
Licensed enterprise repos for each node.
4 OSDs per node (12 total, all healthy)
NVMe drives for Ceph
25 Gbps meshed network via DAC connections (no switch between nodes, direct connections). Ports are 25 Gbps; 25 Gbps connectivity confirmed.
Dual-socket nodes.
 
@Askey307 Now that I'm somewhere I can see our notes, here's what we do:

set Ceph flags:
noout
norebalance
norecover
noscrub
nodeep-scrub
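For reference, the equivalent shell commands (standard Ceph CLI, run from any node) would be roughly:

# pause rebalancing, recovery and scrubbing before taking a node down
ceph osd set noout
ceph osd set norebalance
ceph osd set norecover
ceph osd set noscrub
ceph osd set nodeep-scrub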

ha-manager crm-command node-maintenance enable nodename
(this will migrate or shut down VMs, etc., depending on your settings)

install updates via web GUI

reboot if needed

ha-manager crm-command node-maintenance disable nodename
(this will return the VMs to this node)
uncheck "norecover" flag so Ceph can recover (takes a couple minutes)

repeat for other nodes

uncheck the remaining Ceph flags
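Again, roughly, on the CLI:

# clear the remaining flags once all nodes are done
ceph osd unset noout
ceph osd unset norebalance
ceph osd unset noscrub
ceph osd unset nodeep-scrub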


> 4 Monitors Per Node (12 Total Healthy)

4 OSDs per node?

Do note that with 3 nodes and the default 3/2 replication in Ceph, any OSD outage on another node while one node is rebooting will leave fewer than 2 copies of some data and pause I/O.
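If you want to confirm what replication a pool is using, something like this works (the pool name is just a placeholder):

ceph osd pool get <poolname> size       # number of replicas
ceph osd pool get <poolname> min_size   # replicas required to keep serving I/O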
 
Saw the typo now, thank you. Corrected the original post. Thanks for the maintenance breakdown.
 