[SOLVED] Ceph crush_rules, device_health_metrics pool

Urbaman · Mar 30, 2022

Hi,

On my 3-node cluster I set up Ceph using a custom device class (sas900, to identify my SAS 900GB devices and put them all in a single pool). New pools will be created later, when devices with different classes are added to the nodes. I created a custom CRUSH rule (replicated_sas900), assigned the pool to that rule and renamed the pool. Everything went smoothly.
I will create new dedicated CRUSH rules as new device types/sizes arrive in the nodes.
The device_health_metrics pool is still on the default replicated_rule crush rule.
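For reference, the setup described above maps onto a handful of ceph CLI calls. The following is a minimal Python sketch that only builds the command lines and does not execute them. The OSD ids, the original pool name, the `default` root and the `host` failure domain are assumptions for illustration, not details from the post.

```python
import shlex


def device_class_commands(osd_ids, device_class, rule, old_pool, new_pool,
                          root="default", failure_domain="host"):
    """Build (not run) the ceph CLI calls for a device-class-specific pool.

    Assumption: root "default" and failure domain "host"; adjust to your map.
    """
    cmds = []
    for osd in osd_ids:
        # An OSD's auto-detected class (e.g. hdd) must be removed before a new one is set.
        cmds.append(["ceph", "osd", "crush", "rm-device-class", f"osd.{osd}"])
        cmds.append(["ceph", "osd", "crush", "set-device-class", device_class, f"osd.{osd}"])
    # Replicated rule that only selects OSDs of this class, one replica per failure domain.
    cmds.append(["ceph", "osd", "crush", "rule", "create-replicated",
                 rule, root, failure_domain, device_class])
    # Point the existing pool at the new rule, then give it its final name.
    cmds.append(["ceph", "osd", "pool", "set", old_pool, "crush_rule", rule])
    cmds.append(["ceph", "osd", "pool", "rename", old_pool, new_pool])
    return cmds


# Hypothetical OSD ids and original pool name, for illustration only.
for cmd in device_class_commands([0, 1, 2], "sas900", "replicated_sas900",
                                 "old-pool-name", "sas900"):
    print(shlex.join(cmd))
```

Review the printed commands against your own cluster before running any of them.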

Now, I am trying to understand some behaviours:
1) if I set the autoscaler of the sas900 pool to "on", it never seems to finish recalculating the PGs, creating a high load on the storage
2) I cannot remove the default replicated_rule CRUSH rule, because it is used by the device_health_metrics pool. I'd like to have only dedicated CRUSH rules.

So: is it normal for the autoscaler to keep working without end? Will it settle on a stable PG number? And can I (should I) change the CRUSH rule for the device_health_metrics pool? To which one of the three or four dedicated CRUSH rules?

Urbaman · Mar 30, 2022

Ok, solved it myself: I only had to wait for the autoscaler to adapt to the new PG number. I expected it to jump directly to the target PG number, but instead it went down to that number slowly.
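The target the autoscaler walks toward can be estimated roughly. This is a simplified model, assuming the defaults `mon_target_pg_per_osd = 100` and an adjustment threshold of 3.0 (the autoscaler only acts when the current value is off by more than a factor of 3). The real pg_autoscaler module also accounts for bias, `pg_num_min` and target size ratios, so treat the numbers as approximate. The OSD count and replica size in the example are hypothetical.

```python
import math


def nearest_power_of_two(n: float) -> int:
    """Return the power of two closest to n (ties go to the larger one)."""
    if n <= 1:
        return 1
    lower = 2 ** math.floor(math.log2(n))
    upper = lower * 2
    return lower if (n - lower) < (upper - n) else upper


def ideal_pg_num(num_osds, replica_size, capacity_ratio=1.0, target_pg_per_osd=100):
    """Simplified estimate: share of the PG budget for this pool, per replica."""
    raw = capacity_ratio * num_osds * target_pg_per_osd / replica_size
    return nearest_power_of_two(raw)


def would_adjust(current, ideal, threshold=3.0):
    """Only change pg_num when it is off by more than `threshold` times."""
    return ideal > current * threshold or ideal < current / threshold


# Hypothetical cluster: 12 OSDs, one pool holding all the data, size 3.
target = ideal_pg_num(12, 3)
print(target)                      # 512
print(would_adjust(2048, target))  # True
```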

aaron · Mar 31, 2022

As you saw, the autoscaler changes the number of PGs gradually to avoid a large rebalance, which could hurt performance.
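To illustrate why the change takes many small steps instead of one jump, here is a toy model, not Ceph's actual algorithm. It assumes each step may move at most a fixed fraction of the current PG count; the 5% default mirrors the spirit of the mgr's misplaced-objects limit, but the function and its stepping rule are hypothetical.

```python
def step_schedule(current, target, max_step_ratio=0.05):
    """Toy model: successive PG counts when each step may move at most
    max_step_ratio of the current count (and always at least one PG)."""
    counts = [current]
    while current != target:
        step = max(1, int(current * max_step_ratio))
        if current > target:
            current = max(target, current - step)
        else:
            current = min(target, current + step)
        counts.append(current)
    return counts


# Hypothetical shrink from 2048 to 512 PGs.
schedule = step_schedule(2048, 512)
print(len(schedule) - 1, "steps")
```

Each step triggers some data movement, so a long schedule like this one explains why the cluster looked busy for a while.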

If I understand the situation correctly, you only have one kind of device class in use now. In that situation, it really doesn't matter which rule the device_health_metrics pool gets.

But once you have more device classes in use, each pool needs to be assigned to one. If the device_health_metrics pool still has the default "replicated_rule" assigned, the autoscaler won't be able to determine the pg_num. This is because that pool would span multiple device classes, and therefore overlap with the OSDs of every class-specific pool. Without a clear distinction of which pools share a device class, it is impossible for the autoscaler to come to a result.
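The conflict can be pictured as overlapping OSD sets. This is a simplified model of the reasoning above, not Ceph's implementation: each pool is reduced to the set of OSDs its CRUSH rule can select, and two pools conflict when their sets intersect without being identical. The OSD ids and the second device class ("ssd-pool") are hypothetical.

```python
from itertools import combinations


def overlapping_pools(pool_osds):
    """pool_osds: dict of pool name -> set of OSD ids its CRUSH rule can select.

    Returns pairs of pools whose OSD sets overlap without being identical,
    i.e. pools the autoscaler could not cleanly group by device class.
    """
    conflicts = []
    items = sorted(pool_osds.items(), key=lambda kv: kv[0])
    for (a, osds_a), (b, osds_b) in combinations(items, 2):
        if osds_a & osds_b and osds_a != osds_b:
            conflicts.append((a, b))
    return conflicts


# Hypothetical layout: OSDs 0-5 are sas900, OSDs 6-8 belong to a future class.
sas900 = set(range(6))
other = {6, 7, 8}
before = {"device_health_metrics": sas900 | other,  # default replicated_rule: all OSDs
          "sas900": sas900, "ssd-pool": other}
after = dict(before, device_health_metrics=sas900)  # moved to replicated_sas900
print(overlapping_pools(before))
print(overlapping_pools(after))
```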

Therefore you should assign a device-class-specific rule to every pool. For the device_health_metrics pool it shouldn't matter which device class you assign it to.
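Once every pool has a class-specific rule, the default rule is no longer referenced and can be removed. Ceph refuses to remove a rule that a pool still uses, which is why the pools are switched first. A minimal sketch that builds (but does not run) the commands, assuming replicated_sas900 as the new rule for device_health_metrics.

```python
import shlex


def retire_default_rule(pools_on_default, new_rule, default_rule="replicated_rule"):
    """Build (not run) the calls to move pools off the default rule, then drop it."""
    # Reassign every pool still using the default rule.
    cmds = [["ceph", "osd", "pool", "set", pool, "crush_rule", new_rule]
            for pool in pools_on_default]
    # Only now is the default rule unused and removable.
    cmds.append(["ceph", "osd", "crush", "rule", "rm", default_rule])
    return cmds


for cmd in retire_default_rule(["device_health_metrics"], "replicated_sas900"):
    print(shlex.join(cmd))
```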
