Change crush rule in active pool

pushakk79

New Member
Jul 2, 2023
Hello everyone,

We have a Proxmox cluster with Ceph storage. The Ceph cluster has 3 × 4 TB disks per node across 3 cluster nodes, in a pool (vps) using the default replicated_rule. This pool has 6.3 TB of stored data (18 TB used).

Now we have put one NVMe disk in each cluster node. My plan was to create an SSD-only CRUSH rule and switch the vps pool to that rule before adding the NVMe OSDs, so I could avoid data being remapped onto the NVMe disks and back again.

Before adding the new NVMe OSDs, I created a new CRUSH rule that uses SSD disks exclusively:

$ sudo ceph osd crush rule create-replicated ssd-class default host ssd
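For reference (these exact commands are not in the original post, just the usual way to do it), the new rule and the available device classes can be checked first, and the pool is then switched to the new rule with ceph osd pool set:

$ sudo ceph osd crush class ls                      # should list ssd (and later nvme)
$ sudo ceph osd crush rule dump ssd-class           # confirm the rule selects class ssd under root default, failure domain host
$ sudo ceph osd pool set vps crush_rule ssd-class   # point the vps pool at the new rule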

But when I changed the vps pool to use this new rule, the cluster state changed to warning and it showed some warnings about used space, in addition to all the remapping needed to get back to a healthy state. Some log examples:

2023-07-02T06:50:31.864956+0200 mgr.pve-hidra1 (mgr.23242501) 293689 : cluster [DBG] pgmap v292940: 129 pgs: 5 remapped+peering, 13 peering, 111 active+clean; 6.2 TiB data, 18 TiB used, 13 TiB / 31 TiB avail; 57 KiB/s rd, 9.1 MiB/s wr, 373 op/s

2023-07-02T06:50:32.699564+0200 mon.pve-hidra1 (mon.0) 176078 : cluster [DBG] osdmap e2985: 9 total, 9 up, 9 in

2023-07-02T06:50:33.916475+0200 mon.pve-hidra1 (mon.0) 176079 : cluster [WRN] Health check failed: Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull (PG_BACKFILL_FULL)

............

2023-07-02T06:51:33.889230+0200 mgr.pve-hidra1 (mgr.23242501) 293726 : cluster [DBG] pgmap v292972: 129 pgs: 2 active+remapped+backfill_wait+backfill_toofull, 104 active+remapped+backfilling, 8 active+remapped+backfill_toofull, 15 active+clean; 6.2 TiB data, 18 TiB used, 13 TiB / 31 TiB avail; 107 KiB/s rd, 1.5 MiB/s wr, 72 op/s; 2604479/4916760 objects misplaced (52.971%); 891 MiB/s, 229 objects/s recovering

So I changed the pool's CRUSH rule back to the default replicated_rule, and after some remapping the cluster is healthy again.
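In case it helps anyone following the same steps, this is roughly how the revert and the monitoring can be done (standard Ceph commands, not quoted from the thread):

$ sudo ceph osd pool set vps crush_rule replicated_rule   # put the pool back on the default rule
$ sudo ceph -s                                            # watch misplaced objects and backfilling PGs drain
$ sudo ceph osd df tree                                    # per-OSD %USE, to see which OSDs are near the limits
$ sudo ceph osd dump | grep ratio                          # nearfull / backfillfull / full thresholds in effect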

My questions are, given that I only changed the pool's CRUSH rule from replicated_rule (which in practice already covers only SSDs) to the ssd-class rule (same SSD disks):

Why does the cluster need to remap anything if it is using the same OSDs?

Why the "Low space hindering backfill" and backfill_toofull status warnings?

Is it safe to let the recovery process run to completion despite these "low space" and "toofull" warnings? (There was a lot of "yellow" in the Proxmox Ceph status.)

Is it better to create a new pool using the ssd-class rule and migrate each VPS one by one?

Would there be any problem if I leave the vps pool using both SSD and NVMe disks and configure a new pool with a rule that uses exclusively NVMe disks?
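For the last two questions, the usual pattern would be to leave vps on the SSD-only rule and create a second pool bound to an NVMe-only rule; the rule name, pool name and PG count below are placeholders only, not taken from the post:

$ sudo ceph osd crush rule create-replicated nvme-class default host nvme   # rule that only selects class nvme, one copy per host
$ sudo ceph osd pool create vps-nvme 32 32 replicated nvme-class            # example pool bound to that rule
$ sudo ceph osd pool application enable vps-nvme rbd                        # tag it for RBD use (Proxmox stores VM disks via RBD)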

Thank you very much.
 
Finally, how can you change the replicated rule? I'm facing the same problem and am still deciding whether to create a new pool with the new replicated rule or to change the active pool over to the new replicated rule.
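As a sketch (standard Ceph commands, not specific to this thread): the rule of an existing pool can be changed in place with ceph osd pool set, and the resulting data movement can be held back with the norebalance/nobackfill flags until you have reviewed how many PGs will move:

$ sudo ceph osd set norebalance                        # optional: pause rebalancing before the change
$ sudo ceph osd set nobackfill
$ sudo ceph osd pool set <pool> crush_rule <new-rule>
$ sudo ceph -s                                         # check how many PGs are now remapped
$ sudo ceph osd unset nobackfill                       # let data move once you are happy with the plan
$ sudo ceph osd unset norebalance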
 
