Ceph placement group remapping

Straightman

New Member
Feb 15, 2025
I have a 3-node Ceph cluster that contains no data just now. I have two RBD pools: one is a replicated pool dedicated to SSDs only, and the other is an erasure coded pool containing only spinning-disk hardware. The erasure coded pool is set up to use the replicated pool for its metadata. Overall Ceph status is HEALTH_OK, with 62 PGs reporting active+clean (which I understand to be a healthy status for a PG); however, 3 PGs report a status of active+clean+remapped. All 13 OSDs are up and in.
I understood from what I have read that remapping can occur from time to time but should resolve back to active+clean. I have left this for several days now, and for a cluster containing no data I would have expected that remapped status to change by now. I am looking for guidance on how to troubleshoot and correct whatever is preventing the +remapped status from resolving.
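In case it is useful to anyone reading, these are the kinds of commands I have been using to identify the affected PGs (standard Ceph CLI, run against the live cluster; the PG ID below is just a placeholder):

```shell
# List only PGs currently in the remapped state
ceph pg ls remapped

# Detailed state of one PG, including its "up" and "acting" OSD sets
# (a PG stays remapped while those two sets differ)
ceph pg 8.1a query

# Per-OSD utilisation and the CRUSH tree, to see where shards can land
ceph osd df tree
```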
 
Sorry, I have no idea. But I have a generic link with some hints: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/
Thanks for the guidance. I have read this and several other leads that have taught me a lot, but they have not led to a resolution.
I tried several things, including the pg repair command, which did not resolve the issue. I located the OSDs responsible for the PGs flagged as +remapped, took them out and down, and then brought them back to up/in, but this did not change the outcome. The whole time Ceph reported a healthy cluster. In this scenario I started with a single replicated RBD pool using a CRUSH rule that writes exclusively to SSD drives. In that state Ceph reported all PGs as active+clean. I then added an erasure coded pool (k=2, m=1) with an EC profile set to use HDD drives. It was at that point that the cluster reported 3 PGs with a +remapped status, as I reported in the post above. This condition has been persistent and I could not resolve the PG status completely to active+clean.
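Since the +remapped PGs appeared right after adding the EC pool, the rule and profile behind that pool can be inspected directly (pool name as in my setup; the profile name placeholder comes from the ls output):

```shell
# Which CRUSH rule does the EC pool use?
ceph osd pool get Ceph_RBD_EC_Pool crush_rule

# Dump all CRUSH rules to see the failure domain and device class they select
ceph osd crush rule dump

# Show the erasure-code profiles and the settings of the one in use
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get <profile-name>
```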
I decided to remove the EC pool and its profile completely. I had an additional HDD, which I added to the node that had a lower HDD count than the other two, just to create a different scenario. I brought that new drive in as an OSD and created a new erasure coded pool and profile. Ceph now reports that the cluster is healthy, but the PG status is 64 PGs active+clean and 1 PG active+clean+remapped. So a similar result overall, and I cannot seem to affect it.
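For reference, this is roughly how I created the new profile and pool (profile name and PG count are mine; k=2, m=1 on HDDs with host as the failure domain):

```shell
# EC profile: 2 data + 1 coding shard, one shard per host, HDDs only
ceph osd erasure-code-profile set hdd_ec_profile \
    k=2 m=1 crush-failure-domain=host crush-device-class=hdd

# Create the EC pool from that profile and allow RBD to use it
ceph osd pool create Ceph_RBD_EC_Pool 32 32 erasure hdd_ec_profile
ceph osd pool set Ceph_RBD_EC_Pool allow_ec_overwrites true
```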
If anyone can help I would appreciate it. As is, the Ceph cluster has no data, and unless I can build enough confidence to understand and resolve this issue I am hesitant to use it for anything more. Let me know if I can provide any additional info to help troubleshoot.
 
Last edited:
The screenshot is from ceph osd pool ls. (I could not get ceph osd pool ls dump to run without error; let me know if I need to tweak that?)
[screenshot: 1743030035777.png]

[screenshot: 1743030192608.png]
 

Attachments

  • 1743031038590.png (105.1 KB)
This may help also: it is the PG dump showing the PG with the odd status. The list is not complete; it is constrained to show only those PGs associated with pool #8 (the EC pool).
Here also is the output for the PGs allocated to the Ceph_RBD_EC_Pool (pool #8).
View attachment 84180
 
Your config is unworkable.

While you didn't provide your actual CRUSH rules, I can already see they can never be satisfied.

Consider: you have 3 nodes.
node pve2 15.25TB HDD, 1.83TB SSD
node pve3 7.27TB HDD, 0.7TB SSD
node pve4 0.5TB HDD, 0.9TB SSD

For your replication profile, even assuming you do NOT have a device class defined, you can only have about 1.4TB usable AT MAX (the total capacity of your smallest node, since every node must hold a full copy). If you DO have a device class in your crush rule it's even worse.

For your EC profile you probably can only have 2.8TB on a k=2, m=1 crush rule, but such a rule is almost useless because your pool will go read-only if any node is down.

Honestly, there's no point in "troubleshooting" this config. You need to go back to the documentation and rethink your layout.
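To spell out the arithmetic (a back-of-the-envelope sketch, figures in GB taken from the node list above, assuming failure-domain = host):

```shell
# With 3 replicas across 3 hosts, every host stores a full copy, so usable
# space is capped by the smallest node: pve4 = 0.5TB HDD + 0.9TB SSD = 1.4TB.
SMALLEST_NODE_GB=1400
echo "replicated size=3, usable at max: ${SMALLEST_NODE_GB} GB"

# EC k=2,m=1 across 3 hosts places one shard per host; data is 2 of every
# 3 shards, so usable ~= smallest node * 3 * 2/3 = smallest node * 2 = 2.8TB.
echo "EC 2+1, usable at max: $(( SMALLEST_NODE_GB * 2 )) GB"
```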
 
  • Like
Reactions: fba and gurubert
Thank you for the guidance and the time taken to consider my post. I will dig into the documentation thoroughly to learn more. What is still unclear is why one placement group cannot resolve to active+clean in this scenario, where the storage demand is currently zero. Any thoughts appreciated.
 
Thanks for taking the time to review my questions and provide the additional clarity. I will go back to the drawing board, learn some more, and rethink the approach.
 
  • Like
Reactions: gurubert