[SOLVED] Broken Ceph - .mgr PG shows "Unknown"

Mar 3, 2025
I'll start at the beginning because I'm not sure where I screwed up.

I had originally set up Ceph and a data pool, and all seemed to be working. However, I needed to change the CRUSH map to make sure copies couldn't be kept on the same server chassis in case of power supply failure (I have 8 servers, two per chassis with shared power supplies). Here's what I ran to make that happen:

Bash:
root@pves01:~# ceph osd crush add-bucket chasis1 row
added bucket chasis1 type row to crush map
root@pves01:~# ceph osd crush add-bucket chasis2 row
added bucket chasis2 type row to crush map
root@pves01:~# ceph osd crush add-bucket chasis3 row
added bucket chasis3 type row to crush map
root@pves01:~# ceph osd crush add-bucket chasis4 row
added bucket chasis4 type row to crush map
root@pves01:~# ceph osd crush move pves01 row=chasis1
moved item id -3 name 'pves01' to location {row=chasis1} in crush map
root@pves01:~# ceph osd crush move pves02 row=chasis1
moved item id -5 name 'pves02' to location {row=chasis1} in crush map
root@pves01:~# ceph osd crush move pves03 row=chasis2
moved item id -7 name 'pves03' to location {row=chasis2} in crush map
root@pves01:~# ceph osd crush move pves04 row=chasis2
moved item id -9 name 'pves04' to location {row=chasis2} in crush map
root@pves01:~# ceph osd crush move pves05 row=chasis3
moved item id -11 name 'pves05' to location {row=chasis3} in crush map
root@pves01:~# ceph osd crush move pves06 row=chasis3
moved item id -13 name 'pves06' to location {row=chasis3} in crush map
root@pves01:~# ceph osd crush move pves07 row=chasis4
moved item id -15 name 'pves07' to location {row=chasis4} in crush map
root@pves01:~# ceph osd crush move pves08 row=chasis4
moved item id -17 name 'pves08' to location {row=chasis4} in crush map
root@pves01:~# ceph osd crush rule create-replicated chasis_rule default row
root@pves01:~# ceph osd pool set CEPH-Pool crush_rule chasis_rule
set pool 2 crush_rule to chasis_rule
root@pves01:~#

After doing this, I verified that PGs for the data pool (named CEPH-Pool) were spread across the cluster as I expected, and they were. This cluster is still being commissioned so there was only a single test VM on the pool.
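
(If you want to do the same check on your own cluster, something like this should list each PG in the pool with its UP/ACTING OSD sets; the object name in the second command is just an arbitrary example for spot-checking a mapping:)

Bash:
# Show every PG in the pool with the OSDs it maps to
root@pves01:~# ceph pg ls-by-pool CEPH-Pool
# Ask CRUSH where a hypothetical object would land
root@pves01:~# ceph osd map CEPH-Pool test-object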

From there, I determined that I didn't need three copies of everything, as that was eating a little too much of my available space. I went into the web GUI, Ceph -> Pools, and edited the pool "CEPH-Pool" from the default size of 3 down to 2. That's where things began to break: under the performance monitoring, Ceph was showing -166.667% for rebalancing, reporting something like 29425/17655 objects misplaced, and the cluster storage was now showing as full.
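
(For the record, I believe the CLI equivalent of that GUI edit is just the pool's size setting; min_size usually needs adjusting to match. A sketch, assuming the pool name above:)

Bash:
# Reduce the replica count (I believe this is what the GUI edit does)
root@pves01:~# ceph osd pool set CEPH-Pool size 2
# min_size 2 means I/O blocks if either replica is down; min_size 1 is riskier
root@pves01:~# ceph osd pool set CEPH-Pool min_size 2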

I dove into the console to try to figure something out, and after trying many, many things to force Ceph to delete the extra copies or rebalance, I decided to simply destroy the pool and start over. I couldn't delete it from the web GUI because it hung while trying to enumerate RBD images (because it couldn't access any data on the pool, I'm guessing), so I manually destroyed it with ceph osd pool rm CEPH-Pool CEPH-Pool --yes-i-really-really-mean-it. I deleted it from the cluster storage locations and thought that was that.

I then created a new pool with a size of 2 and my "chasis_rule" for the CRUSH rule, but all of its PGs remained in "Unknown" state and never got placed onto OSDs. I deleted that pool, again manually, then noticed I still had errors of "15/9 objects misplaced." I tracked that down to the .mgr pool, which I hadn't touched up to this point. I found this post on how to delete and recreate the .mgr pool and followed it, successfully recreating the .mgr pool, but now with its single PG stuck in "Unknown."
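
(A side note for anyone following along: the manual pool delete only works if pool deletion is enabled on the monitors; if the rm command refuses, something along these lines should unlock it:)

Bash:
# Pool deletion is disabled by default; enable it just long enough to delete
root@pves01:~# ceph config set mon mon_allow_pool_delete true
root@pves01:~# ceph osd pool rm CEPH-Pool CEPH-Pool --yes-i-really-really-mean-it
# Turn it back off to avoid accidents
root@pves01:~# ceph config set mon mon_allow_pool_delete false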

I've tried destroying it and recreating it a few times now, but the PG never leaves "Unknown" status, and trying a repair gives me "pg <#> has no primary osd". Any time I create a new pool, its PGs never get placed onto OSDs, which makes me think I somehow screwed up the CRUSH map or the OSD mapping. Can anyone tell me what went wrong and how to fix this?
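
(For reference, these are the standard commands I know of for finding stuck PGs, in case the output helps someone spot the problem:)

Bash:
# Overall health with per-PG detail
root@pves01:~# ceph health detail
# List PGs stuck inactive ("unknown" PGs show up here)
root@pves01:~# ceph pg dump_stuck inactive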

Here's my CRUSH map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
device 24 osd.24 class hdd
device 25 osd.25 class hdd
device 26 osd.26 class hdd
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class hdd
device 31 osd.31 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 0.00000
    alg straw2
    hash 0    # rjenkins1
}
host pves01 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.81940
    item osd.1 weight 1.81940
    item osd.2 weight 1.81940
    item osd.3 weight 1.81940
}
host pves02 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 1.81940
    item osd.5 weight 1.81940
    item osd.6 weight 1.81940
    item osd.7 weight 1.81940
}
host pves03 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.8 weight 1.81940
    item osd.9 weight 1.81940
    item osd.10 weight 1.81940
    item osd.11 weight 1.81940
}
host pves04 {
    id -9        # do not change unnecessarily
    id -10 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.12 weight 1.81940
    item osd.13 weight 1.81940
    item osd.14 weight 1.81940
    item osd.15 weight 1.81940
}
host pves05 {
    id -11        # do not change unnecessarily
    id -12 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.16 weight 1.81940
    item osd.17 weight 1.81940
    item osd.18 weight 1.81940
    item osd.19 weight 1.81940
}
host pves06 {
    id -13        # do not change unnecessarily
    id -14 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.20 weight 1.81940
    item osd.21 weight 1.81940
    item osd.22 weight 1.81940
    item osd.23 weight 1.81940
}
host pves07 {
    id -15        # do not change unnecessarily
    id -16 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.24 weight 1.81940
    item osd.25 weight 1.81940
    item osd.26 weight 1.81940
    item osd.27 weight 1.81940
}
host pves08 {
    id -17        # do not change unnecessarily
    id -18 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.28 weight 1.81940
    item osd.29 weight 1.81940
    item osd.30 weight 1.81940
    item osd.31 weight 1.81940
}
row chasis1 {
    id -19        # do not change unnecessarily
    id -26 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves01 weight 7.27759
    item pves02 weight 7.27759
}
row chasis2 {
    id -20        # do not change unnecessarily
    id -25 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves03 weight 7.27759
    item pves04 weight 7.27759
}
row chasis3 {
    id -21        # do not change unnecessarily
    id -24 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves05 weight 7.27759
    item pves06 weight 7.27759
}
row chasis4 {
    id -22        # do not change unnecessarily
    id -23 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves07 weight 7.27759
    item pves08 weight 7.27759
}

# rules
rule host_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule chasis_rule {
    id 1
    type replicated
    step take default
    step chooseleaf firstn 0 type row
    step emit
}

# end crush map
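
(For anyone unfamiliar, that text form comes from the standard getcrushmap/crushtool round trip, which also lets you edit the map offline and inject it back:)

Bash:
# Dump the compiled CRUSH map and decompile it to text
root@pves01:~# ceph osd getcrushmap -o crushmap.bin
root@pves01:~# crushtool -d crushmap.bin -o crushmap.txt
# After editing, recompile and inject it back into the cluster
root@pves01:~# crushtool -c crushmap.txt -o crushmap-new.bin
root@pves01:~# ceph osd setcrushmap -i crushmap-new.bin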

Here's the OSD tree:

Code:
root@pves01:~# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-22         14.55518  row chasis4                               
-15          7.27759      host pves07                           
 24    hdd   1.81940          osd.24       up   0.95001  1.00000
 25    hdd   1.81940          osd.25       up   1.00000  1.00000
 26    hdd   1.81940          osd.26       up   1.00000  1.00000
 27    hdd   1.81940          osd.27       up   1.00000  1.00000
-17          7.27759      host pves08                           
 28    hdd   1.81940          osd.28       up   1.00000  1.00000
 29    hdd   1.81940          osd.29       up   1.00000  1.00000
 30    hdd   1.81940          osd.30       up   1.00000  1.00000
 31    hdd   1.81940          osd.31       up   1.00000  1.00000
-21         14.55518  row chasis3                               
-11          7.27759      host pves05                           
 16    hdd   1.81940          osd.16       up   1.00000  1.00000
 17    hdd   1.81940          osd.17       up   0.95001  1.00000
 18    hdd   1.81940          osd.18       up   1.00000  1.00000
 19    hdd   1.81940          osd.19       up   1.00000  1.00000
-13          7.27759      host pves06                           
 20    hdd   1.81940          osd.20       up   1.00000  1.00000
 21    hdd   1.81940          osd.21       up   1.00000  1.00000
 22    hdd   1.81940          osd.22       up   1.00000  1.00000
 23    hdd   1.81940          osd.23       up   1.00000  1.00000
-20         14.55518  row chasis2                               
 -7          7.27759      host pves03                           
  8    hdd   1.81940          osd.8        up   0.95001  1.00000
  9    hdd   1.81940          osd.9        up   1.00000  1.00000
 10    hdd   1.81940          osd.10       up   1.00000  1.00000
 11    hdd   1.81940          osd.11       up   1.00000  1.00000
 -9          7.27759      host pves04                           
 12    hdd   1.81940          osd.12       up   1.00000  1.00000
 13    hdd   1.81940          osd.13       up   1.00000  1.00000
 14    hdd   1.81940          osd.14       up   1.00000  1.00000
 15    hdd   1.81940          osd.15       up   1.00000  1.00000
-19         14.55518  row chasis1                               
 -3          7.27759      host pves01                           
  0    hdd   1.81940          osd.0        up   1.00000  1.00000
  1    hdd   1.81940          osd.1        up   1.00000  1.00000
  2    hdd   1.81940          osd.2        up   1.00000  1.00000
  3    hdd   1.81940          osd.3        up   1.00000  1.00000
 -5          7.27759      host pves02                           
  4    hdd   1.81940          osd.4        up   1.00000  1.00000
  5    hdd   1.81940          osd.5        up   1.00000  1.00000
  6    hdd   1.81940          osd.6        up   1.00000  1.00000
  7    hdd   1.81940          osd.7        up   0.95001  1.00000
 -1                0  root default                             
root@pves01:~#

And here's the stuck PG (note the empty UP/ACTING sets, "[]p-1", meaning CRUSH isn't mapping it to any OSDs at all):
Code:
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  LOG_DUPS  STATE    SINCE  VERSION  REPORTED  UP     ACTING  SCRUB_STAMP                      DEEP_SCRUB_STAMP                 LAST_SCRUB_DURATION  SCRUB_SCHEDULING
6.0        0         0          0        0      0            0           0    0         0  unknown     6m      0'0       0:0  []p-1   []p-1  2025-03-03T13:26:18.239533-0600  2025-03-03T13:26:18.239533-0600                    0  --

Any thoughts? Thanks in advance.
 
Alright, figured it out after much rubber-ducky debugging with a co-worker:

My CRUSH map was wrong: none of the walkthroughs I saw mentioned that the new buckets have to be added to the default root. My "chasis" rows existed, but they weren't under root default, which is why the root shows a weight of 0 in the OSD tree; the rule's "step take default" therefore had no OSDs to choose from, and nothing could ever be placed. Simply running
Bash:
root@pves01:~# ceph osd crush move chasis1 root=default
moved item id -19 name 'chasis1' to location {root=default} in crush map
root@pves01:~# ceph osd crush move chasis2 root=default
moved item id -20 name 'chasis2' to location {root=default} in crush map
root@pves01:~# ceph osd crush move chasis3 root=default
moved item id -21 name 'chasis3' to location {root=default} in crush map
root@pves01:~# ceph osd crush move chasis4 root=default
moved item id -22 name 'chasis4' to location {root=default} in crush map

immediately solved the issue. After the first command, the stuck .mgr PG went into an "undersized+peered" state, and once the rest of the buckets were in the default root it cleared up immediately as the PG replicated. I created a new data pool and all seems to be working well now.
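
In hindsight, the problem could have been confirmed without guesswork by dry-running the rule against the broken map with crushtool (rule 1 is chasis_rule, 2 replicas); on the broken map every mapping should come back empty:

Bash:
# Dry-run the CRUSH rule against the saved binary map
root@pves01:~# crushtool -i crushmap.bin --test --rule 1 --num-rep 2 --show-mappings
# Or only print the failures
root@pves01:~# crushtool -i crushmap.bin --test --rule 1 --num-rep 2 --show-bad-mappings

(Also, if I understand the docs correctly, newer Ceph lets you give the location when creating the bucket, e.g. ceph osd crush add-bucket chasis1 row root=default, which would have avoided this entirely, though I haven't tested that.)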

For others who may find this, here's the complete, corrected CRUSH map for reference. I had to look here to find an example CRUSH map, which helped me spot the issue:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
device 24 osd.24 class hdd
device 25 osd.25 class hdd
device 26 osd.26 class hdd
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class hdd
device 31 osd.31 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pves01 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.81940
    item osd.1 weight 1.81940
    item osd.2 weight 1.81940
    item osd.3 weight 1.81940
}
host pves02 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 1.81940
    item osd.5 weight 1.81940
    item osd.6 weight 1.81940
    item osd.7 weight 1.81940
}
row chasis1 {
    id -19        # do not change unnecessarily
    id -26 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves01 weight 7.27759
    item pves02 weight 7.27759
}
host pves03 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.8 weight 1.81940
    item osd.9 weight 1.81940
    item osd.10 weight 1.81940
    item osd.11 weight 1.81940
}
host pves04 {
    id -9        # do not change unnecessarily
    id -10 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.12 weight 1.81940
    item osd.13 weight 1.81940
    item osd.14 weight 1.81940
    item osd.15 weight 1.81940
}
row chasis2 {
    id -20        # do not change unnecessarily
    id -25 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves03 weight 7.27759
    item pves04 weight 7.27759
}
host pves05 {
    id -11        # do not change unnecessarily
    id -12 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.16 weight 1.81940
    item osd.17 weight 1.81940
    item osd.18 weight 1.81940
    item osd.19 weight 1.81940
}
host pves06 {
    id -13        # do not change unnecessarily
    id -14 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.20 weight 1.81940
    item osd.21 weight 1.81940
    item osd.22 weight 1.81940
    item osd.23 weight 1.81940
}
row chasis3 {
    id -21        # do not change unnecessarily
    id -24 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves05 weight 7.27759
    item pves06 weight 7.27759
}
host pves07 {
    id -15        # do not change unnecessarily
    id -16 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.24 weight 1.81940
    item osd.25 weight 1.81940
    item osd.26 weight 1.81940
    item osd.27 weight 1.81940
}
host pves08 {
    id -17        # do not change unnecessarily
    id -18 class hdd        # do not change unnecessarily
    # weight 7.27759
    alg straw2
    hash 0    # rjenkins1
    item osd.28 weight 1.81940
    item osd.29 weight 1.81940
    item osd.30 weight 1.81940
    item osd.31 weight 1.81940
}
row chasis4 {
    id -22        # do not change unnecessarily
    id -23 class hdd        # do not change unnecessarily
    # weight 14.55518
    alg straw2
    hash 0    # rjenkins1
    item pves07 weight 7.27759
    item pves08 weight 7.27759
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 58.22070
    alg straw2
    hash 0    # rjenkins1
    item chasis1 weight 14.55518
    item chasis2 weight 14.55518
    item chasis3 weight 14.55518
    item chasis4 weight 14.55518
}

# rules
rule host_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule chasis_rule {
    id 1
    type replicated
    step take default
    step chooseleaf firstn 0 type row
    step emit
}

# end crush map