pg is stuck inactive, current state unknown, last acting [] after upgrade from Nautilus

bonkersdeluxe

Hi, I've got a little problem: the Ceph cluster works, but one PG is unknown.
I think it's because the pool device_health_metrics has no OSD.

This Ceph cluster already existed before the installer started creating the device_health_metrics pool.
I believe the pool was only created during the upgrade from Nautilus to a newer version; I have since upgraded to Octopus.
I think this error occurs because the pool has no OSD.
If I try
ceph pg repair 3.0
I get
Error EAGAIN: pg 3.0 has no primary osd
So I think that's the problem: the metrics pool has no OSD.

I only have an NVMe pool. I have two rulesets: the default one and an NVMe one.
The device_health_metrics pool uses the default ruleset.
I don't want to add HDDs as OSDs just for device_health_metrics.
Do I have to change the device_health_metrics pool to the NVMe rule? And if so, can I do that without data loss on my NVMe pool?
Or how else can I fix this?
See the screenshots.
In the screenshots there is also a second warning:
mon.vserver03 has auth_allow_insecure_global_id_reclaim set to true
mon.vserver04 has auth_allow_insecure_global_id_reclaim set to true
mon.vserver05 has auth_allow_insecure_global_id_reclaim set to true
mon.vserver01 has auth_allow_insecure_global_id_reclaim set to true

I will take care of that later, unless my problem depends on it?
Thank you!
Sincerely Bonkersdeluxe

My CRUSH map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class nvme
device 5 osd.5 class nvme
device 11 osd.11 class nvme
device 12 osd.12 class nvme

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host vserver01 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    id -9 class nvme        # do not change unnecessarily
    # weight 11.644
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 5.822
    item osd.5 weight 5.822
}
host vserver02 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    id -10 class nvme        # do not change unnecessarily
    # weight 0.000
    alg straw2
    hash 0    # rjenkins1
}
host vserver03 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    id -11 class nvme        # do not change unnecessarily
    # weight 5.820
    alg straw2
    hash 0    # rjenkins1
    item osd.11 weight 2.910
    item osd.12 weight 2.910
}
host vserver04 {
    id -13        # do not change unnecessarily
    id -14 class hdd        # do not change unnecessarily
    id -15 class nvme        # do not change unnecessarily
    # weight 11.644
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 5.822
    item osd.1 weight 5.822
}
host vserver05 {
    id -16        # do not change unnecessarily
    id -17 class hdd        # do not change unnecessarily
    id -18 class nvme        # do not change unnecessarily
    # weight 11.644
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 5.822
    item osd.3 weight 5.822
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    id -12 class nvme        # do not change unnecessarily
    # weight 40.752
    alg straw2
    hash 0    # rjenkins1
    item vserver01 weight 11.644
    item vserver02 weight 0.000
    item vserver03 weight 5.820
    item vserver04 weight 11.644
    item vserver05 weight 11.644
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated_rule_nvme {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class nvme
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
 

Attachments

  • Screenshot_2021-04-26 vserver02 - Proxmox Virtual Environment(2).png
  • Screenshot_2021-04-26 vserver02 - Proxmox Virtual Environment(1).png
  • Screenshot_2021-04-26 vserver02 - Proxmox Virtual Environment.png
Hi aaron,
thank you for the reply.
Here you go:

Code:
ceph osd lspools
2 nvme
3 device_health_metrics

The PG list is attached.

Thank you!
Sincerely Bonkersdeluxe
 


Hi, any idea?
Or would it be an option to recreate the device_health_metrics pool?
Delete it and create it anew. This pool only stores hard drive health information, doesn't it?
I don't know how I got this issue or how to correct it.
I haven't touched anything yet, because I don't want it to end up more broken than it already is.
The main NVMe pool seems to work. *Phew*
I hope aaron or anybody else can help me so that the cluster gets back to HEALTH_OK.
Thank you!
Sincerely Bonkersdeluxe
 
Hey, sorry for not getting back to you. I found a bug report that could match your situation: https://tracker.ceph.com/issues/46743

Nonetheless, I think recreating the device_health_metrics pool will be the easiest solution. As you already guessed, it is used to store disk health info to predict drive failures. That data will be gone, if it isn't already.

Delete the pool; I actually explained a few weeks ago in another thread how to recreate it and configure it to only have one PG: https://forum.proxmox.com/threads/delte-ceph-pool-device_health_metrics.87230/#post-382660
 
Hi aaron,
so I've deleted the pool,
but now I can't recreate it...
ceph device scrape-health-metrics
Error EIO: Module 'devicehealth' has experienced an error and cannot handle commands: [errno 2] RADOS object not found (Failed to operate write op for oid b'SAMSUNG_MZPLL6T4HMLA-00005_S4C6NA0NC00014')

I've got a big error message in the cluster, see the screenshot.
Do I have to use the NVMe ruleset?

But it seems my NVMe pool still works... *Phew*
Thank you!
 

Attachments

  • Screenshot_2021-04-27 vserver02 - Proxmox Virtual Environment.png
  • Screenshot_2021-04-27 vserver02 - Proxmox Virtual Environment(1).png
1. Delete the pool device_health_metrics.
2. Delete the active manager under Monitors.
3. Wait for a standby manager to become active and for the recently deleted manager to disappear from the list.
4. Now create the manager again on the host where it was just deleted.
5. This manager now goes to standby, but it recreates the device_health_metrics pool.
6. ceph osd pool set device_health_metrics pg_num_min 1
7. But the unknown-PG warning comes back after 30-60 seconds.
8. Now I tried to start
ceph device scrape-health-metrics
It runs for a very long time and seems like an endless loop, but I let it run. Maybe it finishes on its own with OK?
A rough CLI equivalent of steps 1-6 is sketched below.
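For reference, here is an untested sketch of roughly the same steps on the shell. It assumes the pveceph tooling on the PVE nodes; "vserver01" here just stands for whichever node currently runs the active manager, and deleting a pool may additionally require 'mon_allow_pool_delete = true'.
Code:
# 1. delete the pool
pveceph pool destroy device_health_metrics
# 2. destroy the currently active manager (GUI: Monitors -> Manager)
pveceph mgr destroy vserver01
# 3./4. wait for a standby manager to take over, then recreate the mgr on the same node
pveceph mgr create vserver01
# 5./6. the active manager recreates device_health_metrics on its own; afterwards shrink it to one PG
ceph osd pool set device_health_metrics pg_num_min 1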
 
I think I have an idea what the problem might be. AFAIK there are only NVMe OSDs in the cluster, right? At least according to the crush map, those are the OSDs:
Code:
# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class nvme
device 5 osd.5 class nvme
device 11 osd.11 class nvme
device 12 osd.12 class nvme


It seems that the default "replicated_rule" has been changed to limit it to OSDs of device class hdd:
Code:
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default class hdd  # << this line here limits to HDD OSDs
    step chooseleaf firstn 0 type host
    step emit
}

And that is most likely the problem. Either change that rule back so it does not restrict the OSD device class, or assign the NVMe rule to the device health metrics pool.

Otherwise there are no OSDs where the PG can be placed, AFAICS.
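If you want to double check that before changing anything, something like this should show it (the PG id 3.0 is taken from the repair error earlier in the thread):
Code:
# which rule the pool is using
ceph osd pool get device_health_metrics crush_rule
# where its single PG currently maps; with the hdd-only rule the mapping comes back empty ([] / p-1)
ceph pg map 3.0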
 
Hi aaron,
Thank you!
Can I change the ruleset without data loss on the other main pool?
And if yes, how do I change the ruleset for device_health_metrics?
Thank you!
Sincerely Bonkersdeluxe
 
Can I change the ruleset without data loss on the other main pool?
That should be okay.

If you install the latest updates for PVE 6.4, you will be able to change the rule assigned to the pool via the GUI. Otherwise you can assign a different rule to the pool with
Code:
ceph osd pool set <pool-name> crush_rule <rule-name>
See the Ceph docs.
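For the pools in this thread that would be, for example (pool and rule names taken from the lspools output and crush map above):
Code:
ceph osd pool set device_health_metrics crush_rule replicated_rule_nvme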

Interestingly, the default "replicated_rule" has the extra step limiting it to HDDs, which aren't even present in your cluster if I see correctly.

If you want to remedy that, the quickest way is probably to rename the current "replicated_rule" to "replicated_rule_hdd" and then create a new "replicated_rule" without any device type configured.
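A rough sketch of that on the CLI (untested; the rule names follow the suggestion above, and existing pools keep their rule assignment by id, so the renamed rule keeps serving them):
Code:
# keep the old hdd-limited rule around under a new name
ceph osd crush rule rename replicated_rule replicated_rule_hdd
# create a fresh "replicated_rule" with host failure domain and no device class restriction
ceph osd crush rule create-replicated replicated_rule default host
# then point the health metrics pool at a rule that can actually place its PG
ceph osd pool set device_health_metrics crush_rule replicated_rule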
 
1. Delete the pool device_health_metrics.
2. Delete the active manager under Monitors.
3. Wait for a standby manager to become active and for the recently deleted manager to disappear from the list.
4. Now create the manager again on the host where it was just deleted.
5. This manager now goes to standby, but it recreates the device_health_metrics pool.

This fixed my "pg 1.0 got status unknown" issue. (It got stuck because I recreated CephFS, the pools and the OSDs.)

That's how it looked on my side:
Code:
root@pve-b11:~# ceph pg ls | more
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE    SINCE  VERSION  REPORTED  UP     ACTING  SCRUB_STAMP                      DEEP_SCRUB_STAMP
1.0        0         0          0        0      0            0           0    0  unknown    25m      0'0       0:0  []p-1   []p-1  2021-11-16T16:24:45.029812+0100  2021-11-16T16:24:45.029812+0100


root@pve-b11:~# ceph osd lspools
1 device_health_metrics
5 vm_nvme
6 cephfs_data
7 cephfs_metadata

After using the tip from above, device_health_metrics was recreated with id 8 and pg 8.0 was active+clean.
 
I had a similar issue with my device_health_metrics pool.

After adding additional OSDs and restarting the servers, the device_health_metrics status became unknown (stuck warning).

Code:
ceph pg ls | more
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE    SINCE  VERSION  REPORTED  UP     ACTING  SCRUB_STAMP                      DEEP_SCRUB_STAMP
1.0        0         0          0        0      0            0           0    0  unknown    46m      0'0       0:0  []p-1   []p-1  2021-12-23T06:32:28.316163+0100  2021-12-23T06:32:28.316163+0100


I removed the manager and the stuck device_health_metrics pool and re-added/recreated them:

ceph osd pool set device_health_metrics pg_num_min 1



So it seems to solve the issue, but is it OK to delete device_health_metrics and recreate it manually? Anything to worry about?


I am using:
ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
 
Hey All!

I've just had a similar issue with a 3-node cluster. The inactive PG and the slow ops warning were present from the beginning. The difference is that the pool name was .mgr instead of device_health_metrics. I guess this comes from the different version (Reef 18.2.0).
I followed this method (huge thanks), but only had to take the first 5 steps. All done in the Proxmox GUI.
1. Delete the pool device_health_metrics.
2. Delete the active manager under Monitors.
3. Wait for a standby manager to become active and for the recently deleted manager to disappear from the list.
4. Now create the manager again on the host where it was just deleted.
5. This manager now goes to standby, but it recreates the device_health_metrics pool.
6. ceph osd pool set device_health_metrics pg_num_min 1
7. But the unknown-PG warning comes back after 30-60 seconds.
8. Now I tried to start
ceph device scrape-health-metrics
It runs for a very long time and seems like an endless loop, but I let it run. Maybe it finishes on its own with OK?



1. I removed the .mgr pool (don't panic when the cluster immediately goes to an unhealthy state).
2. Removed the active mgr (waited for it to disappear).
3. Re-added the deleted node as mgr.
4. The .mgr pool was recreated automagically and the status turned healthy without any warnings.
 
