SOLVED - CEPH - strange pool: "device_health_metrics", one pg without osd

high_performer

I've got a strange situation in my Ceph cluster.

Running 3 mons, version 14.2.9.

I don't know where this pool comes from:
device_health_metrics


Health:
Reduced data availability: 1 pg inactive
pg 39.0 is stuck inactive for 506396.988991, current state unknown, last acting []


ceph health detail
HEALTH_WARN Reduced data availability: 1 pg inactive
PG_AVAILABILITY Reduced data availability: 1 pg inactive
pg 39.0 is stuck inactive for 506651.226685, current state unknown, last acting []


ceph osd lspools
37 ceph_nvme
38 ceph_ssd
39 device_health_metrics


ceph osd pool get device_health_metrics pg_num
pg_num: 1
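
For reference, the pool's CRUSH rule and replication settings can be queried as well; these are standard Ceph commands (output not included here), and they might show why PG 39.0 never gets mapped to any OSD:

Code:
# which CRUSH rule the pool uses, and its replication settings
ceph osd pool get device_health_metrics crush_rule
ceph osd pool get device_health_metrics size
ceph osd pool get device_health_metrics min_size

# dump the rules to see whether this rule can actually select OSDs
# (e.g. it might target a device class that has no members)
ceph osd crush rule dump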

ceph pg stat
1281 pgs: 1 unknown, 1280 active+clean; 407 GiB data, 1.2 TiB used, 130 TiB / 132 TiB avail; 1.3 KiB/s rd, 1.6 MiB/s wr, 16 op/s


ceph pg ls
39.0 0 0 0 0 0 0 0 0 unknown 5d 0'0 0:0 []p-1 []p-1 2020-06-03 16:33:46.284025 2020-06-03 16:33:46.284025

ceph pg repair 39.0
Error EAGAIN: pg 39.0 has no primary osd
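
The mapping CRUSH computes for that PG can also be checked directly; if the up/acting sets come back empty, the rule for pool 39 simply selects no OSDs. A quick sketch with standard Ceph commands (output omitted):

Code:
# ask the osdmap which OSDs PG 39.0 should map to
ceph pg map 39.0

# compare against the OSD tree and device classes
ceph osd tree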


pveversion
pve-manager/6.2-4/9824574a (running kernel: 5.4.34-1-pve)

I don't know how to fix it.
 
Can you elaborate on that "missing stanza"? I'm facing the same issue after having to shut down one of my Ceph nodes to move it physically to a new location. After bringing all OSDs back up and into the pool again, every PG synced up, but one didn't:

Code:
root@iceph03-oh1c:~# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg inactive
PG_AVAILABILITY Reduced data availability: 1 pg inactive
    pg 4.1 is stuck inactive for 6887.066179, current state activating+remapped, last acting [2,1,12]
 
So, for me it was enough to restart the affected OSD, which made the error go away.
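
For completeness: "restarting the affected OSD" means restarting its systemd unit on the node that hosts it. Roughly like this, assuming the primary of the stuck PG (osd.2, the first ID in "last acting [2,1,12]") is the one to restart:

Code:
# restart e.g. the primary OSD of the stuck PG
systemctl restart ceph-osd@2

# then watch the PG leave the activating+remapped state
ceph -w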