Hello Forum,
We have an issue with a new Ceph cluster running on three identical nodes. Everything seems to work fine, but the status in the web console complains with the same health warnings that ceph -s shows below (reduced data availability and slow ops).

Here is the output of ceph osd df tree:
ceph osd df tree
 ID  CLASS     WEIGHT  REWEIGHT     SIZE  RAW USE     DATA     OMAP     META    AVAIL   %USE   VAR  PGS  STATUS  TYPE NAME
 -1          87.32867         -   87 TiB   20 TiB   20 TiB  158 MiB   37 GiB   67 TiB  23.23  1.00    -          root default
 -7          29.10956         -   29 TiB  6.8 TiB  6.8 TiB   53 MiB   12 GiB   22 TiB  23.23  1.00    -              host delbgpm01
  8    hdd    7.27739   1.00000  7.3 TiB  1.6 TiB  1.6 TiB   13 MiB  2.8 GiB  5.7 TiB  21.76  0.94   65      up          osd.8
  9    hdd    7.27739   1.00000  7.3 TiB  1.8 TiB  1.8 TiB   14 MiB  3.0 GiB  5.5 TiB  24.43  1.05   79      up          osd.9
 10    hdd    7.27739   1.00000  7.3 TiB  1.9 TiB  1.9 TiB   14 MiB  3.6 GiB  5.4 TiB  25.87  1.11   76      up          osd.10
 11    hdd    7.27739   1.00000  7.3 TiB  1.5 TiB  1.5 TiB   13 MiB  2.6 GiB  5.8 TiB  20.88  0.90   68      up          osd.11
 -3          29.10956         -   29 TiB  6.8 TiB  6.8 TiB   53 MiB   12 GiB   22 TiB  23.23  1.00    -              host delbgpm02
  0    hdd    7.27739   1.00000  7.3 TiB  1.9 TiB  1.9 TiB  7.9 MiB  3.4 GiB  5.4 TiB  26.08  1.12   81      up          osd.0
  1    hdd    7.27739   1.00000  7.3 TiB  1.9 TiB  1.9 TiB   22 MiB  3.2 GiB  5.4 TiB  25.56  1.10   83      up          osd.1
  2    hdd    7.27739   1.00000  7.3 TiB  1.3 TiB  1.3 TiB  2.1 MiB  2.5 GiB  6.0 TiB  17.60  0.76   49      up          osd.2
  3    hdd    7.27739   1.00000  7.3 TiB  1.7 TiB  1.7 TiB   21 MiB  3.2 GiB  5.6 TiB  23.70  1.02   75      up          osd.3
 -5          29.10956         -   29 TiB  6.8 TiB  6.8 TiB   52 MiB   12 GiB   22 TiB  23.23  1.00    -              host delbgpm03
  4    hdd    7.27739   1.00000  7.3 TiB  1.7 TiB  1.7 TiB   13 MiB  3.1 GiB  5.6 TiB  22.80  0.98   74      up          osd.4
  5    hdd    7.27739   1.00000  7.3 TiB  1.5 TiB  1.5 TiB   12 MiB  2.7 GiB  5.7 TiB  21.27  0.92   65      up          osd.5
  6    hdd    7.27739   1.00000  7.3 TiB  1.5 TiB  1.5 TiB  5.8 MiB  2.8 GiB  5.7 TiB  21.28  0.92   65      up          osd.6
  7    hdd    7.27739   1.00000  7.3 TiB  2.0 TiB  2.0 TiB   21 MiB  3.8 GiB  5.3 TiB  27.59  1.19   84      up          osd.7
                          TOTAL   87 TiB   20 TiB   20 TiB  158 MiB   37 GiB   67 TiB  23.23
MIN/MAX VAR: 0.76/1.19 STDDEV: 2.71
On one of the servers the output of ceph -s is:
ceph -s
  cluster:
    id:     73703df0-b8ae-4aca-8269-1fb68da2142d
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive
            108 slow ops, oldest one blocked for 96500 sec, osd.7 has slow ops

  services:
    mon: 3 daemons, quorum delbgpm02,delbgpm03,delbgpm01 (age 105m)
    mgr: delbgpm03(active, since 8d), standbys: delbgpm02, delbgpm01
    mds: cephfs:1 {0=delbgpm02=up:active} 2 up:standby
    osd: 12 osds: 12 up (since 104m), 12 in (since 104m)

  data:
    pools:   4 pools, 289 pgs
    objects: 1.82M objects, 6.8 TiB
    usage:   20 TiB used, 67 TiB / 87 TiB avail
    pgs:     0.346% pgs unknown
             288 active+clean
             1   unknown

  io:
    client: 341 B/s rd, 4.9 MiB/s wr, 0 op/s rd, 32 op/s wr

  progress:
    Rebalancing after osd.5 marked in (8d)
      [............................]
    Rebalancing after osd.1 marked in (8d)
      [............................]
    Rebalancing after osd.7 marked in (8d)
      [............................]
    Rebalancing after osd.6 marked in (8d)
      [............................]
    PG autoscaler decreasing pool 3 PGs from 128 to 32 (8d)
      [............................]
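In case it helps with the slow ops on osd.7 (which sits in host delbgpm03 according to the tree above), these are the additional diagnostics I can run and post here; I have not included their output yet, and I am happy to gather anything else you suggest:

# show the current health warnings in full detail
ceph health detail

# on delbgpm03, the node hosting osd.7: look at the ops stuck inside the OSD
ceph daemon osd.7 dump_ops_in_flight
ceph daemon osd.7 dump_historic_ops

# the autoscaler is still shrinking pool 3, so also check its current view of the pools
ceph osd pool autoscale-status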
ceph pg dump | grep unknown lists this:
ceph pg dump |grep unknown
dumped all
1.0 0 0 0 0 0 0 0 0 0 0 unknown 2021-02-16T17:05:09.104825+0100 0'0 0:0 [] -1 [] -1 0'0 2021-02-16T17:05:09.104825+0100 0'0 2021-02-16T17:05:09.104825+0100 0
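What strikes me is that the up and acting sets for pg 1.0 are empty ([]) and the primary is -1, so apparently no OSD reports this PG at all. If it helps I can also post the output of the following commands; as far as I understand, the pg id 1.0 means pool 1, pg 0, which I would confirm first:

# list the pools with their ids and pg_num, to see which pool has id 1
ceph osd pool ls detail

# where the cluster thinks pg 1.0 should be mapped
ceph pg map 1.0

# detailed state of the pg (this may fail in the same way as the mark_unfound_lost command below)
ceph pg 1.0 query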
From another post I learned that ceph pg 1.0 mark_unfound_lost delete might help, but it only returns an error:
ceph pg 1.0 mark_unfound_lost delete
Error ENOENT: i don't have pgid 1.0
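Two other things I am considering, but have not run because I am not sure they are appropriate here, so please advise: since the PG states are aggregated and reported by the mgr, failing over to a standby mgr might clear a stale unknown entry; and, only as a last resort, the empty PG could be recreated. The commands below are written for our cluster (active mgr is delbgpm03), and the second one would destroy whatever data pg 1.0 is supposed to hold, so I do not want to run it without confirmation:

# let a standby mgr take over; the new active mgr rebuilds the PG status it reports
ceph mgr fail delbgpm03

# last resort only: recreate pg 1.0 as an empty PG (loses any data that PG should contain)
ceph osd force-create-pg 1.0 --yes-i-really-mean-it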
Can you please advise how to solve this issue?
Thank you and best regards,
Nico