Ceph fails after power loss: SLOW_OPS, OSDs flip between down and up

engineer5

Dear all,

After a short power loss, my cluster of three nodes is not coming up again.
System facts: Proxmox 8.3.3, Ceph Reef 18.2.4; each node has 3 SSDs (4 TB SATA, 2 TB NVMe, 1 TB SATA with an OSD partition) and 3 OSDs, so 9 OSDs in total.
The system tries to activate the OSDs, then decides that ALL OSDs of a node are unhealthy and marks them down.
I've tried to disable the automatic down-marking of the OSDs, but the cluster shows no signs of recovery.
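(That is, the nodown osdmap flag, which is the standard way to stop the MONs from marking OSDs down:

ceph osd set nodown
)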

root@f4:~# ceph health
HEALTH_WARN 6 osds down; 2 hosts (6 osds) down; Reduced data availability: 65 pgs inactive, 65 pgs down

Log entries look like this:
2025-02-19T18:10:21.086911+0100 mon.f4 (mon.0) 5579 : cluster 3 Health check update: 44 slow ops, oldest one blocked for 357 sec, daemons [osd.0,osd.5,osd.6,mon.fuji4] have slow ops. (SLOW_OPS)
2025-02-19T18:10:21.087248+0100 mon.f4 (mon.0) 5580 : cluster 1 osd.1 failed (root=default,host=f5) (2 reporters from different host after 326.382033 >= grace 324.756848)
2025-02-19T18:10:21.087292+0100 mon.f4 (mon.0) 5581 : cluster 1 osd.6 failed (root=default,host=f5) (2 reporters from different host after 326.381985 >= grace 325.887730)
2025-02-19T18:10:21.087318+0100 mon.f4 (mon.0) 5582 : cluster 1 osd.8 failed (root=default,host=f5) (2 reporters from different host after 326.381968 >= grace 325.646970)
2025-02-19T18:10:21.087847+0100 mon.f4 (mon.0) 5583 : cluster 3 Health check update: 6 osds down (OSD_DOWN)
2025-02-19T18:10:21.087860+0100 mon.f4 (mon.0) 5584 : cluster 3 Health check update: 2 hosts (6 osds) down (OSD_HOST_DOWN)
2025-02-19T18:10:21.101109+0100 mon.fuji4 (mon.0) 5585 : cluster 0 osdmap e3638: 9 total, 3 up, 9 in
2025-02-19T18:10:21.759021+0100 osd.6 (osd.6) 4647 : cluster 3 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'pool1' : 1 ])
2025-02-19T18:10:22.108540+0100 mon.fuji4 (mon.0) 5588 : cluster 0 osdmap e3639: 9 total, 3 up, 9 in
2025-02-19T18:10:22.457209+0100 mgr.fuji4 (mgr.84034215) 4603 : cluster 0 pgmap v4772: 65 pgs: 26 stale+peering, 39 peering; 184 GiB data, 524 GiB used, 18 TiB / 18 TiB avail

Please give me tips on how to debug or solve this!

Thanks,

engineer5
 
Are the time settings of each cluster member correct, and are the MONs and the manager up?
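You can check that quickly on each node, for example (assuming chrony handles time sync, which is the Proxmox default):

chronyc tracking   # clock offset and sync source on this node
ceph mon stat      # monmap and current quorum
ceph mgr stat      # active manager and number of standbys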

Can you provide the following details:

ceph status
ceph osd tree
ceph osd df
ceph pg dump pgs_brief

You can try setting these flags:

ceph osd set norecover
ceph osd set nobackfill
ceph osd set noup
ceph osd set nodown

After that

ceph osd unset noup
ceph osd unset nodown
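To confirm which flags are actually set at each step, something like:

ceph osd dump | grep flags   # lists the osdmap flags (noup, nodown, norecover, ...)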

Wait a few seconds for the OSDs to come up. If they won't come up, try:

ceph osd out <OSD-ID>
ceph osd in <OSD-ID>

(There is no "ceph osd up" command; an OSD is only marked up again once its daemon reports back to the MONs.)

And maybe:

systemctl restart ceph-osd@<OSD-ID>
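If the services themselves fail because the OSD volumes were not re-activated after the power loss, it may help to let ceph-volume activate them again (a guess, since it depends on how the OSDs were created):

ceph-volume lvm list            # which OSDs ceph-volume knows about on this node
ceph-volume lvm activate --all  # re-activate all of them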

If that does not work, get us some more logs:

ceph osd dump | grep -i 'down\|out'
journalctl -u ceph-osd@<OSD-ID> --no-pager --lines=50

And check the MON logs again.
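For the MON logs, something along the lines of:

journalctl -u ceph-mon@<NODE-NAME> --no-pager --lines=50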
 
Thank you quanto11!

Time synchronisation: chrony is running and the NTP time is correct.
BUT I found another issue: IP connectivity on the Ceph network is down. That seems to be a separate problem, so I am opening another thread for it.
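For anyone hitting the same thing, the Ceph subnets and reachability can be checked roughly like this (the config path is the Proxmox default):

grep network /etc/pve/ceph.conf   # shows the public_network / cluster_network in use
ping -c 3 <other-node-ceph-ip>    # reachability of the other nodes on that subnet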

The requested details:

root@fuji5:~# ceph status
  cluster:
    id:     a812e100-9d63-44d7-89c6-94be137bf94a
    health: HEALTH_WARN
            Reduced data availability: 65 pgs inactive, 65 pgs peering
            63 slow ops, oldest one blocked for 834 sec, daemons [osd.0,osd.5,osd.6,osd.7,mon.fuji4] have slow ops.

  services:
    mon: 3 daemons, quorum fuji4,fuji6,fuji5 (age 7h)
    mgr: fuji4(active, since 7h), standbys: fuji5, fuji6
    osd: 9 osds: 9 up (since 4m), 9 in (since 9h)

  data:
    pools:   3 pools, 65 pgs
    objects: 48.24k objects, 184 GiB
    usage:   524 GiB used, 18 TiB / 18 TiB avail
    pgs:     100.000% pgs not active
             65 peering

root@fuji5:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 18.11513 root default
-5 6.03838 host fuji4
0 ssd 3.63869 osd.0 down 1.00000 1.00000
3 ssd 0.58029 osd.3 down 1.00000 1.00000
5 ssd 1.81940 osd.5 down 1.00000 1.00000
-3 6.03838 host fuji5
1 ssd 3.63869 osd.1 up 1.00000 1.00000
6 ssd 1.81940 osd.6 up 1.00000 1.00000
8 ssd 0.58029 osd.8 up 1.00000 1.00000
-7 6.03838 host fuji6
2 ssd 3.63869 osd.2 down 1.00000 1.00000
4 ssd 0.58029 osd.4 down 1.00000 1.00000
7 ssd 1.81940 osd.7 down 1.00000 1.00000

root@fuji5:~# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 3.63869 1.00000 3.6 TiB 98 GiB 98 GiB 16 KiB 326 MiB 3.5 TiB 2.64 0.93 0 down
3 ssd 0.58029 1.00000 594 GiB 31 GiB 30 GiB 16 KiB 130 MiB 564 GiB 5.14 1.82 0 down
5 ssd 1.81940 1.00000 1.8 TiB 46 GiB 46 GiB 19 KiB 220 MiB 1.8 TiB 2.47 0.87 0 down
1 ssd 3.63869 1.00000 3.6 TiB 110 GiB 110 GiB 24 KiB 324 MiB 3.5 TiB 2.97 1.05 40 up
6 ssd 1.81940 1.00000 1.8 TiB 42 GiB 42 GiB 15 KiB 164 MiB 1.8 TiB 2.26 0.80 16 up
8 ssd 0.58029 1.00000 594 GiB 22 GiB 22 GiB 17 KiB 149 MiB 572 GiB 3.73 1.32 9 up
2 ssd 3.63869 1.00000 3.6 TiB 108 GiB 108 GiB 34 KiB 340 MiB 3.5 TiB 2.91 1.03 0 down
4 ssd 0.58029 1.00000 594 GiB 16 GiB 16 GiB 14 KiB 122 MiB 578 GiB 2.65 0.94 0 down
7 ssd 1.81940 1.00000 1.8 TiB 51 GiB 50 GiB 18 KiB 181 MiB 1.8 TiB 2.71 0.96 0 down
TOTAL 18 TiB 524 GiB 523 GiB 178 KiB 1.9 GiB 18 TiB 2.83
MIN/MAX VAR: 0.80/1.82 STDDEV: 0.86

root@fuji5:~# ceph pg dump pgs_brief
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
3.1a down [1] 1 [1] 1
2.1b down [8] 8 [8] 8
3.1b down [1] 1 [1] 1
2.1a down [1] 1 [1] 1
2.19 down [6] 6 [6] 6
3.18 down [1] 1 [1] 1
2.18 down [1] 1 [1] 1
3.19 down [1] 1 [1] 1
2.17 down [1] 1 [1] 1
3.16 down [1] 1 [1] 1
2.16 down [1] 1 [1] 1
3.17 down [1] 1 [1] 1
2.15 down [1] 1 [1] 1
3.14 down [6] 6 [6] 6
2.14 down [1] 1 [1] 1
3.15 down [8] 8 [8] 8
2.13 down [1] 1 [1] 1
3.12 down [6] 6 [6] 6
2.12 down [1] 1 [1] 1
3.13 down [8] 8 [8] 8
2.11 down [1] 1 [1] 1
3.10 down [6] 6 [6] 6
2.10 down [1] 1 [1] 1
3.11 down [6] 6 [6] 6
2.f down [6] 6 [6] 6
3.e down [8] 8 [8] 8
2.e down [8] 8 [8] 8
3.f down [1] 1 [1] 1
2.d down [6] 6 [6] 6
3.c down [8] 8 [8] 8
2.c down [1] 1 [1] 1
3.d down [8] 8 [8] 8
1.0 down [6] 6 [6] 6
2.3 down [1] 1 [1] 1
3.2 down [1] 1 [1] 1
2.0 down [6] 6 [6] 6
3.1 down [6] 6 [6] 6
2.1 down [6] 6 [6] 6
3.0 down [1] 1 [1] 1
2.2 down [1] 1 [1] 1
3.3 down [1] 1 [1] 1
2.4 down [1] 1 [1] 1
3.5 down [6] 6 [6] 6
2.5 down [6] 6 [6] 6
3.4 down [1] 1 [1] 1
2.6 down [1] 1 [1] 1
3.7 down [1] 1 [1] 1
2.7 down [6] 6 [6] 6
3.6 down [1] 1 [1] 1
2.8 down [8] 8 [8] 8
3.9 down [1] 1 [1] 1
2.9 down [1] 1 [1] 1
3.8 down [1] 1 [1] 1
2.a down [1] 1 [1] 1
3.b down [1] 1 [1] 1
2.b down [1] 1 [1] 1
3.a down [1] 1 [1] 1
3.1d down [1] 1 [1] 1
2.1c down [1] 1 [1] 1
3.1c down [6] 6 [6] 6
2.1d down [1] 1 [1] 1
3.1f down [1] 1 [1] 1
2.1e down [6] 6 [6] 6
3.1e down [8] 8 [8] 8
2.1f down [1] 1 [1] 1
dumped pgs_brief