[SOLVED] All OSD disks down on node after reboot

Bran-Ko

Hi, I have a 4-node Ceph cluster. After installing the latest patches, I rebooted all nodes one by one, each time waiting until Ceph had rebalanced (green status).
My Ceph version is 19.2.2 stable. But after the reboot of the last node, all of its disks stay down.

ceph -v
 ceph version 19.2.2 (72a09a98429da13daae8e462abda408dc163ff75) squid (stable)

ceph -s
  cluster:
    id:     71bc13b0-e73f-4db8-8d09-d68ffdd1c306
    health: HEALTH_WARN
            4 osds down
            1 host (4 osds) down
            Degraded data redundancy: 434207/1792488 objects degraded (24.224%), 186 pgs degraded, 186 pgs undersized

  services:
    mon: 4 daemons, quorum pkx1,pkx2,pkx3,pkx4 (age 12m)
    mgr: pkx1(active, since 90m), standbys: pkx4, pkx2, pkx3
    osd: 16 osds: 12 up (since 7m), 16 in (since 44m); 2 remapped pgs

  data:
    pools:   2 pools, 257 pgs
    objects: 597.50k objects, 2.3 TiB
    usage:   6.7 TiB used, 21 TiB / 28 TiB avail
    pgs:     434207/1792488 objects degraded (24.224%)
             2285/1792488 objects misplaced (0.127%)
             186 active+undersized+degraded
             69  active+clean
             2   active+clean+remapped

  io:
    client:   582 MiB/s rd, 11 MiB/s wr, 217 op/s rd, 1.02k op/s wr

ceph health detail
HEALTH_WARN 4 osds down; 1 host (4 osds) down; Degraded data redundancy: 434208/1792491 objects degraded (24.224%), 186 pgs degraded, 186 pgs undersized
[WRN] OSD_DOWN: 4 osds down
    osd.15 (root=default,host=pkx3) is down
    osd.16 (root=default,host=pkx3) is down
    osd.17 (root=default,host=pkx3) is down
    osd.18 (root=default,host=pkx3) is down
[WRN] OSD_HOST_DOWN: 1 host (4 osds) down
    host pkx3 (root=default) (4 osds) is down
[WRN] PG_DEGRADED: Degraded data redundancy: 434208/1792491 objects degraded (24.224%), 186 pgs degraded, 186 pgs undersized
    pg 3.b7 is active+undersized+degraded, acting [20,3]
    pg 3.b8 is stuck undersized for 8m, current state active+undersized+degraded, last acting [19,5]
    pg 3.b9 is stuck undersized for 8m, current state active+undersized+degraded, last acting [7,20]
    pg 3.ba is stuck undersized for 8m, current state active+undersized+degraded, last acting [1,20]
    pg 3.bb is stuck undersized for 8m, current state active+undersized+degraded, last acting [20,4]
    pg 3.bc is stuck undersized for 8m, current state active+undersized+degraded, last acting [20,0]
    pg 3.bd is stuck undersized for 8m, current state active+undersized+degraded, last acting [5,21]
    pg 3.be is stuck undersized for 8m, current state active+undersized+degraded, last acting [2,5]
    pg 3.bf is stuck undersized for 8m, current state active+undersized+degraded, last acting [2,4]
    pg 3.c0 is stuck undersized for 8m, current state active+undersized+degraded, last acting [21,1]
    pg 3.c3 is stuck undersized for 8m, current state active+undersized+degraded, last acting [21,1]
    pg 3.c4 is stuck undersized for 8m, current state active+undersized+degraded, last acting [22,4]
    pg 3.c5 is stuck undersized for 8m, current state active+undersized+degraded, last acting [5,0]
    pg 3.c6 is stuck undersized for 8m, current state active+undersized+degraded, last acting [5,1]
    pg 3.c7 is stuck undersized for 8m, current state active+undersized+degraded, last acting [3,4]
    pg 3.c8 is stuck undersized for 8m, current state active+undersized+degraded, last acting [3,6]
    pg 3.c9 is stuck undersized for 8m, current state active+undersized+degraded, last acting [2,6]
    pg 3.ca is stuck undersized for 8m, current state active+undersized+degraded, last acting [21,5]
    pg 3.cb is stuck undersized for 8m, current state active+undersized+degraded, last acting [2,22]
    pg 3.cc is stuck undersized for 8m, current state active+undersized+degraded, last acting [1,21]
    pg 3.cd is stuck undersized for 8m, current state active+undersized+degraded, last acting [22,1]
    pg 3.cf is stuck undersized for 8m, current state active+undersized+degraded, last acting [7,3]
    pg 3.d0 is stuck undersized for 8m, current state active+undersized+degraded, last acting [2,5]
    pg 3.d4 is stuck undersized for 8m, current state active+undersized+degraded, last acting [1,22]
    pg 3.d5 is stuck undersized for 8m, current state active+undersized+degraded, last acting [3,21]
    pg 3.d6 is stuck undersized for 8m, current state active+undersized+degraded, last acting [7,22]
    pg 3.d7 is stuck undersized for 8m, current state active+undersized+degraded, last acting [6,1]
    pg 3.d9 is stuck undersized for 8m, current state active+undersized+degraded, last acting [6,3]
    pg 3.db is stuck undersized for 8m, current state active+undersized+degraded, last acting [1,20]
    pg 3.dd is stuck undersized for 8m, current state active+undersized+degraded, last acting [20,7]
    pg 3.de is stuck undersized for 8m, current state active+undersized+degraded, last acting [4,1]
    pg 3.df is stuck undersized for 8m, current state active+undersized+degraded, last acting [4,19]
    pg 3.e0 is stuck undersized for 8m, current state active+undersized+degraded, last acting [5,22]
    pg 3.e2 is stuck undersized for 8m, current state active+undersized+degraded, last acting [3,7]
    pg 3.e5 is stuck undersized for 8m, current state active+undersized+degraded, last acting [19,0]
    pg 3.e6 is stuck undersized for 8m, current state active+undersized+degraded, last acting [6,0]
    pg 3.e8 is stuck undersized for 8m, current state active+undersized+degraded, last acting [6,20]
    pg 3.e9 is stuck undersized for 8m, current state active+undersized+degraded, last acting [21,7]
    pg 3.ea is stuck undersized for 8m, current state active+undersized+degraded, last acting [20,6]
    pg 3.eb is stuck undersized for 8m, current state active+undersized+degraded, last acting [4,1]
    pg 3.ec is stuck undersized for 8m, current state active+undersized+degraded, last acting [19,3]
    pg 3.f0 is stuck undersized for 8m, current state active+undersized+degraded, last acting [5,20]
    pg 3.f1 is stuck undersized for 8m, current state active+undersized+degraded, last acting [3,19]
    pg 3.f2 is stuck undersized for 8m, current state active+undersized+degraded, last acting [19,6]
    pg 3.f3 is stuck undersized for 8m, current state active+undersized+degraded, last acting [1,4]
    pg 3.f5 is stuck undersized for 8m, current state active+undersized+degraded, last acting [0,6]
    pg 3.f7 is stuck undersized for 8m, current state active+undersized+degraded, last acting [1,20]
    pg 3.f8 is stuck undersized for 8m, current state active+undersized+degraded, last acting [6,3]
    pg 3.f9 is stuck undersized for 8m, current state active+undersized+degraded, last acting [5,2]
    pg 3.fa is stuck undersized for 8m, current state active+undersized+degraded, last acting [2,22]
    pg 3.fd is stuck undersized for 8m, current state active+undersized+degraded, last acting [6,19]
 
The disks periodically come up and try to recover, but without success. This repeats every few minutes.
Performance is very poor during these recovery attempts.
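To see at a glance whether the down OSDs all sit on a single host, the `ceph health detail` output above can be summarized per host. This is only a minimal sketch, assuming the line format shown above (`osd.N (root=...,host=NAME) is down`); `down_osds_by_host` is a hypothetical helper name, not a Ceph command.

```shell
#!/bin/sh
# Hypothetical helper: count down OSDs per host.
# Reads `ceph health detail` text on stdin and assumes lines such as:
#     osd.15 (root=default,host=pkx3) is down
down_osds_by_host() {
    # Keep only the host name from each "osd.N (...) is down" line,
    # then count how often each host appears.
    sed -n 's/^ *osd\.[0-9][0-9]* (.*host=\([^,)]*\).*) is down$/\1/p' \
        | sort | uniq -c | awk '{print $2 ": " $1}'
}

# Usage on a cluster node:
#   ceph health detail | down_osds_by_host
```

If every down OSD maps to the same host, as in this thread, something node-wide (network, power, controller) is a more likely suspect than the individual disks.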
 

Attachment: Screenshot from 2025-06-25 22-40-47.png
This issue was caused by a slow link speed on the NIC used for the Ceph private network on the affected node.

After running the command

ethtool -s enp3s0f0 speed 10000 duplex full autoneg on

the disks seem to be working normally.
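For anyone hitting the same symptom, it may help to check the negotiated link speed on each node before and after a reboot. The sketch below reads the speed (in Mb/s) from `/sys/class/net/<iface>/speed`; the same value appears on the `Speed:` line of `ethtool <iface>`. The function name `check_link_speed` and the `SYSFS_NET` override are assumptions made for this example, and the expected speed of 10000 is simply the value from the command above.

```shell
#!/bin/sh
# Hypothetical helper: compare an interface's negotiated link speed
# with an expected value in Mb/s.
# SYSFS_NET is an assumed override of the sysfs path, useful only for testing.
check_link_speed() {
    iface=$1
    expected=$2
    speed_file="${SYSFS_NET:-/sys/class/net}/$iface/speed"

    # Reading the speed can fail, for example when the link is down.
    if ! speed=$(cat "$speed_file" 2>/dev/null); then
        echo "$iface: cannot read link speed"
        return 2
    fi

    if [ "$speed" -lt "$expected" ]; then
        echo "$iface: link at ${speed} Mb/s, expected ${expected} Mb/s"
        return 1
    fi
    echo "$iface: link at ${speed} Mb/s, OK"
}

# Usage with the interface from this thread:
#   check_link_speed enp3s0f0 10000
```

A non-zero return code makes it easy to call this from a post-reboot check on every node.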