Hi.
I have a problem.
I am testing a 7-node cluster.
Each node has one NVMe drive (4 OSDs) and two HDDs (2 OSDs), so 6 OSDs per node.
There are two replication rules (device classes nvme and hdd) and two pools (fast and slow) using those rules.
Everything works fine, but when I shut down one node and then another, there is a problem with the OSD down/out timeout.
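For reference, the CRUSH rules and the two pools were created more or less like this (I am typing this from memory, so the exact rule names, device classes and PG counts may differ from what I actually used):

# one replicated rule per device class (nvme for the fast pool, hdd for the slow pool)
ceph osd crush rule create-replicated nvme default host nvme
ceph osd crush rule create-replicated hdd default host hdd
# one pool on each rule (512 + 512 PGs would give the 1024 PGs shown below)
ceph osd pool create fast 512 512 replicated nvme
ceph osd pool create slow 512 512 replicated hdd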
1.- Shut down node 5:
ceph -s
  cluster:
    id:     0c075451-588b-4fe1-87f6-afc711bf5547
    health: HEALTH_WARN
            6 osds down
            1 host (6 osds) down
            Degraded data redundancy: 4114/30792 objects degraded (13.361%), 406 pgs degraded, 408 pgs undersized

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    osd: 42 osds: 36 up, 42 in

  data:
    pools:   2 pools, 1024 pgs
    objects: 10.26k objects, 38.5GiB
    usage:   150GiB used, 96.6TiB / 96.8TiB avail
    pgs:     4114/30792 objects degraded (13.361%)
             616 active+clean
             406 active+undersized+degraded
             2   active+undersized

  io:
    client: 8.99KiB/s wr, 0op/s rd, 1op/s wr
root@ceph1:~# ceph health detail
HEALTH_WARN 6 osds down; 1 host (6 osds) down; Degraded data redundancy: 4114/30792 objects degraded (13.361%), 406 pgs degraded, 408 pgs undersized
OSD_DOWN 6 osds down
osd.16 (root=default,host=ceph5) is down
osd.17 (root=default,host=ceph5) is down
osd.18 (root=default,host=ceph5) is down
osd.19 (root=default,host=ceph5) is down
osd.36 (root=default,host=ceph5) is down
osd.37 (root=default,host=ceph5) is down
OSD_HOST_DOWN 1 host (6 osds) down
host ceph5 (root=default) (6 osds) is down
PG_DEGRADED Degraded data redundancy: 4114/30792 objects degraded (13.361%), 406 pgs degraded, 408 pgs undersized
pg 2.154 is active+undersized+degraded, acting [24,14]
pg 2.158 is stuck undersized for 140.851158, current state active+undersized+degraded, last acting [14,26]
pg 2.15a is stuck undersized for 140.851623, current state active+undersized+degraded, last acting [12,26]
.
.
pg 4.192 is stuck undersized for 140.865229, current state active+undersized+degraded, last acting [38,41]
pg 4.197 is stuck undersized for 140.864811, current state active+undersized+degraded, last acting [38,32]
2019-05-13 13:22:40.476280 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257578 : cluster [DBG] pgmap v258151: 1024 pgs: 1024 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 6.27KiB/s wr, 0op/s
2019-05-13 13:22:45.718219 mon.ceph1 mon.0 10.9.4.151:6789/0 208951 : cluster [INF] osd.18 marked itself down
2019-05-13 13:22:45.718330 mon.ceph1 mon.0 10.9.4.151:6789/0 208952 : cluster [INF] osd.17 marked itself down
2019-05-13 13:22:45.718412 mon.ceph1 mon.0 10.9.4.151:6789/0 208953 : cluster [INF] osd.36 marked itself down
2019-05-13 13:22:45.718496 mon.ceph1 mon.0 10.9.4.151:6789/0 208954 : cluster [INF] osd.19 marked itself down
2019-05-13 13:22:45.718578 mon.ceph1 mon.0 10.9.4.151:6789/0 208955 : cluster [INF] osd.37 marked itself down
2019-05-13 13:22:45.718650 mon.ceph1 mon.0 10.9.4.151:6789/0 208956 : cluster [INF] osd.16 marked itself down
2019-05-13 13:22:46.069101 mon.ceph1 mon.0 10.9.4.151:6789/0 208957 : cluster [WRN] Health check failed: 6 osds down (OSD_DOWN)
2019-05-13 13:22:46.069141 mon.ceph1 mon.0 10.9.4.151:6789/0 208958 : cluster [WRN] Health check failed: 1 host (6 osds) down (OSD_HOST_DOWN)
2019-05-13 13:22:46.072693 mon.ceph1 mon.0 10.9.4.151:6789/0 208959 : cluster [DBG] osdmap e1111: 42 total, 36 up, 42 in
2019-05-13 13:22:47.071752 mon.ceph1 mon.0 10.9.4.151:6789/0 208960 : cluster [WRN] Health check failed: Reduced data availability: 9 pgs peering (PG_AVAILABILITY)
2019-05-13 13:22:47.075956 mon.ceph1 mon.0 10.9.4.151:6789/0 208961 : cluster [DBG] osdmap e1112: 42 total, 36 up, 42 in
2019-05-13 13:22:49.075079 mon.ceph1 mon.0 10.9.4.151:6789/0 208962 : cluster [WRN] Health check failed: Degraded data redundancy: 2085/30792 objects degraded (6.771%), 195 pgs degraded (PG_DEGRADED)
.
2.- Then shut down node 6:
root@ceph1:~#
root@ceph1:~# ceph health detail
HEALTH_WARN 12 osds down; 2 hosts (12 osds) down; Reduced data availability: 142 pgs inactive; Degraded data redundancy: 8428/30792 objects degraded (27.371%), 698 pgs degraded, 702 pgs undersized
OSD_DOWN 12 osds down
osd.16 (root=default,host=ceph5) is down
osd.17 (root=default,host=ceph5) is down
osd.18 (root=default,host=ceph5) is down
osd.19 (root=default,host=ceph5) is down
osd.20 (root=default,host=ceph6) is down
osd.21 (root=default,host=ceph6) is down
osd.22 (root=default,host=ceph6) is down
osd.23 (root=default,host=ceph6) is down
osd.36 (root=default,host=ceph5) is down
osd.37 (root=default,host=ceph5) is down
osd.38 (root=default,host=ceph6) is down
osd.39 (root=default,host=ceph6) is down
OSD_HOST_DOWN 2 hosts (12 osds) down
host ceph5 (root=default) (6 osds) is down
host ceph6 (root=default) (6 osds) is down
PG_AVAILABILITY Reduced data availability: 142 pgs inactive
pg 2.59 is stuck inactive for 110.578856, current state undersized+degraded+peered, last acting [2]
pg 2.5c is stuck inactive for 110.578917, current state undersized+degraded+peered, last acting [10]
pg 2.5e is stuck inactive for 110.566629, current state undersized+degraded+peered, last acting [15]
pg 2.5f is stuck inactive for 110.562646, current state undersized+degraded+peered, last acting [12]
pg 2.c8 is stuck inactive for 110.577358, current state undersized+degraded+peered, last acting [0]
pg 2.ca is stuck inactive for 110.576643, current state undersized+degraded+peered, last acting [3]
pg 2.d5 is stuck inactive for 110.573947, current state undersized+degraded+peered, last acting [4]
.
.
pg 4.18e is stuck undersized for 109.581025, current state undersized+degraded+peered, last acting [32]
pg 4.18f is stuck undersized for 109.582313, current state active+undersized, last acting [41,30]
pg 4.190 is stuck undersized for 302.491251, current state active+undersized+degraded, last acting [41,33]
pg 4.191 is stuck undersized for 109.583189, current state active+undersized+degraded, last acting [29,35]
pg 4.192 is stuck undersized for 109.583341, current state undersized+degraded+peered, last acting [41]
pg 4.195 is stuck undersized for 109.582965, current state active+undersized+degraded, last acting [28,32]
pg 4.196 is stuck undersized for 109.581611, current state active+undersized+degraded, last acting [40,35]
pg 4.197 is stuck undersized for 109.581763, current state undersized+degraded+peered, last acting [32]
2019-05-13 13:25:48.348583 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257672 : cluster [DBG] pgmap v258246: 1024 pgs: 2 active+undersized, 406 active+undersized+degraded, 616 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 10.1KiB/s wr, 1op/s; 4114/30792 objects degraded (13.361%)
2019-05-13 13:25:50.368330 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257673 : cluster [DBG] pgmap v258247: 1024 pgs: 2 active+undersized, 406 active+undersized+degraded, 616 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 8.17KiB/s wr, 0op/s; 4114/30792 objects degraded (13.361%)
2019-05-13 13:25:58.656936 mon.ceph1 mon.0 10.9.4.151:6789/0 209025 : cluster [INF] osd.20 marked itself down
2019-05-13 13:25:58.657012 mon.ceph1 mon.0 10.9.4.151:6789/0 209026 : cluster [INF] osd.38 marked itself down
2019-05-13 13:25:58.657159 mon.ceph1 mon.0 10.9.4.151:6789/0 209027 : cluster [INF] osd.22 marked itself down
2019-05-13 13:25:58.657310 mon.ceph1 mon.0 10.9.4.151:6789/0 209028 : cluster [INF] osd.39 marked itself down
2019-05-13 13:25:58.657397 mon.ceph1 mon.0 10.9.4.151:6789/0 209029 : cluster [INF] osd.23 marked itself down
2019-05-13 13:25:58.657474 mon.ceph1 mon.0 10.9.4.151:6789/0 209030 : cluster [INF] osd.21 marked itself down
2019-05-13 13:25:58.977765 mon.ceph1 mon.0 10.9.4.151:6789/0 209031 : cluster [WRN] Health check update: 12 osds down (OSD_DOWN)
2019-05-13 13:25:58.977794 mon.ceph1 mon.0 10.9.4.151:6789/0 209032 : cluster [WRN] Health check update: 2 hosts (12 osds) down (OSD_HOST_DOWN)
2019-05-13 13:25:58.981551 mon.ceph1 mon.0 10.9.4.151:6789/0 209033 : cluster [DBG] osdmap e1113: 42 total, 30 up, 42 in
2019-05-13 13:25:59.983837 mon.ceph1 mon.0 10.9.4.151:6789/0 209034 : cluster [DBG] osdmap e1114: 42 total, 30 up, 42 in
2019-05-13 13:25:52.388265 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257674 : cluster [DBG] pgmap v258248: 1024 pgs: 2 active+undersized, 406 active+undersized+degraded, 616 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 5.61KiB/s wr, 0op/s; 4114/30792 objects degraded (13.361%)
2019-05-13 13:25:54.409055 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257675 : cluster [DBG] pgmap v258249: 1024 pgs: 2 active+undersized, 406 active+undersized+degraded, 616 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 5.61KiB/s wr, 0op/s; 4114/30792 objects degraded (13.361%)
2019-05-13 13:25:56.428113 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257676 : cluster [DBG] pgmap v258250: 1024 pgs: 2 active+undersized, 406 active+undersized+degraded, 616 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 4114/30792 objects degraded (13.361%)
2019-05-13 13:25:58.448644 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257677 : cluster [DBG] pgmap v258251: 1024 pgs: 2 active+undersized, 406 active+undersized+degraded, 616 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 675B/s wr, 0op/s; 4114/30792 objects degraded (13.361%)
2019-05-13 13:26:00.467771 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257678 : cluster [DBG] pgmap v258254: 1024 pgs: 24 undersized+degraded+peered, 20 peering, 57 stale+active+undersized+degraded, 70 stale+active+clean, 2 active+undersized, 364 active+undersized+degraded, 487 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 1013B/s wr, 0op/s; 4468/30792 objects degraded (14.510%); 42.1MiB/s, 10objects/s recovering
2019-05-13 13:26:00.983204 mon.ceph1 mon.0 10.9.4.151:6789/0 209036 : cluster [WRN] Health check failed: Reduced data availability: 1 pg inactive, 20 pgs peering (PG_AVAILABILITY)
2019-05-13 13:26:00.983233 mon.ceph1 mon.0 10.9.4.151:6789/0 209037 : cluster [WRN] Health check update: Degraded data redundancy: 4468/30792 objects degraded (14.510%), 445 pgs degraded, 381 pgs undersized (PG_DEGRADED)
2019-05-13 13:26:06.019246 mon.ceph1 mon.0 10.9.4.151:6789/0 209041 : cluster [WRN] Health check update: Reduced data availability: 1 pg inactive (PG_AVAILABILITY)
2019-05-13 13:26:06.019286 mon.ceph1 mon.0 10.9.4.151:6789/0 209042 : cluster [WRN] Health check update: Degraded data redundancy: 8428/30792 objects degraded (27.371%), 698 pgs degraded, 266 pgs undersized (PG_DEGRADED)
2019-05-13 13:26:02.488142 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257679 : cluster [DBG] pgmap v258255: 1024 pgs: 67 undersized+degraded+peered, 20 peering, 41 stale+active+undersized+degraded, 53 stale+active+clean, 2 active+undersized, 415 active+undersized+degraded, 426 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 1013B/s wr, 0op/s; 6055/30792 objects degraded (19.664%); 79.2MiB/s, 23objects/s recovering
2019-05-13 13:26:04.508895 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257680 : cluster [DBG] pgmap v258256: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 2.60KiB/s rd, 1013B/s wr, 0op/s; 8428/30792 objects degraded (27.371%); 244MiB/s, 89objects/s recovering
I wait 600 seconds for the down OSDs to be marked out, and...
2019-05-13 13:32:36.460367 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257876 : cluster [DBG] pgmap v258450: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 8428/30792 objects degraded (27.371%)
2019-05-13 13:32:38.480383 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257877 : cluster [DBG] pgmap v258451: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 8428/30792 objects degraded (27.371%)
2019-05-13 13:32:40.500555 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257878 : cluster [DBG] pgmap v258452: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 8428/30792 objects degraded (27.371%)
2019-05-13 13:32:51.054709 mon.ceph1 mon.0 10.9.4.151:6789/0 209168 : cluster [INF] Marking osd.16 out (has been down for 604 seconds)
2019-05-13 13:32:51.054741 mon.ceph1 mon.0 10.9.4.151:6789/0 209169 : cluster [INF] Marking osd.17 out (has been down for 604 seconds)
2019-05-13 13:32:51.054757 mon.ceph1 mon.0 10.9.4.151:6789/0 209170 : cluster [INF] Marking osd.18 out (has been down for 604 seconds)
2019-05-13 13:32:51.054775 mon.ceph1 mon.0 10.9.4.151:6789/0 209171 : cluster [INF] Marking osd.19 out (has been down for 604 seconds)
2019-05-13 13:32:51.054807 mon.ceph1 mon.0 10.9.4.151:6789/0 209172 : cluster [INF] Marking osd.36 out (has been down for 604 seconds)
2019-05-13 13:32:51.054836 mon.ceph1 mon.0 10.9.4.151:6789/0 209173 : cluster [INF] Marking osd.37 out (has been down for 604 seconds)
2019-05-13 13:32:51.055163 mon.ceph1 mon.0 10.9.4.151:6789/0 209174 : cluster [WRN] Health check update: 6 osds down (OSD_DOWN)
2019-05-13 13:32:51.055208 mon.ceph1 mon.0 10.9.4.151:6789/0 209175 : cluster [WRN] Health check update: 1 host (6 osds) down (OSD_HOST_DOWN)
2019-05-13 13:32:51.058001 mon.ceph1 mon.0 10.9.4.151:6789/0 209176 : cluster [DBG] osdmap e1115: 42 total, 30 up, 36 in
2019-05-13 13:32:42.520137 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257879 : cluster [DBG] pgmap v258453: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 8428/30792 objects degraded (27.371%)
2019-05-13 13:32:44.540692 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257880 : cluster [DBG] pgmap v258454: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 8428/30792 objects degraded (27.371%)
.
.
2019-05-13 13:32:50.600844 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257883 : cluster [DBG] pgmap v258457: 1024 pgs: 1 undersized+peered, 141 undersized+degraded+peered, 3 active+undersized, 557 active+undersized+degraded, 322 active+clean; 38.5GiB data, 150GiB used, 96.6TiB / 96.8TiB avail; 1.65KiB/s wr, 0op/s; 8428/30792 objects degraded (27.371%)
2019-05-13 13:32:52.065481 mon.ceph1 mon.0 10.9.4.151:6789/0 209177 : cluster [DBG] osdmap e1116: 42 total, 30 up, 36 in
2019-05-13 13:32:53.061404 mon.ceph1 mon.0 10.9.4.151:6789/0 209178 : cluster [WRN] Health check update: Reduced data availability: 141 pgs inactive, 64 pgs peering (PG_AVAILABILITY)
2019-05-13 13:32:53.061439 mon.ceph1 mon.0 10.9.4.151:6789/0 209179 : cluster [WRN] Health check update: Degraded data redundancy: 7316/30792 objects degraded (23.759%), 635 pgs degraded, 631 pgs undersized (PG_DEGRADED)
2019-05-13 13:32:53.064856 mon.ceph1 mon.0 10.9.4.151:6789/0 209180 : cluster [DBG] osdmap e1117: 42 total, 30 up, 36 in
2019-05-13 13:32:53.067049 osd.28 osd.28 10.9.4.151:6808/105992 132 : cluster [DBG] 4.62 starting backfill to osd.33 from (0'0,0'0] MAX to 831'15783
2019-05-13 13:32:54.071026 mon.ceph1 mon.0 10.9.4.151:6789/0 209183 : cluster [DBG] osdmap e1118: 42 total, 30 up, 36 in
2019-05-13 13:32:55.074069 mon.ceph1 mon.0 10.9.4.151:6789/0 209185 : cluster [DBG] osdmap e1119: 42 total, 30 up, 36 in
2019-05-13 13:32:56.060946 mon.ceph1 mon.0 10.9.4.151:6789/0 209187 : cluster [DBG] osdmap e1120: 42 total, 30 up, 36 in
..
.
..
2019-05-13 13:35:40.284707 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257967 : cluster [DBG] pgmap v258559: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:42.304090 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257968 : cluster [DBG] pgmap v258560: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:44.324942 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257969 : cluster [DBG] pgmap v258561: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 675B/s wr, 0op/s; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:46.343996 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257970 : cluster [DBG] pgmap v258562: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 675B/s wr, 0op/s; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:48.364475 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257971 : cluster [DBG] pgmap v258563: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 675B/s wr, 0op/s; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:50.384458 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257972 : cluster [DBG] pgmap v258564: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 675B/s wr, 0op/s; 5048/30792 objects degraded (16.394%)
2019-05-13 13:36:01.077919 mon.ceph1 mon.0 10.9.4.151:6789/0 209272 : cluster [INF] Marking osd.20 out (has been down for 602 seconds)
2019-05-13 13:36:01.077979 mon.ceph1 mon.0 10.9.4.151:6789/0 209273 : cluster [INF] Marking osd.21 out (has been down for 602 seconds)
2019-05-13 13:36:01.078003 mon.ceph1 mon.0 10.9.4.151:6789/0 209274 : cluster [INF] Marking osd.22 out (has been down for 602 seconds)
2019-05-13 13:36:01.078024 mon.ceph1 mon.0 10.9.4.151:6789/0 209275 : cluster [INF] Marking osd.23 out (has been down for 602 seconds)
2019-05-13 13:36:01.078046 mon.ceph1 mon.0 10.9.4.151:6789/0 209276 : cluster [INF] Marking osd.38 out (has been down for 602 seconds)
2019-05-13 13:36:01.078384 mon.ceph1 mon.0 10.9.4.151:6789/0 209277 : cluster [WRN] Health check update: 1 osds down (OSD_DOWN)
2019-05-13 13:36:01.081014 mon.ceph1 mon.0 10.9.4.151:6789/0 209278 : cluster [DBG] osdmap e1133: 42 total, 30 up, 31 in
2019-05-13 13:35:52.404134 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257973 : cluster [DBG] pgmap v258565: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 675B/s wr, 0op/s; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:54.425006 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257974 : cluster [DBG] pgmap v258566: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 675B/s wr, 0op/s; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:56.444033 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257975 : cluster [DBG] pgmap v258567: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 5048/30792 objects degraded (16.394%)
2019-05-13 13:35:58.464737 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257976 : cluster [DBG] pgmap v258568: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 5048/30792 objects degraded (16.394%)
2019-05-13 13:36:00.484412 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 257977 : cluster [DBG] pgmap v258569: 1024 pgs: 3 active+undersized, 496 active+undersized+degraded, 525 active+clean; 38.5GiB data, 141GiB used, 82.8TiB / 83.0TiB avail; 5048/30792 objects degraded (16.394%)
2019-05-13 13:36:02.089092 mon.ceph1 mon.0 10.9.4.151:6789/0 209279 : cluster [DBG] osdmap e1134: 42 total, 30 up, 31 in
2019-05-13 13:36:03.084613 mon.ceph1 mon.0 10.9.4.151:6789/0 209280 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs inactive, 91 pgs peering (PG_AVAILABILITY)
2019-05-13 13:36:03.084649 mon.ceph1 mon.0 10.9.4.151:6789/0 209281 : cluster [WRN] Health check update: Degraded data redundancy: 3824/30792 objects degraded (12.419%), 403 pgs degraded, 400 pgs undersized (PG_DEGRADED)
2019-05-13 13:36:03.088670 mon.ceph1 mon.0 10.9.4.151:6789/0 209282 : cluster [DBG] osdmap e1135: 42 total, 30 up, 31 in
2019-05-13 13:36:03.089757 osd.32 osd.32 10.9.4.153:6808/4154258 146 : cluster [DBG] 4.1dd starting backfill to osd.31 from (0'0,0'0] MAX to 821'28280
2019-05-13 13:36:04.095136 mon.ceph1 mon.0 10.9.4.151:6789/0 209284 : cluster [DBG] osdmap e1136: 42 total, 30 up, 31 in
2019-05-13 13:36:05.097844 mon.ceph1 mon.0 10.9.4.151:6789/0 209285 : cluster [DBG] osdmap e1137: 42 total, 30 up, 31 in
2019-05-13 13:36:06.084515 mon.ceph1 mon.0 10.9.4.151:6789/0 209287 : cluster [DBG] osdmap e1138: 42 total, 30 up, 31 in
2019-05-13 13:36:03.090345 osd.4 osd.4 10.9.4.152:6800/3952 266 : cluster [DBG] 2.ae starting backfill to osd.13 from (0'0,0'0] MAX to 1062'7381
2019-05-13 13:36:03.090402 osd.4 osd.4 10.9.4.152:6800/3952 267 : cluster [DBG] 2.1f0 starting backfill to osd.0 from (0'0,0'0] MAX to 1062'9978
.
.
2019-05-13 13:36:03.099754 osd.4 osd.4 10.9.4.152:6800/3952 273 : cluster [DBG] 2.80 starting backfill to osd.10 from (0'0,0'0] MAX to 871'24739
2019-05-13 13:36:03.101848 osd.4 osd.4 10.9.4.152:6800/3952 274 : cluster [DBG] 2.d5 starting backfill to osd.15 from (0'0,0'0] MAX to 898'19101
.
.
2019-05-13 13:54:01.215522 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 258513 : cluster [DBG] pgmap v259130: 1024 pgs: 1 active+undersized, 147 active+undersized+degraded, 876 active+clean; 38.5GiB data, 138GiB used, 74.5TiB / 74.6TiB avail; 727/30792 objects degraded (2.361%)
2019-05-13 13:54:03.236190 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 258514 : cluster [DBG] pgmap v259131: 1024 pgs: 1 active+undersized, 147 active+undersized+degraded, 876 active+clean; 38.5GiB data, 138GiB used, 74.5TiB / 74.6TiB avail; 1.32KiB/s wr, 0op/s; 727/30792 objects degraded (2.361%)
.
2019-05-13 13:54:09.300982 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 258517 : cluster [DBG] pgmap v259134: 1024 pgs: 1 active+undersized, 147 active+undersized+degraded, 876 active+clean; 38.5GiB data, 138GiB used, 74.5TiB / 74.6TiB avail; 9.75KiB/s wr, 1op/s; 727/30792 objects degraded (2.361%)
2019-05-13 13:54:11.324119 mgr.ceph1 client.668780 10.9.4.151:0/3141912278 258518 : cluster [DBG] pgmap v259135: 1024 pgs: 1 active+undersized, 147 active+undersized+degraded, 876 active+clean; 38.5GiB data, 138GiB used, 74.5TiB / 74.6TiB avail; 9.74KiB/s wr, 1op/s; 727/30792 objects degraded (2.361%)
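For reference, the 600 seconds come from mon_osd_down_out_interval, which I have left at its default; if I understand correctly it can be checked on the monitor via the admin socket:

# check the down -> out timeout on the monitor (default is 600 seconds)
ceph daemon mon.ceph1 config get mon_osd_down_out_interval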
One OSD is not marked as out, so the Ceph cluster cannot heal itself.
ceph -s
  cluster:
    id:     0c075451-588b-4fe1-87f6-afc711bf5547
    health: HEALTH_WARN
            1 osds down
            1 host (6 osds) down
            Degraded data redundancy: 727/30792 objects degraded (2.361%), 147 pgs degraded, 148 pgs undersized

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    osd: 42 osds: 30 up, 31 in

  data:
    pools:   2 pools, 1024 pgs
    objects: 10.26k objects, 38.5GiB
    usage:   138GiB used, 74.5TiB / 74.6TiB avail
    pgs:     727/30792 objects degraded (2.361%)
             876 active+clean
             147 active+undersized+degraded
             1   active+undersized
root@ceph1:~# ceph health detail
HEALTH_WARN 1 osds down; 1 host (6 osds) down; Degraded data redundancy: 727/30792 objects degraded (2.361%), 147 pgs degraded, 148 pgs undersized
OSD_DOWN 1 osds down
osd.39 (root=default,host=ceph6) is down
OSD_HOST_DOWN 1 host (6 osds) down
host ceph6 (root=default) (6 osds) is down
PG_DEGRADED Degraded data redundancy: 727/30792 objects degraded (2.361%), 147 pgs degraded, 148 pgs undersized
pg 4.100 is stuck undersized for 1717.653091, current state active+undersized+degraded, last acting [32,29]
pg 4.101 is stuck undersized for 2129.731013, current state active+undersized+degraded, last acting [29,32]
pg 4.103 is active+undersized+degraded, acting [40,28]
pg 4.109 is stuck undersized for 2129.730081, current state active+undersized+degraded, last acting [32,31]
pg 4.10d is stuck undersized for 2129.730966, current state active+undersized+degraded, last acting [40,31]
pg 4.10e is stuck undersized for 2129.730888, current state active+undersized+degraded, last acting [41,31]
pg 4.112 is stuck undersized for 2129.729025, current state active+undersized+degraded, last acting [29,31]
.
.
pg 4.18f is stuck undersized for 2129.730459, current state active+undersized, last acting [41,30]
pg 4.192 is stuck undersized for 1717.653652, current state active+undersized+degraded, last acting [41,31]
pg 4.195 is stuck undersized for 2129.731111, current state active+undersized+degraded, last acting [28,32]
pg 4.196 is stuck undersized for 2129.729758, current state active+undersized+degraded, last acting [40,35]
pg 4.197 is stuck undersized for 1717.651790, current state active+undersized+degraded, last acting [32,31]
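So osd.39 is still reported as down but in. I can see the same with, for example (commands from memory):

# osd.39 should still show as down but in
ceph osd tree | grep -E 'ceph6|osd\.39'
ceph osd dump | grep '^osd.39'

I know I could mark it out by hand with "ceph osd out 39", but I would like to understand why the monitor does not do it automatically as it did for the other eleven OSDs.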
What is happening?