Thank you!
After a day there are no more misplaced or degraded objects, but I still have inactive PGs...
Here are the outputs of ceph -s and ceph health detail:
cluster:
id: 2806fcbd-4c9a-4805-a16a-10c01f3a9f32
health: HEALTH_WARN
1 filesystem is degraded
2 nearfull osd(s)
3 pool(s) nearfull
Reduced data availability: 146 pgs inactive
Degraded data redundancy: 146 pgs unclean, 2 pgs degraded, 2 pgs undersized
125 slow requests are blocked > 32 sec
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3
mgr: ceph1(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph1=up:replay}, 5 up:standby
osd: 35 osds: 35 up, 35 in; 146 remapped pgs
rgw: 1 daemon active
data:
pools: 20 pools, 4712 pgs
objects: 2745k objects, 7965 GB
usage: 16207 GB used, 64641 GB / 80849 GB avail
pgs: 3.098% pgs not active
4563 active+clean
144 activating+remapped
3 active+clean+scrubbing+deep
2 activating+undersized+degraded+remapped
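As a quick sanity check on the summary above: the state counts add up to the 4712 PGs, and the 144 activating+remapped PGs plus the 2 activating+undersized+degraded+remapped PGs are exactly the 146 inactive ones (146 / 4712 ≈ 3.098%). Here is a small Python sketch of that arithmetic. The counts are copied by hand from the output, and PG_STATES, is_active and summarize are just illustrative names:

```python
# Sketch: re-derive the "pgs not active" figure from the state breakdown
# printed by `ceph -s`. The counts below are copied by hand from that output.
PG_STATES = {
    "active+clean": 4563,
    "activating+remapped": 144,
    "active+clean+scrubbing+deep": 3,
    "activating+undersized+degraded+remapped": 2,
}


def is_active(state: str) -> bool:
    # A PG counts as active only if "active" is one of its '+'-separated flags;
    # "activating" is a different flag and does not count as active.
    return "active" in state.split("+")


def summarize(states: dict) -> tuple:
    """Return (total pgs, pgs not active, percent not active)."""
    total = sum(states.values())
    inactive = sum(n for s, n in states.items() if not is_active(s))
    return total, inactive, 100.0 * inactive / total


if __name__ == "__main__":
    total, inactive, pct = summarize(PG_STATES)
    print(f"{total} pgs, {inactive} not active ({pct:.3f}%)")
```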
HEALTH_WARN 1 filesystem is degraded; 2 nearfull osd(s); 3 pool(s) nearfull; Reduced data availability: 146 pgs inactive; Degraded data redundancy: 146 pgs unclean, 2 pgs degraded, 2 pgs undersized; 128 slow requests are blocked > 32 sec
FS_DEGRADED 1 filesystem is degraded
fs cephfs is degraded
OSD_NEARFULL 2 nearfull osd(s)
osd.0 is near full
osd.11 is near full
POOL_NEARFULL 3 pool(s) nearfull
pool 'ssd' is nearfull
pool 'ssd-cache' is nearfull
pool 'ssd-rbd-cache-tier' is nearfull
PG_AVAILABILITY Reduced data availability: 146 pgs inactive
pg 9.ee is stuck inactive for 102626.123983, current state activating+remapped, last acting [9,7]
pg 9.f1 is stuck inactive for 1193.857985, current state activating+remapped, last acting [15,7]
pg 9.fa is stuck inactive for 218178.607448, current state activating+remapped, last acting [4,0]
pg 9.fe is stuck inactive for 210726.399967, current state activating+remapped, last acting [11,15]
pg 9.103 is stuck inactive for 210739.610827, current state activating+remapped, last acting [7,0]
pg 9.104 is stuck inactive for 210739.015627, current state activating+remapped, last acting [9,7]
pg 9.11a is stuck inactive for 624.434796, current state activating+remapped, last acting [11,0]
pg 9.128 is stuck inactive for 112940.576951, current state activating+remapped, last acting [9,15]
pg 9.134 is stuck inactive for 113226.251556, current state activating+remapped, last acting [8,15]
pg 9.135 is stuck inactive for 218178.601220, current state activating+remapped, last acting [11,0]
pg 9.149 is stuck inactive for 210725.347952, current state activating+remapped, last acting [7,0]
pg 9.14a is stuck inactive for 1193.869481, current state activating+remapped, last acting [4,15]
pg 9.17e is stuck inactive for 210726.442806, current state activating+remapped, last acting [11,7]
pg 9.367 is stuck inactive for 218178.607225, current state activating+remapped, last acting [4,0]
pg 9.36c is stuck inactive for 218178.600639, current state activating+remapped, last acting [11,0]
pg 9.374 is stuck inactive for 219391.536161, current state activating+remapped, last acting [4,7]
pg 9.380 is stuck inactive for 210726.418927, current state activating+remapped, last acting [11,15]
pg 9.38f is stuck inactive for 1463.030972, current state activating+remapped, last acting [11,0]
pg 9.39a is stuck inactive for 218178.600384, current state activating+remapped, last acting [11,7]
pg 9.39f is stuck inactive for 1193.855600, current state activating+remapped, last acting [15,11]
pg 9.3c3 is stuck inactive for 1424.918955, current state activating+remapped, last acting [4,15]
pg 9.3ca is stuck inactive for 624.416282, current state activating+remapped, last acting [7,0]
pg 9.3cc is stuck inactive for 1463.028072, current state activating+remapped, last acting [9,0]
pg 9.3f1 is stuck inactive for 218178.605606, current state activating+remapped, last acting [11,0]
pg 9.3f3 is stuck inactive for 405.165697, current state activating+remapped, last acting [9,0]
pg 9.3fb is stuck inactive for 112942.524109, current state activating+remapped, last acting [15,0]
pg 28.ff is stuck inactive for 1193.880132, current state activating+remapped, last acting [11,0]
pg 28.116 is stuck inactive for 101417.642993, current state activating+remapped, last acting [4,0]
pg 28.133 is stuck inactive for 1193.889357, current state activating+remapped, last acting [0,7]
pg 28.134 is stuck inactive for 1432.492816, current state activating+undersized+degraded+remapped, last acting [15]
pg 28.13d is stuck inactive for 22101.394011, current state activating+remapped, last acting [8,7]
pg 28.144 is stuck inactive for 22101.420516, current state activating+remapped, last acting [15,0]
pg 28.155 is stuck inactive for 405.169897, current state activating+remapped, last acting [9,7]
pg 28.168 is stuck inactive for 624.422422, current state activating+remapped, last acting [8,0]
pg 28.17e is stuck inactive for 22101.378316, current state activating+remapped, last acting [11,15]
pg 36.d9 is stuck inactive for 405.178117, current state activating+remapped, last acting [4,7]
pg 36.105 is stuck inactive for 1393.486750, current state activating+remapped, last acting [15,0]
pg 36.114 is stuck inactive for 1193.871946, current state activating+remapped, last acting [4,7]
pg 36.117 is stuck inactive for 1393.492775, current state activating+remapped, last acting [4,0]
pg 36.129 is stuck inactive for 405.076725, current state activating+remapped, last acting [0,11]
pg 36.130 is stuck inactive for 624.426808, current state activating+remapped, last acting [8,0]
pg 36.131 is stuck inactive for 405.177552, current state activating+remapped, last acting [15,0]
pg 36.136 is stuck inactive for 101743.327120, current state activating+remapped, last acting [0,15]
pg 36.13e is stuck inactive for 1193.890073, current state activating+remapped, last acting [0,7]
pg 36.13f is stuck inactive for 1424.892730, current state activating+remapped, last acting [7,15]
pg 36.153 is stuck inactive for 405.173311, current state activating+remapped, last acting [4,0]
pg 36.159 is stuck inactive for 101396.209942, current state activating+remapped, last acting [11,0]
pg 36.160 is stuck inactive for 1393.426864, current state activating+remapped, last acting [11,7]
pg 36.17b is stuck inactive for 1393.481245, current state activating+remapped, last acting [8,7]
pg 36.1ab is stuck inactive for 22101.415712, current state activating+remapped, last acting [15,7]
pg 36.1bc is stuck inactive for 1393.510830, current state activating+remapped, last acting [0,15]
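To see which OSDs keep showing up behind the stuck PGs, the "last acting" sets can be tallied. This is a rough Python sketch that assumes the line format shown above. STUCK_RE and acting_osd_counts are illustrative names, and SAMPLE holds only a few lines copied from this output:

```python
import re
from collections import Counter

# Matches lines such as:
#   pg 9.ee is stuck inactive for 102626.123983, current state activating+remapped, last acting [9,7]
STUCK_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]"
)


def acting_osd_counts(lines, kind="inactive"):
    """Count how often each OSD id appears in the last acting set of stuck PGs."""
    counts = Counter()
    for line in lines:
        m = STUCK_RE.search(line)
        if m is None or m.group("kind") != kind:
            continue  # not a "stuck <kind>" line
        acting = m.group("acting")
        counts.update(int(osd) for osd in acting.split(",") if osd)
    return counts


# A few lines copied from the health detail output above.
SAMPLE = """\
pg 9.ee is stuck inactive for 102626.123983, current state activating+remapped, last acting [9,7]
pg 9.fa is stuck inactive for 218178.607448, current state activating+remapped, last acting [4,0]
pg 9.103 is stuck inactive for 210739.610827, current state activating+remapped, last acting [7,0]
pg 28.134 is stuck inactive for 1432.492816, current state activating+undersized+degraded+remapped, last acting [15]
pg 9.ee is stuck unclean for 263186.736601, current state activating+remapped, last acting [9,7]
""".splitlines()

if __name__ == "__main__":
    for osd, n in acting_osd_counts(SAMPLE).most_common():
        print(f"osd.{osd}: {n}")
```

If you feed it the whole PG_AVAILABILITY section (for example, after saving ceph health detail to a text file), you get a per-OSD count. That makes it easier to spot whether the same few OSDs, such as the nearfull osd.0, sit behind most of the stuck PGs.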
PG_DEGRADED Degraded data redundancy: 146 pgs unclean, 2 pgs degraded, 2 pgs undersized
pg 9.ee is stuck unclean for 263186.736601, current state activating+remapped, last acting [9,7]
pg 9.f1 is stuck unclean for 1232.075228, current state activating+remapped, last acting [15,7]
pg 9.fa is stuck unclean for 218187.482691, current state activating+remapped, last acting [4,0]
pg 9.fe is stuck unclean for 211078.638283, current state activating+remapped, last acting [11,15]
pg 9.103 is stuck unclean for 211078.576099, current state activating+remapped, last acting [7,0]
pg 9.104 is stuck unclean for 218187.446112, current state activating+remapped, last acting [9,7]
pg 9.11a is stuck unclean for 885.357762, current state activating+remapped, last acting [11,0]
pg 9.128 is stuck unclean for 218186.905347, current state activating+remapped, last acting [9,15]
pg 9.134 is stuck unclean for 218186.904735, current state activating+remapped, last acting [8,15]
pg 9.135 is stuck unclean for 263186.938380, current state activating+remapped, last acting [11,0]
pg 9.149 is stuck unclean for 210726.019333, current state activating+remapped, last acting [7,0]
pg 9.14a is stuck unclean for 1232.108681, current state activating+remapped, last acting [4,15]
pg 9.17e is stuck unclean for 211078.569813, current state activating+remapped, last acting [11,7]
pg 9.367 is stuck unclean for 218187.482669, current state activating+remapped, last acting [4,0]
pg 9.36c is stuck unclean for 218186.366552, current state activating+remapped, last acting [11,0]
pg 9.374 is stuck unclean for 263186.683670, current state activating+remapped, last acting [4,7]
pg 9.380 is stuck unclean for 211078.638361, current state activating+remapped, last acting [11,15]
pg 9.38f is stuck unclean for 129631.428616, current state activating+remapped, last acting [11,0]
pg 9.39a is stuck unclean for 263187.247026, current state activating+remapped, last acting [11,7]
pg 9.39f is stuck unclean for 1232.107587, current state activating+remapped, last acting [15,11]
pg 9.3c3 is stuck unclean for 1427.232275, current state activating+remapped, last acting [4,15]
pg 9.3ca is stuck unclean for 885.341385, current state activating+remapped, last acting [7,0]
pg 9.3cc is stuck unclean for 129795.556351, current state activating+remapped, last acting [9,0]
pg 9.3f1 is stuck unclean for 263310.102011, current state activating+remapped, last acting [11,0]
pg 9.3f3 is stuck unclean for 450.640274, current state activating+remapped, last acting [9,0]
pg 9.3fb is stuck unclean for 113249.680510, current state activating+remapped, last acting [15,0]
pg 28.ff is stuck unclean for 1232.116647, current state activating+remapped, last acting [11,0]
pg 28.116 is stuck unclean for 101753.086129, current state activating+remapped, last acting [4,0]
pg 28.133 is stuck unclean for 1232.079792, current state activating+remapped, last acting [0,7]
pg 28.134 is stuck undersized for 1422.827578, current state activating+undersized+degraded+remapped, last acting [15]
pg 28.13d is stuck unclean for 22118.071035, current state activating+remapped, last acting [8,7]
pg 28.144 is stuck unclean for 22118.118083, current state activating+remapped, last acting [15,0]
pg 28.155 is stuck unclean for 450.638796, current state activating+remapped, last acting [9,7]
pg 28.168 is stuck unclean for 884.748122, current state activating+remapped, last acting [8,0]
pg 28.17e is stuck unclean for 22118.106433, current state activating+remapped, last acting [11,15]
pg 36.d9 is stuck unclean for 450.639292, current state activating+remapped, last acting [4,7]
pg 36.105 is stuck unclean for 1418.458061, current state activating+remapped, last acting [15,0]
pg 36.114 is stuck unclean for 1227.587347, current state activating+remapped, last acting [4,7]
pg 36.117 is stuck unclean for 1418.458722, current state activating+remapped, last acting [4,0]
pg 36.129 is stuck unclean for 450.664938, current state activating+remapped, last acting [0,11]
pg 36.130 is stuck unclean for 884.753949, current state activating+remapped, last acting [8,0]
pg 36.131 is stuck unclean for 450.666741, current state activating+remapped, last acting [15,0]
pg 36.136 is stuck unclean for 101752.052148, current state activating+remapped, last acting [0,15]
pg 36.13e is stuck unclean for 1232.079795, current state activating+remapped, last acting [0,7]
pg 36.13f is stuck unclean for 1427.250237, current state activating+remapped, last acting [7,15]
pg 36.153 is stuck unclean for 450.665627, current state activating+remapped, last acting [4,0]
pg 36.159 is stuck unclean for 101742.260251, current state activating+remapped, last acting [11,0]
pg 36.160 is stuck unclean for 1461.940216, current state activating+remapped, last acting [11,7]
pg 36.17b is stuck unclean for 1418.448654, current state activating+remapped, last acting [8,7]
pg 36.1ab is stuck unclean for 22118.116503, current state activating+remapped, last acting [15,7]
pg 36.1bc is stuck unclean for 1418.468201, current state activating+remapped, last acting [0,15]
REQUEST_SLOW 128 slow requests are blocked > 32 sec
4 ops are blocked > 2097.15 sec
21 ops are blocked > 1048.58 sec
55 ops are blocked > 524.288 sec
26 ops are blocked > 262.144 sec
12 ops are blocked > 131.072 sec
7 ops are blocked > 65.536 sec
3 ops are blocked > 32.768 sec
osd.8 has blocked requests > 524.288 sec
osds 9,11 have blocked requests > 2097.15 sec
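The REQUEST_SLOW buckets do not overlap: they add up to the 128 ops in the header line, and each threshold is double the previous one, starting at 32.768 s. Below is a small sketch that re-reads the bucket lines and checks both points. parse_blocked is an illustrative name, and the lines are copied from the output above:

```python
import re

# Matches lines such as "55 ops are blocked > 524.288 sec".
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec")


def parse_blocked(lines):
    """Return (threshold_seconds, op_count) pairs sorted by threshold."""
    buckets = []
    for line in lines:
        m = BLOCKED_RE.search(line)
        if m:
            buckets.append((float(m.group(2)), int(m.group(1))))
    return sorted(buckets)


# Copied from the REQUEST_SLOW section above.
REQUEST_SLOW = """\
4 ops are blocked > 2097.15 sec
21 ops are blocked > 1048.58 sec
55 ops are blocked > 524.288 sec
26 ops are blocked > 262.144 sec
12 ops are blocked > 131.072 sec
7 ops are blocked > 65.536 sec
3 ops are blocked > 32.768 sec
osd.8 has blocked requests > 524.288 sec
""".splitlines()

if __name__ == "__main__":
    buckets = parse_blocked(REQUEST_SLOW)
    print("total blocked ops:", sum(n for _, n in buckets))  # 128
    # Each threshold should be roughly twice the previous one.
    ratios = [b[0] / a[0] for a, b in zip(buckets, buckets[1:])]
    print("thresholds double:", all(abs(r - 2.0) < 0.01 for r in ratios))
```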
And here are the last lines of the osd.8 log:
2018-02-11 15:45:14.539988 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 30.863142 seconds old, received at 2018-02-11 15:44:43.676547: osd_op(client.91747622.1:17154 36.ffa0bf6 36:6fd05ff0:::rbd_data.579a042ae8944a.0000000000000420:head [read 0~4096] snapc 0=[] RETRY=6 ack+retry+read+known_if_redirected e186495) currently waiting for peered
2018-02-11 15:45:44.546252 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 60.871917 secs
2018-02-11 15:45:44.546285 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 60.871917 seconds old, received at 2018-02-11 15:44:43.674240: osd_op(mds.0.3616:2 9.20b 9.3270c60b (undecoded) ondisk+retry+read+known_if_redirected+full_force e186495) currently waiting for peered
2018-02-11 15:45:44.546297 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 60.869610 seconds old, received at 2018-02-11 15:44:43.676547: osd_op(client.91747622.1:17154 36.ffa0bf6 36:6fd05ff0:::rbd_data.579a042ae8944a.0000000000000420:head [read 0~4096] snapc 0=[] RETRY=6 ack+retry+read+known_if_redirected e186495) currently waiting for peered
2018-02-11 15:46:44.558848 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 120.884482 secs
2018-02-11 15:46:44.558886 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 120.884482 seconds old, received at 2018-02-11 15:44:43.674240: osd_op(mds.0.3616:2 9.20b 9.3270c60b (undecoded) ondisk+retry+read+known_if_redirected+full_force e186495) currently waiting for peered
2018-02-11 15:46:44.558898 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 120.882175 seconds old, received at 2018-02-11 15:44:43.676547: osd_op(client.91747622.1:17154 36.ffa0bf6 36:6fd05ff0:::rbd_data.579a042ae8944a.0000000000000420:head [read 0~4096] snapc 0=[] RETRY=6 ack+retry+read+known_if_redirected e186495) currently waiting for peered
2018-02-11 15:48:44.584033 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 240.909709 secs
2018-02-11 15:48:44.584079 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 240.909709 seconds old, received at 2018-02-11 15:44:43.674240: osd_op(mds.0.3616:2 9.20b 9.3270c60b (undecoded) ondisk+retry+read+known_if_redirected+full_force e186495) currently waiting for peered
2018-02-11 15:48:44.584090 7f2f3f4c8700 0 log_channel(cluster) log [WRN] : slow request 240.907402 seconds old, received at 2018-02-11 15:44:43.676547: osd_op(client.91747622.1:17154 36.ffa0bf6 36:6fd05ff0:::rbd_data.579a042ae8944a.0000000000000420:head [read 0~4096] snapc 0=[] RETRY=6 ack+retry+read+known_if_redirected e186495) currently waiting for peered
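To make these repeated warnings easier to read, here is a sketch that groups the slow-request lines by request id. For each request it collects the pool, the "currently ..." state and the ages at which it was reported. It assumes, as these lines suggest, that the token right after the request id starts with "<pool id>.". SLOW_RE and group_slow_requests are illustrative names, and the log prefix (timestamp, thread id) is trimmed from the sample lines:

```python
import re
from collections import defaultdict

# Matches the tail of lines such as:
#   slow request 60.871917 seconds old, received at 2018-02-11 15:44:43.674240:
#   osd_op(mds.0.3616:2 9.20b ...) currently waiting for peered
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at \S+ \S+: "
    r"osd_op\((?P<reqid>\S+) (?P<target>\S+) .* currently (?P<state>.+)$"
)


def group_slow_requests(lines):
    """Map request id -> {"pool", "state", "ages"} from slow request log lines."""
    ops = defaultdict(lambda: {"pool": None, "state": None, "ages": []})
    for line in lines:
        m = SLOW_RE.search(line)
        if m is None:
            continue  # e.g. the "N slow requests, N included below" summary lines
        op = ops[m.group("reqid")]
        # Assumption: the token after the request id starts with "<pool>.".
        op["pool"] = int(m.group("target").split(".", 1)[0])
        op["state"] = m.group("state").strip()
        op["ages"].append(float(m.group("age")))
    return dict(ops)


# Copied from the osd.8 log above, with the timestamp/thread prefix trimmed.
LOG = [
    "slow request 60.871917 seconds old, received at 2018-02-11 15:44:43.674240: osd_op(mds.0.3616:2 9.20b 9.3270c60b (undecoded) ondisk+retry+read+known_if_redirected+full_force e186495) currently waiting for peered",
    "slow request 60.869610 seconds old, received at 2018-02-11 15:44:43.676547: osd_op(client.91747622.1:17154 36.ffa0bf6 36:6fd05ff0:::rbd_data.579a042ae8944a.0000000000000420:head [read 0~4096] snapc 0=[] RETRY=6 ack+retry+read+known_if_redirected e186495) currently waiting for peered",
    "slow request 120.884482 seconds old, received at 2018-02-11 15:44:43.674240: osd_op(mds.0.3616:2 9.20b 9.3270c60b (undecoded) ondisk+retry+read+known_if_redirected+full_force e186495) currently waiting for peered",
    "slow request 120.882175 seconds old, received at 2018-02-11 15:44:43.676547: osd_op(client.91747622.1:17154 36.ffa0bf6 36:6fd05ff0:::rbd_data.579a042ae8944a.0000000000000420:head [read 0~4096] snapc 0=[] RETRY=6 ack+retry+read+known_if_redirected e186495) currently waiting for peered",
]

if __name__ == "__main__":
    for reqid, op in group_slow_requests(LOG).items():
        print(reqid, f"pool {op['pool']}", op["state"], op["ages"])
```

On the osd.8 excerpt this shows only two distinct ops: an MDS read on a pool 9 PG and an RBD read on a pool 36 PG. Each is re-logged at growing ages, and both are still "waiting for peered", that is, queued behind PGs that have not finished peering.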
I just can't understand why these PGs are not activating, with the requests stuck "currently waiting for peered"...
Is there any chance the cluster will heal itself overnight, peering and activating the PGs one by one (assuming nothing else goes wrong)?