Hello,
I've run into the following on our 3-node Ceph cluster:
Code:
# ceph health detail
HEALTH_WARN 32 pgs degraded; 92 pgs down; 92 pgs peering; 92 pgs stuck inactive; 192 pgs stuck unclean; 3 requests are blocked > 32 sec; 2 osds have slow requests; recovery 46790/456882 objects degraded (10.241%); 1 mons down, quorum 0,1,2 0,2,1
pg 1.20 is stuck inactive for 74762.284833, current state down+peering, last acting [10,6]
pg 1.21 is stuck inactive for 65922.715915, current state down+peering, last acting [10,7]
pg 1.1e is stuck inactive for 65690.198886, current state down+peering, last acting [4,8]
pg 2.1c is stuck inactive for 65810.745600, current state down+peering, last acting [10,7]
pg 0.1e is stuck inactive for 65690.198691, current state down+peering, last acting [7,8]
pg 1.1f is stuck inactive for 65906.255850, current state down+peering, last acting [8,4]
pg 0.1d is stuck inactive for 65690.210385, current state down+peering, last acting [5,10]
pg 1.1d is stuck inactive for 65690.210323, current state down+peering, last acting [5,10]
pg 1.1a is stuck inactive for 65690.198144, current state down+peering, last acting [4,10]
pg 1.1b is stuck inactive for 65906.287501, current state down+peering, last acting [8,4]
pg 2.1b is stuck inactive since forever, current state down+peering, last acting [5,9]
pg 1.18 is stuck inactive for 65922.709773, current state down+peering, last acting [10,4]
pg 2.1a is stuck inactive since forever, current state down+peering, last acting [10,4]
pg 0.17 is stuck inactive since forever, current state down+peering, last acting [6,10]
pg 1.16 is stuck inactive for 65690.192023, current state down+peering, last acting [6,8]
pg 2.14 is stuck inactive for 65594.732013, current state down+peering, last acting [6,9]
pg 2.17 is stuck inactive for 65813.617834, current state down+peering, last acting [10,5]
pg 2.16 is stuck inactive for 65234.560829, current state down+peering, last acting [5,10]
pg 0.14 is stuck inactive since forever, current state down+peering, last acting [9,4]
pg 1.15 is stuck inactive for 65906.288367, current state down+peering, last acting [8,5]
pg 2.11 is stuck inactive for 65819.547115, current state down+peering, last acting [6,8]
pg 0.13 is stuck inactive for 65690.190726, current state down+peering, last acting [7,10]
pg 1.12 is stuck inactive since forever, current state down+peering, last acting [5,10]
...
pg 2.2f is down+peering, acting [7,9]
pg 2.2e is down+peering, acting [7,9]
pg 0.2c is down+peering, acting [8,7]
pg 2.29 is down+peering, acting [6,10]
pg 1.2a is down+peering, acting [5,10]
pg 2.28 is down+peering, acting [5,9]
pg 2.2b is down+peering, acting [7,8]
pg 2.2a is down+peering, acting [9,5]
pg 0.28 is down+peering, acting [7,8]
...
pg 0.2f is down+peering, acting [8,5]
pg 2.2c is down+peering, acting [8,6]
pg 1.2f is down+peering, acting [5,8]
1 ops are blocked > 4194.3 sec
2 ops are blocked > 2097.15 sec
1 ops are blocked > 4194.3 sec on osd.8
2 ops are blocked > 2097.15 sec on osd.10
2 osds have slow requests
recovery 46790/456882 objects degraded (10.241%)
mon.3 (rank 3) addr 10.11.12.240:6789/0 is down (out of quorum)
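To see which OSDs the down+peering PGs depend on, a quick tally of the acting sets in the `ceph health detail` output might help. Below is a minimal sketch in Python, not a Ceph tool: the function name `tally_acting_osds` and the regular expression are my own assumptions based on the line format shown above, and the sample is just three lines copied from that output.

```python
import re
from collections import Counter

# Assumed line formats, taken from the `ceph health detail` output above:
#   pg 1.20 is stuck inactive for ..., current state down+peering, last acting [10,6]
#   pg 2.2f is down+peering, acting [7,9]
PG_LINE = re.compile(r"^pg (\S+) is .*?down\+peering, (?:last )?acting \[([\d,]+)\]")


def tally_acting_osds(text):
    """Count how many distinct down+peering PGs list each OSD in their acting set."""
    acting_by_pg = {}
    for line in text.splitlines():
        match = PG_LINE.match(line.strip())
        if match:
            # Key by PG id so a PG reported twice is only counted once.
            acting_by_pg[match.group(1)] = [int(x) for x in match.group(2).split(",")]
    counts = Counter()
    for osds in acting_by_pg.values():
        counts.update(osds)
    return counts


# Three lines copied from the output above, for illustration only.
sample = """\
pg 1.20 is stuck inactive for 74762.284833, current state down+peering, last acting [10,6]
pg 1.21 is stuck inactive for 65922.715915, current state down+peering, last acting [10,7]
pg 2.2f is down+peering, acting [7,9]
"""

for osd, pg_count in sorted(tally_acting_osds(sample).items()):
    print(f"osd.{osd}: {pg_count}")
```

Running this over the full saved output, rather than the three sample lines, should show which OSD ids appear in the acting sets and which never do; `ceph osd tree` can then be used to check whether the absent ones are marked down.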
Code:
# ceph health
HEALTH_WARN
32 pgs degraded;
92 pgs down; 92 pgs peering;
92 pgs stuck inactive;
192 pgs stuck unclean;
3 requests are blocked > 32 sec;
recovery 46790/456882 objects degraded (10.241%);
1 mons down, quorum 0,1,2 0,2,1
I'm checking the docs at http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/ , but any suggestions to jump-start a solution [ if there is one ] would be appreciated.
thanks
Rob