CEPH problem

gosha

Well-Known Member
Oct 20, 2014
Russia
Hi!

My Ceph storage reports an error:

# ceph health
HEALTH_ERR 27 pgs are stuck inactive for more than 300 seconds; 7 pgs down; 27 pgs incomplete; 27 pgs stuck inactive; 27 pgs stuck unclean; 1 requests are blocked > 32 sec

root@acn2:~# ceph health detail
HEALTH_ERR 27 pgs are stuck inactive for more than 300 seconds; 7 pgs down; 27 pgs incomplete; 27 pgs stuck inactive; 27 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests
pg 1.c7 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.c1 is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.c0 is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.af is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.dd is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.b7 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.e4 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.64 is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.5b is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.9e is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.f7 is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.3f is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.e8 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.48 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.1b is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.f is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.24 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.d4 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.19 is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.a3 is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.7c is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.84 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.87 is stuck inactive since forever, current state down+incomplete, last acting [3,0]
pg 1.c9 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.6b is stuck inactive since forever, current state incomplete, last acting [3,0]
pg 1.98 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.a0 is stuck inactive since forever, current state incomplete, last acting [0,3]
pg 1.c7 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.c1 is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.c0 is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.af is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.a3 is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.a0 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.9e is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.87 is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.84 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.7c is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.24 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.19 is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.3f is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.1b is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.48 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.64 is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.5b is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.98 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.6b is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.c9 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.d4 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.dd is stuck unclean since forever, current state incomplete, last acting [3,0]
pg 1.b7 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.e4 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.e8 is stuck unclean since forever, current state incomplete, last acting [0,3]
pg 1.f is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.f7 is stuck unclean since forever, current state down+incomplete, last acting [3,0]
pg 1.f7 is down+incomplete, acting [3,0]
pg 1.f is down+incomplete, acting [3,0]
pg 1.e8 is incomplete, acting [0,3]
pg 1.e4 is incomplete, acting [0,3]
pg 1.dd is incomplete, acting [3,0]
pg 1.d4 is incomplete, acting [0,3]
pg 1.c9 is incomplete, acting [0,3]
pg 1.5b is incomplete, acting [0,3]
pg 1.64 is incomplete, acting [3,0]
pg 1.48 is incomplete, acting [0,3]
pg 1.3f is down+incomplete, acting [3,0]
pg 1.1b is incomplete, acting [3,0]
pg 1.19 is down+incomplete, acting [3,0]
pg 1.24 is incomplete, acting [0,3]
pg 1.6b is incomplete, acting [3,0]
pg 1.7c is incomplete, acting [0,3]
pg 1.84 is incomplete, acting [0,3]
pg 1.87 is down+incomplete, acting [3,0]
pg 1.98 is incomplete, acting [0,3]
pg 1.9e is down+incomplete, acting [3,0]
pg 1.a0 is incomplete, acting [0,3]
pg 1.a3 is down+incomplete, acting [3,0]
pg 1.af is incomplete, acting [3,0]
pg 1.b7 is incomplete, acting [0,3]
pg 1.c0 is incomplete, acting [3,0]
pg 1.c1 is incomplete, acting [3,0]
pg 1.c7 is incomplete, acting [0,3]
1 ops are blocked > 16777.2 sec on osd.3
1 osds have slow requests

# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.41745 root default
-2 2.70428     host acn1
 1 0.89999         osd.1       up  1.00000          1.00000
 2 0.89999         osd.2       up  1.00000          1.00000
 0 0.90430         osd.0       up  1.00000          1.00000
-3 2.71317     host acn2
 3 0.90439         osd.3       up  1.00000          1.00000
 4 0.90439         osd.4       up  1.00000          1.00000
 5 0.90439         osd.5       up  1.00000          1.00000

The cluster has three monitors.
mon.0 and mon.1 each run on a host with three OSDs.
mon.2 is for quorum only (no OSDs).

# ceph version
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

How can I fix this error?

Best regards,
Gosha
 
Are you running just two OSD hosts?

You should run at least 3 nodes and use a replication size of 3. If you do not, problems are to be expected.
 
Yes, two OSD hosts and a replication size of 2.
Does this mean that the problem cannot be solved with my configuration?

Best regards,
Gosha
 
It means that your data/config could break in some cases. It should be possible to recover; check the Ceph troubleshooting docs (ceph.com).
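
As a starting point for that recovery, here is a minimal sketch of how the stuck PGs could be inspected. Every stuck PG in the first post lists only osd.0 and osd.3 in its acting set, and the blocked request is on osd.3, so grouping the PGs by state and acting set is a quick way to see where to look. The helper name `summarize_stuck_pgs` is made up for illustration, and it assumes the `ceph health detail` line format shown in the first post (Ceph 10.2.x); other releases may print different text.

```shell
#!/bin/sh
# Hypothetical helper: count stuck-inactive PGs per "state + acting set".
# Reads `ceph health detail` text on stdin.
summarize_stuck_pgs() {
    awk '/is stuck inactive/ {
        state = $10                # e.g. "incomplete," or "down+incomplete,"
        sub(/,$/, "", state)       # drop the trailing comma
        count[state " " $NF]++     # $NF is the acting set, e.g. "[0,3]"
    }
    END { for (k in count) print count[k], k }' | sort
}

# On a live cluster (needs an admin keyring):
#   ceph health detail | summarize_stuck_pgs
#   ceph pg dump_stuck inactive    # list the stuck PGs and their states
#   ceph pg 1.c7 query             # peering details for one PG from the list
```

The output of `ceph pg <pgid> query` is what the troubleshooting docs refer to when they discuss incomplete or down PGs, so it is worth saving for a few of the affected PGs before changing anything.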

After that, make sure you have 3 nodes and a replication size of 3.
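
A rough sketch of checking and then raising the replication size once a third OSD host is in place. The pool name `rbd` is an assumption (the thread never names the pool), and `min_size 2` is a commonly used companion value rather than something stated above. The wrapper `set_pool_replication` is hypothetical; it only prints the commands unless `APPLY=1` is set, so the change can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper around `ceph osd pool set`; dry run by default.
set_pool_replication() {
    pool=$1 size=$2 min_size=$3
    for cmd in "ceph osd pool set $pool size $size" \
               "ceph osd pool set $pool min_size $min_size"; do
        if [ "${APPLY:-0}" = 1 ]; then
            $cmd            # run against the live cluster
        else
            echo "$cmd"     # dry run: only show what would be executed
        fi
    done
}

# Check the current values first (pool name "rbd" is an assumption):
#   ceph osd pool get rbd size
#   ceph osd pool get rbd min_size
set_pool_replication rbd 3 2
```

If the pool uses a host-level CRUSH rule, which is the usual default, a size of 3 cannot be satisfied with only two OSD hosts, so the third node should be added before the size is raised.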