Yesterday one of our hosts suffered a kernel panic. This locked up the entire cluster until we identified the affected host and restarted it. Since then we have had problems with our Ceph cluster: right now one PG is stuck unclean, and ceph status reports it as incomplete. I've pasted the ceph status output below. Any help recovering the cluster and getting the VMs responding again would be appreciated.
root@host11:~# ceph status
  cluster:
    id:     fbd9dc8d-6898-4159-89a8-00448f2efd0b
    health: HEALTH_ERR
            Reduced data availability: 1 pg inactive, 1 pg incomplete
            Degraded data redundancy: 1 pg unclean
            59 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum host12,host14,host15
    mgr: host12(active), standbys: host14, host15
    osd: 261 osds: 250 up, 250 in
    rgw: 1 daemon active

  data:
    pools:   13 pools, 13392 pgs
    objects: 7989k objects, 25804 GB
    usage:   78048 GB used, 190 TB / 266 TB avail
    pgs:     0.007% pgs not active
             13391 active+clean
             1     incomplete

  io:
    client:   2441 kB/s rd, 985 kB/s wr, 275 op/s rd, 113 op/s wr
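
One detail in the output above: the osd line lists 261 OSDs but only 250 up and 250 in, so 11 OSDs are currently neither up nor in. The output alone does not show whether any of those OSDs held data for the incomplete PG. A minimal sketch of that arithmetic, assuming the osd line keeps the format shown above (the function name osd_counts is only illustrative, not part of any Ceph tooling):

```python
import re

# The "osd:" line copied from the ceph status output above.
OSD_LINE = "osd: 261 osds: 250 up, 250 in"


def osd_counts(line):
    """Return (total, up, in) parsed from a ceph status 'osd:' line."""
    match = re.search(r"(\d+) osds: (\d+) up, (\d+) in", line)
    if match is None:
        raise ValueError(f"unrecognised osd line: {line!r}")
    total, up, in_count = (int(group) for group in match.groups())
    return total, up, in_count


total, up, in_count = osd_counts(OSD_LINE)
not_up = total - up        # OSDs that exist in the map but are down
not_in = total - in_count  # OSDs that have been marked out
print(f"{not_up} of {total} OSDs are not up, {not_in} are not in")
```
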