Data unavailable after power failure on a 3 node cluster

eiacub

Renowned Member
Apr 10, 2015
Hi all,

I have a three-node cluster with Ceph set up. Everything was fine for ~6 months until last night, after a power failure.
Now the VMs start, but they only work until local storage fills up (/var/log/ceph/ceph.log and /var/lib/ceph/mon/ceph-node/store.db).
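
To see which of those two paths is actually eating the disk, a small script like the one below might help. It is only a rough sketch: it sums file sizes under each path, and `ceph-node` is just the directory name as written above, so it would need to be replaced with the real monitor directory on each node.

```python
import os

def dir_size_bytes(path):
    """Return the size of a file, or the total size of all regular files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # Skip symlinks so nothing is counted twice.
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Files can vanish while a daemon is rotating or compacting them.
                pass
    return total

if __name__ == "__main__":
    # Paths taken from the post; adjust "ceph-node" to the actual mon directory.
    for p in ("/var/log/ceph/ceph.log", "/var/lib/ceph/mon/ceph-node/store.db"):
        if os.path.exists(p):
            print(f"{p}: {dir_size_bytes(p) / 1024**2:.1f} MiB")
        else:
            print(f"{p}: not found")
```

Running it a few minutes apart should show which path is growing and how fast.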

The ceph command gives no response other than: '0 monclient(hunting): authenticate timed out after 300'

and that repeats on every node...

Unfortunately, no recent backups...
 
I have very little ceph experience, but I'll try.

The monclient message makes me believe that's a problem with the Ceph mons.

Are you sure you have quorum?
 
  • Is the network OK (IP connectivity, jumbo frames working if used) on the Ceph public and Ceph cluster networks (if separated)?
  • Is the time in sync? Ceph has lots of problems if the time is not in sync.
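
As a quick first check of the network point, something like this sketch could be run from each node against the other two. It only tests whether a TCP connection to the default monitor ports (3300 for msgr2, 6789 for msgr1) can be opened; it says nothing about jumbo frames or clock sync. The host names `node1` to `node3` are placeholders, not names from this thread.

```python
import socket

# Default Ceph monitor ports: 3300 (msgr2) and 6789 (msgr1).
MON_PORTS = (3300, 6789)

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

if __name__ == "__main__":
    # Hypothetical host names; replace with the monitor addresses from ceph.conf.
    for host in ("node1", "node2", "node3"):
        for port in MON_PORTS:
            state = "reachable" if port_open(host, port) else "NOT reachable"
            print(f"{host}:{port} {state}")
```

If a monitor port is not reachable from the other nodes, that mon is either not running or blocked on the network, which would fit the authenticate timeout.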
 
I am not sure, though, how to mount the OSDs locally...

Setup


  1. Clone this repo to a place of your choice. Make sure that you have at least a few hundred MB of free space in this directory.
  2. Create a new subfolder osds in this folder.
  3. Choose one of the following options or mix them:
     3.1. Attach all OSDs as local storage. For every OSD create a subfolder in the osds folder and mount it there.
     3.2. Use sshfs to mount OSDs over network / ssh
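
For the sshfs option, the commands could be generated along these lines. This is only a sketch under several assumptions that do not come from the instructions above: that the OSD data directories sit in the default location `/var/lib/ceph/osd/ceph-<id>`, that root SSH access between the nodes works, and that a subfolder name like `osds/<host>-osd<id>` is acceptable. The host names and OSD ids are made up. It only prints the commands and does not run them.

```python
import os
import shlex

def sshfs_commands(osds, base_dir="osds", remote_root="/var/lib/ceph/osd"):
    """Build one sshfs command per (host, osd_id) pair without executing anything.

    The mountpoint naming and remote_root are assumptions, not part of the repo's docs.
    """
    cmds = []
    for host, osd_id in osds:
        mountpoint = os.path.join(base_dir, f"{host}-osd{osd_id}")
        remote = f"root@{host}:{remote_root}/ceph-{osd_id}"
        # "-o ro" mounts read-only so the OSD data cannot be changed by accident.
        cmds.append(["sshfs", "-o", "ro", remote, mountpoint])
    return cmds

if __name__ == "__main__":
    # Hypothetical layout: one OSD per node.
    example = [("node1", 0), ("node2", 1), ("node3", 2)]
    for cmd in sshfs_commands(example):
        print("mkdir -p", shlex.quote(cmd[-1]))
        print(shlex.join(cmd))
```

The OSD daemons should be stopped before their directories are mounted somewhere else, and the printed commands are worth reviewing before running them.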