Data unavailable after power failure on a 3 node cluster

eiacub

Apr 10, 2015
Hi all,

I have a three-node cluster with a Ceph setup. Everything was fine for ~6 months until last night, when we had a power failure.
Now the VMs start, but they only run until local storage fills up (/var/log/ceph/ceph.log and /var/lib/ceph/mon/ceph-node/store.db).

The ceph command gives no response besides: '0 monclient(hunting): authenticate timed out after 300'

and that repeats on every node...

Unfortunately, no recent backups...
 
I have very little Ceph experience, but I'll try.

The monclient message makes me believe that's a problem with the Ceph mons.

Are you sure you have quorum?
 
  • Is the network OK (IP connectivity, and jumbo frames if used) on the Ceph public network and the Ceph cluster network (if separated)?
  • Is the time in sync? Ceph has lots of problems if time is not in sync.
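
In case it helps, here is a rough Python sketch of those checks. It only builds and prints the commands so you can review them before running anything. The peer addresses and the mon ID are made-up placeholders, and I'm assuming Linux iputils ping, a systemd host for timedatectl, and the default mon admin socket. The mon_status query goes through the local admin socket, so it should still answer when the cluster has no quorum.

```python
import shlex
from typing import List

# 20-byte IPv4 header + 8-byte ICMP header
IP_ICMP_OVERHEAD = 28


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_ICMP_OVERHEAD


def jumbo_ping_cmd(host: str, mtu: int = 9000) -> List[str]:
    """ping with fragmentation forbidden (-M do), to test the path MTU."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(payload), "-c", "3", host]


def mon_status_cmd(mon_id: str) -> List[str]:
    """Ask the local monitor for its state through its admin socket."""
    return ["ceph", "daemon", f"mon.{mon_id}", "mon_status"]


def time_sync_cmd() -> List[str]:
    """Show clock and NTP synchronisation state (systemd hosts)."""
    return ["timedatectl", "status"]


if __name__ == "__main__":
    # Hypothetical values: replace with your own network addresses and mon ID.
    peers = ["10.10.10.2", "10.10.10.3"]
    mon_id = "node1"

    for peer in peers:
        print(shlex.join(jumbo_ping_cmd(peer, mtu=9000)))
    print(shlex.join(mon_status_cmd(mon_id)))
    print(shlex.join(time_sync_cmd()))
```

Run the printed commands on each node. If the ping with the jumbo payload fails while a plain ping works, the MTU is not consistent along the path.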
 
I am not sure, though, how to mount the OSDs locally...

Setup


  1. Clone this repo to a place of your choice. Make sure that you have at least a few hundred MB of free space in this directory.
  2. Create a new subfolder osds in this folder.
  3. Choose one of the following options or mix them:
     3.1. Attach all OSDs as local storage. For every OSD, create a subfolder in the osds folder and mount it there.
     3.2. Use sshfs to mount OSDs over the network (SSH).
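
For steps 2 and 3.2, a minimal Python sketch could look like the one below. It is not part of the repo. The subfolder names (osd-0, osd-1, ...), the host names, and the remote path /var/lib/ceph/osd/ceph-<id> (the usual default OSD data directory) are my assumptions, so adjust them to your setup. It creates the folders and only prints the sshfs commands, mounting read-only to avoid touching the OSD data.

```python
import shlex
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def prepare_osd_dirs(base: Path, osd_ids: Iterable[int]) -> Dict[int, Path]:
    """Create base/osds plus one subfolder per OSD (naming is hypothetical)."""
    osds = base / "osds"
    osds.mkdir(parents=True, exist_ok=True)
    dirs = {}
    for osd_id in osd_ids:
        mountpoint = osds / f"osd-{osd_id}"
        mountpoint.mkdir(exist_ok=True)
        dirs[osd_id] = mountpoint
    return dirs


def sshfs_cmd(host: str, remote_path: str, mountpoint, user: str = "root") -> List[str]:
    """Build a read-only sshfs mount command for one remote OSD directory."""
    return ["sshfs", "-o", "ro", f"{user}@{host}:{remote_path}", str(mountpoint)]


if __name__ == "__main__":
    # Hypothetical layout: which OSD ID lives on which node.
    osd_hosts = {0: "node1", 1: "node2", 2: "node3"}

    # A temporary directory stands in for the cloned repo folder.
    with tempfile.TemporaryDirectory() as repo:
        dirs = prepare_osd_dirs(Path(repo), osd_hosts)
        for osd_id, host in osd_hosts.items():
            # Assumed default OSD data path; verify it on your nodes.
            remote = f"/var/lib/ceph/osd/ceph-{osd_id}"
            print(shlex.join(sshfs_cmd(host, remote, dirs[osd_id])))
```

For option 3.1 you would skip sshfs and mount each OSD's device directly on its subfolder instead.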
 
