Cephfs - MDS all up:standby, not becoming up:active

jw6677

Like a dummy I accidentally upgraded to the Ceph dev branch (Quincy?), and have been having nothing but trouble since.

This wasn't actually intentional; I was trying to implement a PR which was expected to bring my cluster back online after the upgrade to v7 (and Ceph Pacific).

--> It did bring my cluster back online, so that's good, but I failed to recognize that, by building from the master branch, I also wouldn't be able to revert later. Whoops.

Lastly, while my MDS daemons are online (MONs, OSDs and MGRs too), none of them is ever marked up:active, so my CephFS data is inaccessible.

I'm hoping someone can help me determine the best way to bring an MDS up:active, as I have the bulk of my Proxmox backups in CephFS along with a handful of VM disks.
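For context, this is roughly how I've been checking the MDS state after each restart (standard ceph CLI; nothing here is specific to my cluster):
Code:
ceph -s          # overall health; the mds: line only ever shows standbys for me
ceph fs status   # per-filesystem view of ranks and which daemon (if any) holds them
ceph fs dump     # full MDS map: max_mds, flags, failed/damaged ranks, standbys
ceph mds stat    # one-line summary of the MDS map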


MDS log immediately after restarting the MDS:
Code:
 2021-08-08T09:51:09.417-0600 7fb015e05700  1 mds.server Updating MDS map to version 1095533 from mon.1
 2021-08-08T09:51:09.417-0600 7fb013e01700  5 mds.beacon.server Sending beacon up:boot seq 1
 2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
 2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server  mdsmap compat compat={},rocompat={},incompat={}
 2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server my gid is 138479365
 2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server map says I am mds.-1.-1 state null
 2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server msgr says I am [v2:192.168.2.2:6808/1262779536,v1:192.168.2.2:6809/1262779536]
 2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server handle_mds_map: handling map in rankless mode
 2021-08-08T09:51:09.441-0600 7fb013e01700 20 mds.beacon.server sender thread waiting interval 4s
 2021-08-08T09:51:09.441-0600 7fb015e05700 10 mds.server not in map yet
 2021-08-08T09:51:09.765-0600 7fb015e05700  1 mds.server Updating MDS map to version 1095534 from mon.1
 2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
 2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server  mdsmap compat compat={},rocompat={},incompat={}
 2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server my gid is 138479365
 2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server map says I am mds.-1.0 state up:standby
 2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server msgr says I am [v2:192.168.2.2:6808/1262779536,v1:192.168.2.2:6809/1262779536]
 2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server handle_mds_map: handling map in rankless mode
 2021-08-08T09:51:09.765-0600 7fb015e05700  1 mds.server Monitors have assigned me to become a standby.
 2021-08-08T09:51:09.765-0600 7fb015e05700  5 mds.beacon.server set_want_state: up:boot -> up:standby
 2021-08-08T09:51:09.777-0600 7fb018e0b700  5 mds.beacon.server received beacon reply up:boot seq 1 rtt 0.360009
 2021-08-08T09:51:13.442-0600 7fb013e01700  5 mds.beacon.server Sending beacon up:standby seq 2
 2021-08-08T09:51:13.442-0600 7fb013e01700 20 mds.beacon.server sender thread waiting interval 4s
 2021-08-08T09:51:13.442-0600 7fb018e0b700  5 mds.beacon.server received beacon reply up:standby seq 2 rtt 0
 2021-08-08T09:51:17.442-0600 7fb013e01700  5 mds.beacon.server Sending beacon up:standby seq 3
 2021-08-08T09:51:17.442-0600 7fb013e01700 20 mds.beacon.server sender thread waiting interval 4s
It cycles on this forever and is never marked up:active.
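In case it suggests anything to anyone, these are the other things I've been checking while it cycles (the filesystem name `cephfs` below is just a placeholder for mine):
Code:
# does the filesystem still have a rank 0, and is it marked failed or damaged?
ceph fs dump | grep -E 'max_mds|failed|damaged|flags'
# per-filesystem MDS map, including the joinable flag
ceph fs get cephfs
# if a rank were marked damaged, it could (carefully) be marked repaired:
# ceph mds repaired cephfs:0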


This line in particular looks weird to me in the logs, as 192.168.2.20 is a different node:
[v2:192.168.2.6:3300/0,v1:192.168.2.6:6789/0] >> conn(0x561e0593f800 0x561e07c57000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really - (socket is v1:192.168.2.20:60454/0)
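(For what it's worth, this is how I've been cross-checking which addresses the MONs actually advertise versus what's in the config; Proxmox keeps the config at /etc/pve/ceph.conf:)
Code:
ceph mon dump                                                  # monitor addresses in the monmap
grep -E 'public_network|cluster_network' /etc/pve/ceph.conf    # networks defined in the config file
ceph config dump | grep -i network                             # any overrides stored in the mon config db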



As far as I can tell, no CephFS or MDS settings seem to help: number of ranks, standby-replay or not, cephx or not, different networks, recreating MGRs, MONs or MDS daemons, etc.
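For completeness, the sort of toggles I tried looked roughly like this (again, `cephfs` is a placeholder for my filesystem name; none of them changed the behaviour):
Code:
ceph fs set cephfs max_mds 1                    # number of active ranks
ceph fs set cephfs allow_standby_replay false   # standby-replay on/off
ceph fs set cephfs joinable true                # allow MDS daemons to claim a rank
ceph config set mds debug_mds 10                # bump MDS logging while testing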



I did, however, notice the following, and am hoping someone can confirm whether it is normal or whether I am about to go on a wild goose chase.

Out of this block of the MDS log:
Code:
2021-08-10T09:31:09.484-0600 7ffa894fc700  1 mds.rog Updating MDS map to version 1095550 from mon.2
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog  mdsmap compat compat={},rocompat={},incompat={}
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog my gid is 139597028
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog map says I am mds.-1.-1 state null
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog msgr says I am [v2:192.168.10.50:6800/1353942242,v1:192.168.10.50:6801/1353942242]
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog handle_mds_map: handling map in rankless mode
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog not in map yet
2021-08-10T09:31:10.000-0600 7ffa894fc700  1 mds.rog Updating MDS map to version 1095551 from mon.2
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog  mdsmap compat compat={},rocompat={},incompat={}
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog my gid is 139597028
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog map says I am mds.-1.0 state up:standby
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog msgr says I am [v2:192.168.10.50:6800/1353942242,v1:192.168.10.50:6801/1353942242]
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog handle_mds_map: handling map in rankless mode
2021-08-10T09:31:10.000-0600 7ffa894fc700  1 mds.rog Monitors have assigned me to become a standby.
2021-08-10T09:31:10.000-0600 7ffa894fc700  5 mds.beacon.rog set_want_state: up:boot -> up:standby



These Lines:
Code:
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog      my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog  mdsmap compat compat={},rocompat={},incompat={}
---> There looks to be a difference between the MDS's own incompat set and the mdsmap's, which is empty. Is the mdsmap really meant to show `incompat={}`?
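(For reference, the same compat information should be visible straight out of the MDS map rather than the daemon log, e.g.:)
Code:
ceph fs get cephfs | grep -i compat   # this filesystem's compat/incompat feature set (cephfs = placeholder name)
ceph fs dump | grep -i compat         # same, cluster-wide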

This Line:
Code:
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog map says I am mds.-1.-1 state null
---> Is mds.-1.-1 normal?

This Line:
Code:
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog handle_mds_map: handling map in rankless mode
---> Is "handling map in rankless mode" normal?




I am hoping to recover my CephFS data by whatever means makes the most sense.

Eventually I intend to create a separate temporary Ceph cluster and migrate my data back to Pacific, but I really don't want to abandon this data if I can avoid it.
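(The copy itself should be simple enough once, or if, an MDS goes active again; something along these lines, with both MON IPs as placeholders and the cephx secret options omitted for brevity:)
Code:
# mount the old filesystem read-only and the new Pacific one read-write, then copy across
mount -t ceph <old-mon-ip>:6789:/ /mnt/old-cephfs -o name=admin,ro
mount -t ceph <new-mon-ip>:6789:/ /mnt/new-cephfs -o name=admin
rsync -aHAX --progress /mnt/old-cephfs/ /mnt/new-cephfs/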



Help!


~Josh
 
It's a sad story, unfortunately: I mucked about trying to get it working for months before finally giving up and accepting the data loss. :(
 
