Like a dummy I accidentally upgraded to the ceph dev branch (quincy?), and have been having nothing but trouble since.
This wasn't actually intentional; I was trying to implement a PR which was expected to bring my cluster back online after the upgrade to v7 (and Ceph Pacific).
--> It did bring my cluster back online, so that's good, but I failed to recognize that by building from the master branch I also wouldn't be able to revert later. Whoops.
Lastly, while my MDSs are online (MONs, OSDs and MGRs too), they are never marked up:active, so my cephfs data is inaccessible.
Hoping someone can help me determine the best way to bring an MDS up:active, as I have the bulk of my Proxmox backups in cephfs along with a handful of VM disks.
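For reference, here is roughly how I have been checking the state while poking at this (just a sketch of the standard status commands; "cephfs" below is a stand-in for my actual filesystem name):
Code:
# overall filesystem / MDS view ("cephfs" is a placeholder for my fs name)
ceph fs status cephfs
ceph mds stat
# full FSMap/MDSMap dump, shows which GIDs are standby vs. holding a rank
ceph fs dump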
mds log immediately after restarting mds:
Code:
2021-08-08T09:51:09.417-0600 7fb015e05700 1 mds.server Updating MDS map to version 1095533 from mon.1
2021-08-08T09:51:09.417-0600 7fb013e01700 5 mds.beacon.server Sending beacon up:boot seq 1
2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server mdsmap compat compat={},rocompat={},incompat={}
2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server my gid is 138479365
2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server map says I am mds.-1.-1 state null
2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server msgr says I am [v2:192.168.2.2:6808/1262779536,v1:192.168.2.2:6809/1262779536]
2021-08-08T09:51:09.417-0600 7fb015e05700 10 mds.server handle_mds_map: handling map in rankless mode
2021-08-08T09:51:09.441-0600 7fb013e01700 20 mds.beacon.server sender thread waiting interval 4s
2021-08-08T09:51:09.441-0600 7fb015e05700 10 mds.server not in map yet
2021-08-08T09:51:09.765-0600 7fb015e05700 1 mds.server Updating MDS map to version 1095534 from mon.1
2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server mdsmap compat compat={},rocompat={},incompat={}
2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server my gid is 138479365
2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server map says I am mds.-1.0 state up:standby
2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server msgr says I am [v2:192.168.2.2:6808/1262779536,v1:192.168.2.2:6809/1262779536]
2021-08-08T09:51:09.765-0600 7fb015e05700 10 mds.server handle_mds_map: handling map in rankless mode
2021-08-08T09:51:09.765-0600 7fb015e05700 1 mds.server Monitors have assigned me to become a standby.
2021-08-08T09:51:09.765-0600 7fb015e05700 5 mds.beacon.server set_want_state: up:boot -> up:standby
2021-08-08T09:51:09.777-0600 7fb018e0b700 5 mds.beacon.server received beacon reply up:boot seq 1 rtt 0.360009
2021-08-08T09:51:13.442-0600 7fb013e01700 5 mds.beacon.server Sending beacon up:standby seq 2
2021-08-08T09:51:13.442-0600 7fb013e01700 20 mds.beacon.server sender thread waiting interval 4s
2021-08-08T09:51:13.442-0600 7fb018e0b700 5 mds.beacon.server received beacon reply up:standby seq 2 rtt 0
2021-08-08T09:51:17.442-0600 7fb013e01700 5 mds.beacon.server Sending beacon up:standby seq 3
2021-08-08T09:51:17.442-0600 7fb013e01700 20 mds.beacon.server sender thread waiting interval 4s
It cycles on this forever and the MDS is never marked up.

This in particular looks weird to me in the log, as 192.168.2.20 is a different node:
[v2:192.168.2.6:3300/0,v1:192.168.2.6:6789/0] >> conn(0x561e0593f800 0x561e07c57000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really - (socket is v1:192.168.2.20:60454/0)
As far as I can tell, no cephfs or MDS settings seem to help: number of ranks, standby-active or not, cephx or not, different networks, recreating MGRs, MONs or MDSs, etc.
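To be concrete, the changes were all along these lines (the values are only examples, and "cephfs" again stands in for my filesystem name):
Code:
# number of ranks
ceph fs set cephfs max_mds 1
# standby-replay on/off
ceph fs set cephfs allow_standby_replay false
# cephx on/off (reverted afterwards)
ceph config set global auth_cluster_required none
ceph config set global auth_service_required none
ceph config set global auth_client_required none
# restart the MDS daemon after each change, e.g.
systemctl restart ceph-mds@server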
I did however notice the following, and am hoping someone can confirm whether it is normal or whether I am about to go on a wild goose chase.
Out of this block of the MDS log:
Code:
2021-08-10T09:31:09.484-0600 7ffa894fc700 1 mds.rog Updating MDS map to version 1095550 from mon.2
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog mdsmap compat compat={},rocompat={},incompat={}
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog my gid is 139597028
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog map says I am mds.-1.-1 state null
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog msgr says I am [v2:192.168.10.50:6800/1353942242,v1:192.168.10.50:6801/1353942242]
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog handle_mds_map: handling map in rankless mode
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog not in map yet
2021-08-10T09:31:10.000-0600 7ffa894fc700 1 mds.rog Updating MDS map to version 1095551 from mon.2
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog mdsmap compat compat={},rocompat={},incompat={}
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog my gid is 139597028
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog map says I am mds.-1.0 state up:standby
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog msgr says I am [v2:192.168.10.50:6800/1353942242,v1:192.168.10.50:6801/1353942242]
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog handle_mds_map: handling map in rankless mode
2021-08-10T09:31:10.000-0600 7ffa894fc700 1 mds.rog Monitors have assigned me to become a standby.
2021-08-10T09:31:10.000-0600 7ffa894fc700 5 mds.beacon.rog set_want_state: up:boot -> up:standby
These Lines:
Code:
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog mdsmap compat compat={},rocompat={},incompat={}

---> This looks like a difference between the MDS's own incompat set and the mdsmap's incompat set. Is one of them meant to be `incompat={}`?
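If it helps to compare, the mdsmap side of that comparison should be visible straight from the mons; a quick sketch of how I would check it ("cephfs" is again a placeholder):
Code:
# dump the FSMap as the mons see it and pull out the compat/incompat sets
ceph fs dump | grep -i compat
# per-filesystem MDSMap, for comparison with what the MDS logs print
ceph fs get cephfs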
This Line:
Code:
2021-08-10T09:31:09.484-0600 7ffa894fc700 10 mds.rog map says I am mds.-1.-1 state null

---> Is mds.-1.-1 normal?
This Line:
Code:
2021-08-10T09:31:10.000-0600 7ffa894fc700 10 mds.rog handle_mds_map: handling map in rankless mode

---> Is "handling map in rankless mode" normal?
I am hoping to recover my cephfs data by whatever means makes the most sense.
Eventually I intend to create a separate temporary Ceph cluster and migrate my data back to Pacific, but I really don't want to abandon this data if I can avoid it.
Help!
~Josh