Ceph 16.2.6 - CEPHFS failed after upgrade from 16.2.5

dlasher

TL;DR - After upgrading from 16.2.5 to 16.2.6, CephFS fails to start and all MDS daemons sit in "standby". It requires

Code:
ceph fs compat <fs name> add_incompat 7 "mds uses inline data"

to work again.


Longer version :

pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-5-pve)

After an apt dist-upgrade, Ceph went from 16.2.5 to 16.2.6, which made the GUI complain about differing versions. My usual method, which lines up with online advice:
* restart MGR
* restart MDS
* restart OSD

I followed that order, and every MDS went to standby and wouldn't become active. I restarted everything, including the entire cluster(s) themselves; ranks 0 and 1 both showed "failed", with all MDS daemons in standby and none active.

I tried a lot of steps with "cephfs-journal-tool" (a good education), but nothing brought them back online. Google turned up lots of directions for a "damaged" MDS; none of that applied or worked.

finally stumbled across this thread : https://www.mail-archive.com/ceph-users@ceph.io/msg12463.html

* stop all MDS
* ceph fs set cephfs max_mds 1
* ceph fs set cephfs allow_standby_replay false
* ceph fs compat <fs name> add_incompat 7 "mds uses inline data"
* start MDS
* magic ensues!
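
For anyone who wants to copy and paste, the steps above can be sketched as a small shell helper. This is only a sketch under assumptions that are not in the thread: it assumes the MDS daemons are grouped under the systemd unit ceph-mds.target, that the filesystem is named "cephfs" unless FS_NAME says otherwise, and the function names and the DRY_RUN switch are made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the recovery steps listed above.
# Assumptions (not from the original post): ceph-mds.target groups the MDS
# daemons on each node, and FS_NAME defaults to "cephfs".
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.

FS_NAME="${FS_NAME:-cephfs}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form loses the quoting around multi-word arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Run on EVERY node that hosts an MDS, before touching the filesystem.
stop_mds() {
    run systemctl stop ceph-mds.target
}

# Run ONCE, on any node with an admin keyring, while all MDS are stopped.
fix_fs() {
    run ceph fs set "$FS_NAME" max_mds 1
    run ceph fs set "$FS_NAME" allow_standby_replay false
    run ceph fs compat "$FS_NAME" add_incompat 7 "mds uses inline data"
}

# Run on every MDS node once fix_fs has finished.
start_mds() {
    run systemctl start ceph-mds.target
}
```

Source the file, call stop_mds on each MDS node, fix_fs once, then start_mds on each node. Read the dry-run output first, and only set DRY_RUN=0 once the printed commands match your filesystem name.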


Posting here for other PMX users: this is a nasty stone to trip over. It cost me easily 20 hours this weekend, and mild amounts of panic.


(more details here : https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/)
 