ceph MDS: corrupt sessionmap values: Corrupt entity name in sessionmap: Malformed input

skydiablo

hi!

my cephfs is broken and i cannot recover the mds daemons. yesterday i updated pve from v6 to v7 and my ceph cluster from v15 to v16, and i thought everything was working fine. the next day (today) some of my services went down and threw errors, so i dug into it and found that my cephfs is down and cannot be restarted.

my current status is:

# ceph status
  cluster:
    id:     acd880fe-5f42-4930-8071-c4894c9b678e
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            11 scrub errors
            Possible data damage: 3 pgs inconsistent
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum pve04,pve05,pve06 (age 103m)
    mgr: pve04(active, since 107m), standbys: pve05, pve06
    mds: 0/1 daemons up, 3 standby
    osd: 30 osds: 30 up (since 103m), 30 in (since 8M)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 0/1 healthy, 1 recovering; 1 damaged
    pools:   12 pools, 800 pgs
    objects: 483.49k objects, 1.8 TiB
    usage:   5.3 TiB used, 104 TiB / 109 TiB avail
    pgs:     797 active+clean
             3 active+clean+inconsistent+failed_repair

  io:
    client: 255 B/s rd, 229 KiB/s wr, 0 op/s rd, 17 op/s wr

i know there are also 3 inconsistent pgs, but that is another story. my next attempt was to mark the mds as repaired:

# ceph mds repaired 0
repaired: restoring rank 1:0

the log output says something about "corrupt values", check it out:

Sep 30 10:26:51 pve06 ceph-mds[42343]: 2021-09-30T10:26:51.913+0200 7efdec78c700 -1 mds.0.sessionmap Corrupt entity name '
Sep 30 10:26:51 pve06 ceph-mds[42343]: [1B blob data]
Sep 30 10:26:51 pve06 ceph-mds[42343]: w-' in sessionmap
Sep 30 10:26:51 pve06 ceph-mds[42343]: 2021-09-30T10:26:51.913+0200 7efdec78c700 -1 log_channel(cluster) log [ERR] : corrupt sessionmap values: Corrupt entity name in sessionmap: Malformed input
Sep 30 10:26:51 pve06 ceph-mds[42343]: -13> 2021-09-30T10:26:51.913+0200 7efdec78c700 -1 mds.0.sessionmap Corrupt entity name '
Sep 30 10:26:51 pve06 ceph-mds[42343]: [1B blob data]
Sep 30 10:26:51 pve06 ceph-mds[42343]: w-' in sessionmap
Sep 30 10:26:51 pve06 ceph-mds[42343]: -12> 2021-09-30T10:26:51.913+0200 7efdec78c700 -1 log_channel(cluster) log [ERR] : corrupt sessionmap values: Corrupt entity name in sessionmap: Malformed input
Sep 30 10:26:51 pve06 ceph-mds[42343]: did not load config file, using default settings.
Sep 30 10:26:51 pve06 ceph-mds[42343]: ignoring --setuser ceph since I am not root
Sep 30 10:26:51 pve06 ceph-mds[42343]: ignoring --setgroup ceph since I am not root
Sep 30 10:26:51 pve06 ceph-mds[42343]: 2021-09-30T10:26:51.945+0200 7fd8ef607600 -1 Errors while parsing config file!
Sep 30 10:26:51 pve06 ceph-mds[42343]: 2021-09-30T10:26:51.945+0200 7fd8ef607600 -1 can't open ceph.conf: (2) No such file or directory
Sep 30 10:26:51 pve06 ceph-mds[42343]: unable to get monitor info from DNS SRV with service name: ceph-mon
Sep 30 10:26:51 pve06 ceph-mds[42343]: 2021-09-30T10:26:51.969+0200 7fd8ef607600 -1 failed for service _ceph-mon._tcp
Sep 30 10:26:51 pve06 ceph-mds[42343]: 2021-09-30T10:26:51.969+0200 7fd8ef607600 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Sep 30 10:26:51 pve06 ceph-mds[42343]: failed to fetch mon config (--no-mon-config to skip)
Sep 30 10:26:51 pve06 systemd[1]: ceph-mds@pve06.service: Main process exited, code=exited, status=1/FAILURE
Sep 30 10:26:51 pve06 systemd[1]: ceph-mds@pve06.service: Failed with result 'exit-code'.
Sep 30 10:26:52 pve06 systemd[1]: ceph-mds@pve06.service: Scheduled restart job, restart counter is at 7.
Sep 30 10:26:52 pve06 systemd[1]: Stopped Ceph metadata server daemon.
Sep 30 10:26:52 pve06 systemd[1]: Started Ceph metadata server daemon.
Sep 30 10:26:52 pve06 ceph-mds[47109]: starting mds.pve06 at

so i do not know which file is corrupted? ceph.conf?
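
the log above mixes two kinds of messages: the sessionmap errors and the ceph.conf / monitor lookup errors. to look at the sessionmap messages on their own, a small filter can help. this is only a minimal sketch, assuming journald-style lines like the ones above; the function name `sessionmap_errors` is made up for illustration.

```shell
#!/bin/sh
# hypothetical helper: reduce journald-style ceph-mds output to the distinct
# messages that mention the sessionmap, with a count for each.
# reads log lines on stdin, writes "<count> <message>" lines to stdout.
sessionmap_errors() {
    grep 'sessionmap' \
        | sed -E \
            -e 's/^.*ceph-mds\[[0-9]+\]: *//' \
            -e 's/^-[0-9]+> *//' \
            -e 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}T[^ ]+ +[0-9a-f]+ +-?[0-9]+ +//' \
        | sort | uniq -c | sort -rn
}
```

usage would be something like `journalctl -u ceph-mds@pve06.service --since today | sessionmap_errors`. the three sed expressions strip the journald prefix, the crash dump index (`-13>`), and the ceph timestamp, thread id and log level, so repeated messages collapse into one line.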

the given errors "corrupt sessionmap values: Corrupt entity name in sessionmap" are thrown by this code:

https://github.com/ceph/ceph/blob/master/src/mds/SessionMap.cc

and there is also no "sessionmap" file on the hard drive: # find / -name '*.sessionmap' -> no results!
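
as far as SessionMap.cc suggests, the session table is not a file on the local disk but a RADOS object in the cephfs metadata pool, named `mds<rank>_sessionmap`, with one omap entry per client session. that would explain the empty find result. a minimal sketch for looking at it read-only, assuming the metadata pool is called `cephfs_metadata` (`ceph fs ls` shows the real name); the function names are hypothetical.

```shell
#!/bin/sh
# assumption: the metadata pool is called "cephfs_metadata";
# check the real name with `ceph fs ls` and override METADATA_POOL.
METADATA_POOL="${METADATA_POOL:-cephfs_metadata}"

# name of the RADOS object that holds the session table of one mds rank
sessionmap_object() {
    printf 'mds%s_sessionmap\n' "$1"
}

# read-only: list the omap keys of that object
list_sessions() {
    rados -p "$METADATA_POOL" listomapkeys "$(sessionmap_object "$1")"
}

# read-only: dump keys and values, e.g. to look for the corrupt entry
dump_sessions() {
    rados -p "$METADATA_POOL" listomapvals "$(sessionmap_object "$1")"
}
```

for rank 0 that would be `list_sessions 0` or `dump_sessions 0`. both calls only read from the pool and do not change anything.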

any suggestions as to what i can do now?

regards, volker.
 
for now, i have tried this:

Code:
# systemctl stop ceph-mds@pve04.service
# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
# cephfs-journal-tool --rank=cephfs:0 journal reset
# cephfs-table-tool all reset session
# systemctl start ceph-mds@pve04.service
# ceph mds repaired 0

and now there is this log output: https://pastebin.com/DBRq8iwM

not the same but similar errors... i'm a little bit confused about the definition of `ceph::buffer::v15_2_0::list`, since i'm running ceph v16...
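
the command sequence tried above resets the journal without keeping a copy of it. a minimal sketch of the same sequence with a `journal export` in front, assuming file system `cephfs`, rank 0 and a made-up backup path; `run` and `reset_journal_with_backup` are hypothetical helper names, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# assumptions: file system "cephfs", rank 0; the backup path is made up.
FS_RANK="${FS_RANK:-cephfs:0}"
BACKUP="${BACKUP:-/root/mds-journal-backup.bin}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

# print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

reset_journal_with_backup() {
    # keep a copy of the journal before anything destructive happens
    run cephfs-journal-tool --rank="$FS_RANK" journal export "$BACKUP" || return 1
    # same steps as tried above; stop at the first failure
    run cephfs-journal-tool --rank="$FS_RANK" event recover_dentries summary || return 1
    run cephfs-journal-tool --rank="$FS_RANK" journal reset || return 1
    run cephfs-table-tool all reset session
}
```

calling `reset_journal_with_backup` prints the four commands in order; with `DRY_RUN=0` they would actually run, with the mds daemons stopped first as in the sequence above.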