Hi,
After rebooting one of the nodes serving an MDS, I get these error messages in that node's syslog:
root@ld3955:~# tail /var/log/syslog
Sep 17 12:21:18 ld3955 kernel: [ 3141.167834] ceph: probably no mds server is up
Sep 17 12:21:18 ld3955 pvestatd[2482]: mount error: exit code 2
Sep 17 12:21:28 ld3955 kernel: [ 3151.319780] libceph: mon2 10.97.206.95:6789 session established
Sep 17 12:21:28 ld3955 kernel: [ 3151.327118] libceph: client38594183 fsid 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
Sep 17 12:21:28 ld3955 kernel: [ 3151.327163] ceph: probably no mds server is up
Sep 17 12:21:28 ld3955 pvestatd[2482]: mount error: exit code 2
Sep 17 12:21:38 ld3955 kernel: [ 3161.537316] libceph: mon0 10.97.206.93:6789 session established
Sep 17 12:21:38 ld3955 pvestatd[2482]: mount error: exit code 2
Sep 17 12:21:38 ld3955 kernel: [ 3161.543618] libceph: client38684721 fsid 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
Sep 17 12:21:38 ld3955 kernel: [ 3161.544383] ceph: probably no mds server is up
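To narrow down why the kernel client keeps reporting that no MDS is up, I can collect the overall cluster and filesystem state with the standard Ceph CLI (just a sketch of the commands I plan to run, no output yet):
root@ld3955:~# ceph -s
root@ld3955:~# ceph fs status
root@ld3955:~# ceph mds stat
root@ld3955:~# ceph health detail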
I don't see an obvious error in ceph-mds.ld3955.log:
root@ld3955:~# tail -f /var/log/ceph/ceph-mds.ld3955.log
2019-09-17 12:08:14.670 7f9610af4700 0 ms_deliver_dispatch: unhandled message 0x563ee6090500 osd_map(183147..183147 src has 172245..183147) v4 from mon.0 v2:10.97.206.93:3300/0
2019-09-17 12:08:14.670 7f9610af4700 0 ms_deliver_dispatch: unhandled message 0x563ee2267440 mdsmap(e 66927) v1 from mon.0 v2:10.97.206.93:3300/0
2019-09-17 12:08:14.670 7f9610af4700 0 ms_deliver_dispatch: unhandled message 0x563ee2267200 mdsmap(e 66928) v1 from mon.0 v2:10.97.206.93:3300/0
2019-09-17 12:08:14.670 7f96092e5700 0 mds.0.log _replay journaler got error -11, aborting
2019-09-17 12:11:48.279 7fb75f9ca340 0 set uid:gid to 64045:64045 (ceph:ceph)
2019-09-17 12:11:48.279 7fb75f9ca340 0 ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable), process ceph-mds, pid 45678
2019-09-17 12:11:48.279 7fb75f9ca340 0 pidfile_write: ignore empty --pid-file
2019-09-17 12:11:48.283 7fb75bee3700 1 mds.ld3955 Updating MDS map to version 66928 from mon.2
2019-09-17 12:11:49.231 7fb75bee3700 1 mds.ld3955 Updating MDS map to version 66929 from mon.2
2019-09-17 12:11:49.231 7fb75bee3700 1 mds.ld3955 Map has assigned me to become a standby
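Since mds.ld3955 aborted journal replay with error -11 (EAGAIN) and then came back only as a standby, I can check the daemon and the current rank assignments like this (a sketch, assuming the usual ceph-mds@<id> systemd unit name on this host):
root@ld3955:~# systemctl status ceph-mds@ld3955
root@ld3955:~# journalctl -u ceph-mds@ld3955 --since "2019-09-17 12:00"
root@ld3955:~# ceph fs dump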
The MDS on the other node is now in replay, and this error keeps repeating in ceph-mds.ld3976.log:
root@ld3976:~# tail -f /var/log/ceph/ceph-mds.ld3976.log
2019-09-17 12:33:28.189 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10c7a80 0x5589f10cf000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.193 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10d2480 0x5589ed271800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.197 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10c7a80 0x5589f10cf000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.197 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10d2480 0x5589ed271800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.201 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10c7a80 0x5589f10cf000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.205 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10d2480 0x5589ed271800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.209 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10c7a80 0x5589f10cf000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.213 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10d2480 0x5589ed271800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.213 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10c7a80 0x5589f10cf000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-09-17 12:33:28.221 7f576b46c700 0 --1- [v2:10.97.206.92:6800/1176103745,v1:10.97.206.92:6801/1176103745] >> v1:10.97.206.93:7058/3301343 conn(0x5589f10d2480 0x5589ed271800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
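As far as I understand, BADAUTHORIZER usually points at a cephx authentication problem, e.g. clock skew between the nodes or a stale/mismatched key. To rule that out I can compare the time sync state and the MDS key on both nodes, roughly like this (a sketch; I'm assuming chrony as the time daemon and the default keyring path, both of which may differ here):
root@ld3976:~# timedatectl status
root@ld3976:~# chronyc tracking
root@ld3976:~# ceph auth get mds.ld3976
root@ld3976:~# cat /var/lib/ceph/mds/ceph-ld3976/keyring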
What is causing this error?
How can I fix it?