[SOLVED] MDS fails to start: unable to find a keyring on /var/lib/ceph/mds/ceph-admin/keyring

cmonty14

Hi,
I cannot start the MDS service on the active/standby node:
root@ld3955:/var/log# systemctl status ceph-mds@ld3955
ceph-mds@ld3955.service - Ceph metadata server daemon
Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mds@.service.d
└─ceph-after-pve-cluster.conf
Active: inactive (dead) since Tue 2019-09-03 10:51:26 CEST; 21min ago
Process: 2777472 ExecStart=/usr/bin/ceph-mds -f --cluster ${CLUSTER} --id ld3955 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
Main PID: 2777472 (code=exited, status=0/SUCCESS)

Sep 03 10:51:26 ld3955 systemd[1]: Started Ceph metadata server daemon.
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: starting mds.ld3955 at 2019-09-03 10:51:26.717 7f35e0318340 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.mds.ld3955.keyring: (13) Permission denied
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: 2019-09-03 10:51:26.717 7f35e0318340 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.mds.ld3955.keyring: (13) Permission denied
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: 2019-09-03 10:51:26.717 7f35e0318340 -1 monclient: keyring not found
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: 2019-09-03 10:51:26.717 7f35e0318340 -1 mds.ld3955 ERROR: failed to init monc: (13) Permission denied
Sep 03 10:51:26 ld3955 systemd[1]: ceph-mds@ld3955.service: Succeeded.


However, when I run ceph-mds in the foreground to debug it, I get a different error message:
root@ld3955:/var/log# ceph-mds -d --debug_mds 3
2019-09-03 11:28:56.639 7f4abb2cf340 -1 auth: unable to find a keyring on /var/lib/ceph/mds/ceph-admin/keyring: (2) No such file or directory
2019-09-03 11:28:56.639 7f4abb2cf340 -1 AuthRegistry(0x558079552140) no keyring found at /var/lib/ceph/mds/ceph-admin/keyring, disabling cephx
2019-09-03 11:28:56.643 7f4abb2cf340 -1 auth: unable to find a keyring on /var/lib/ceph/mds/ceph-admin/keyring: (2) No such file or directory
2019-09-03 11:28:56.643 7f4abb2cf340 -1 AuthRegistry(0x7fffa244c8f8) no keyring found at /var/lib/ceph/mds/ceph-admin/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)


Both error messages point to a missing keyring.
In fact, the keyring /var/lib/ceph/mds/ceph-admin/keyring does not exist.
But in /etc/pve/priv all keyrings are available:
root@ld3955:/var/log# ls -l /etc/pve/priv/
total 6
-rw------- 1 root www-data 1679 Sep 2 12:30 authkey.key
-rw------- 1 root www-data 5415 Aug 21 18:00 authorized_keys
-rw------- 1 root www-data 0 Aug 21 11:03 authorized_keys.tmp.1104659
drwx------ 2 root www-data 0 Jun 6 17:17 ceph
-rw------- 1 root www-data 63 May 28 14:32 ceph.client.admin.keyring
-rw------- 1 root www-data 61 Jul 8 11:34 ceph.mds.ld3955.keyring
-rw------- 1 root www-data 61 Jul 8 11:34 ceph.mds.ld3976.keyring
-rw------- 1 root www-data 236 May 28 14:32 ceph.mon.keyring
-rw------- 1 root www-data 4692 Aug 21 18:00 known_hosts
drwx------ 2 root www-data 0 May 20 17:24 lock
-rw------- 1 root www-data 3243 May 20 17:24 pve-root-ca.key
-rw------- 1 root www-data 3 May 22 14:53 pve-root-ca.srl


Can you please advise how to fix this issue?

Thanks
 
What does your ceph.conf look like?
 
root@ld3955:~# more /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.1.0/27
debug ms = 0/0
fsid = 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
mon allow pool delete = true
mon osd full ratio = .85
mon osd nearfull ratio = .75
osd crush update on start = false
osd pool default min size = 2
osd pool default size = 3
public network = 10.97.206.0/24
mon_host = 10.97.206.93,10.97.206.94,10.97.206.95
bluestore_block_db_size = 10737418240
osd deep scrub interval = 1209600
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub sleep = 0.1

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
osd journal size = 1024

[mds.ld3955]
host = ld3955
mds standby for name = pve
keyring = /etc/pve/priv/ceph.mds.ld3955.keyring

[mds.ld3976]
host = ld3976
mds standby for name = pve
keyring = /etc/pve/priv/ceph.mds.ld3976.keyring
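The [client] section above uses Ceph metavariables ($cluster, $name) in the keyring path. As a rough illustration of how such a template resolves, here is a minimal sketch. The function name expand_ceph_metavars is hypothetical and not part of Ceph, and it only handles the four metavariables shown, on the assumption that $name expands to <type>.<id>:

```shell
# Sketch only: expand a few Ceph-style metavariables in a path template.
# expand_ceph_metavars is a hypothetical helper, not a Ceph tool.
# Arguments: template, cluster name, daemon/client type, id.
expand_ceph_metavars() {
    template="$1"
    cluster="$2"
    type="$3"
    id="$4"
    # Assumption: $name is "<type>.<id>", e.g. client.admin or mds.ld3955.
    printf '%s\n' "$template" | sed \
        -e "s|\\\$cluster|$cluster|g" \
        -e "s|\\\$name|$type.$id|g" \
        -e "s|\\\$type|$type|g" \
        -e "s|\\\$id|$id|g"
}

# Example (the template must be single-quoted so the shell leaves $... alone):
# expand_ceph_metavars '/etc/pve/priv/$cluster.$name.keyring' ceph client admin
```

For the admin client this resolves to the ceph.client.admin.keyring file seen in the /etc/pve/priv listing earlier in the thread.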
 
keyring = /etc/pve/priv/ceph.mds.ld3955.keyring
These keyrings should be at /var/lib/ceph/mds/ceph-<ID>/keyring, because the ceph user has no access to /etc/pve/priv. The keyring also does not need to be shared, since each one belongs to a single MDS.

It is best to place the keyring in that directory and remove the keyring line from the config.
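The advice above could be sketched roughly as follows. install_mds_keyring is a hypothetical helper written for this example, and the ceph:ceph owner in the commented call is an assumption based on the --setuser ceph --setgroup ceph options in the systemd unit output earlier in the thread:

```shell
# Sketch only: copy one MDS keyring into that daemon's data directory.
# install_mds_keyring is a hypothetical helper, not a Ceph or Proxmox tool.
# Arguments: source keyring, MDS data directory, optional owner (user:group).
install_mds_keyring() {
    src="$1"
    dest_dir="$2"
    owner="${3:-}"

    mkdir -p "$dest_dir" || return 1
    # Mode 600: readable and writable by the file owner only.
    install -m 600 "$src" "$dest_dir/keyring" || return 1
    # Changing ownership needs root, so it is only done when an owner is given.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$dest_dir" || return 1
    fi
}

# Example call for the node in this thread (run as root):
# install_mds_keyring /etc/pve/priv/ceph.mds.ld3955.keyring \
#     /var/lib/ceph/mds/ceph-ld3955 ceph:ceph
# Afterwards, remove the "keyring = ..." line from [mds.ld3955] and run:
# systemctl restart ceph-mds@ld3955
```

If the keyring already exists in the data directory, as turned out to be the case later in this thread, only the config change should be needed.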
 
The file /var/lib/ceph/mds/ceph-<ID>/keyring already exists.
Therefore I simply modified the config in /etc/ceph/ceph.conf, and the MDS now starts without errors.
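For anyone who wants to confirm the same precondition on their own node, a quick check might look like the sketch below. check_keyring is a hypothetical helper, and the expected owner ceph is an assumption taken from the --setuser ceph option in the unit output above:

```shell
# Sketch only: verify that a keyring file exists and has the expected owner.
# check_keyring is a hypothetical helper, not a Ceph tool.
# Returns 0 if everything matches, 1 on wrong owner, 2 if the file is missing.
check_keyring() {
    path="$1"
    expected_owner="$2"
    if [ ! -f "$path" ]; then
        echo "missing: $path"
        return 2
    fi
    # stat -c %U prints the owning user name (GNU coreutils).
    actual_owner=$(stat -c %U "$path")
    if [ "$actual_owner" != "$expected_owner" ]; then
        echo "wrong owner: $path is owned by $actual_owner, expected $expected_owner"
        return 1
    fi
    echo "ok: $path"
}

# Example call for the node in this thread (run as root):
# check_keyring /var/lib/ceph/mds/ceph-ld3955/keyring ceph
```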
 
I wonder if this is similar to a problem I have had since updating. Now I get the following when I run ceph-osd --check-wants-journal:
Code:
root@node2:/var/lib/ceph/osd/ceph-1# ceph-osd --check-wants-journal
2022-04-14T20:55:43.210-0500 7f71cd669f00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-admin/keyring: (2) No such file or directory
2022-04-14T20:55:43.214-0500 7f71cd669f00 -1 AuthRegistry(0x558eb8d39340) no keyring found at /var/lib/ceph/osd/ceph-admin/keyring, disabling cephx
2022-04-14T20:55:43.214-0500 7f71cd669f00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-admin/keyring: (2) No such file or directory
2022-04-14T20:55:43.214-0500 7f71cd669f00 -1 AuthRegistry(0x7fff5ac25250) no keyring found at /var/lib/ceph/osd/ceph-admin/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)
root@node2:/var/lib/ceph/osd/ceph-1# cd ..
root@node2:/var/lib/ceph/osd# ls
ceph-1
root@node2:/var/lib/ceph/osd# cd ceph-1/
root@node2:/var/lib/ceph/osd/ceph-1# ls
block  ceph_fsid  fsid  keyring  ready  require_osd_release  type  whoami
root@node2:/var/lib/ceph/osd/ceph-1#


My Ceph cluster is currently down.

It, too, complains that it cannot find the keyring in /var/lib/ceph/osd/ceph-admin/keyring

However, there are two keyrings in /var/lib/ceph/osd/ceph-{1,5}.
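The ceph-admin in that path probably does not refer to a real daemon. As far as I know, when a Ceph daemon binary is started without an ID (-i or --id), the ID falls back to admin, and the default keyring location follows the pattern /var/lib/ceph/<type>/<cluster>-<id>/keyring. A minimal sketch of that pattern, where default_keyring_path is a hypothetical helper and the admin and ceph defaults are assumptions inferred from the log output in this thread:

```shell
# Sketch only: build the default keyring path a Ceph daemon would look for.
# default_keyring_path is a hypothetical helper, not a Ceph tool.
# Arguments: daemon type (mds, osd, ...), optional id, optional cluster name.
default_keyring_path() {
    daemon_type="$1"
    # Assumption: with no id given, the daemon falls back to "admin".
    id="${2:-admin}"
    # Assumption: the cluster name is "ceph" unless stated otherwise.
    cluster="${3:-ceph}"
    printf '/var/lib/ceph/%s/%s-%s/keyring\n' "$daemon_type" "$cluster" "$id"
}

# Examples:
# default_keyring_path osd      (no id, as in the failing command)
# default_keyring_path osd 1    (id given explicitly)
```

If that is what is happening here, passing the ID explicitly (for example ceph-osd -i 1 followed by the other options) should make the daemon look in ceph-1 instead of ceph-admin. The same would apply to the earlier ceph-mds -d call, which was run without --id ld3955.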

And I have the following entries in my ceph.conf:

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

Quoting the earlier reply: "These keyrings should be under /var/lib/ceph/mds/ceph-<ID>/keyring, as the user Ceph has no access to /etc/pve/priv. Also the keyring does not need to be shared, as it is for each MDS only."
I *believe* the first entry ([client]) stems from when I tried to make my Ceph cluster accessible from the outside. It did not work and I gave up, but I never cleaned up afterwards.

So, how do I get my Ceph cluster back up and running again?

Thanks