[SOLVED] MDS fails to start: unable to find a keyring on /var/lib/ceph/mds/ceph-admin/keyring

cmonty14

Hi,
I cannot start the MDS services on the active/standby nodes:
root@ld3955:/var/log# systemctl status ceph-mds@ld3955
ceph-mds@ld3955.service - Ceph metadata server daemon
Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mds@.service.d
└─ceph-after-pve-cluster.conf
Active: inactive (dead) since Tue 2019-09-03 10:51:26 CEST; 21min ago
Process: 2777472 ExecStart=/usr/bin/ceph-mds -f --cluster ${CLUSTER} --id ld3955 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
Main PID: 2777472 (code=exited, status=0/SUCCESS)

Sep 03 10:51:26 ld3955 systemd[1]: Started Ceph metadata server daemon.
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: starting mds.ld3955 at 2019-09-03 10:51:26.717 7f35e0318340 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.mds.ld3955.keyring: (13) Permission denied
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: 2019-09-03 10:51:26.717 7f35e0318340 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.mds.ld3955.keyring: (13) Permission denied
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: 2019-09-03 10:51:26.717 7f35e0318340 -1 monclient: keyring not found
Sep 03 10:51:26 ld3955 ceph-mds[2777472]: 2019-09-03 10:51:26.717 7f35e0318340 -1 mds.ld3955 ERROR: failed to init monc: (13) Permission denied
Sep 03 10:51:26 ld3955 systemd[1]: ceph-mds@ld3955.service: Succeeded.


However, when I run ceph-mds in the foreground to debug it, I get a different error message:
root@ld3955:/var/log# ceph-mds -d --debug_mds 3
2019-09-03 11:28:56.639 7f4abb2cf340 -1 auth: unable to find a keyring on /var/lib/ceph/mds/ceph-admin/keyring: (2) No such file or directory
2019-09-03 11:28:56.639 7f4abb2cf340 -1 AuthRegistry(0x558079552140) no keyring found at /var/lib/ceph/mds/ceph-admin/keyring, disabling cephx
2019-09-03 11:28:56.643 7f4abb2cf340 -1 auth: unable to find a keyring on /var/lib/ceph/mds/ceph-admin/keyring: (2) No such file or directory
2019-09-03 11:28:56.643 7f4abb2cf340 -1 AuthRegistry(0x7fffa244c8f8) no keyring found at /var/lib/ceph/mds/ceph-admin/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)
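
The ceph-admin path in this foreground run is expected: the command line has no --id, so ceph-mds falls back to the default ID admin and, as the log lines above show, looks for /var/lib/ceph/mds/<cluster>-<id>/keyring with that ID. The hypothetical helper below (not part of Ceph) only reproduces that path pattern for a given daemon type, cluster name and ID, assuming the default cluster name ceph:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): print the per-daemon keyring path
# pattern seen in the logs, /var/lib/ceph/<type>/<cluster>-<id>/keyring.
# The ID defaults to "admin", which is what the daemon uses when it is
# started without --id.
default_daemon_keyring() {
  local type="$1" cluster="${2:-ceph}" id="${3:-admin}"
  printf '/var/lib/ceph/%s/%s-%s/keyring\n' "$type" "$cluster" "$id"
}

default_daemon_keyring mds              # /var/lib/ceph/mds/ceph-admin/keyring
default_daemon_keyring mds ceph ld3955  # /var/lib/ceph/mds/ceph-ld3955/keyring
```

To reproduce what systemd actually runs, the ID and user from the ExecStart line above can be passed explicitly, e.g. ceph-mds -d --cluster ceph --id ld3955 --setuser ceph --setgroup ceph; that should exercise the same keyring lookup as the unit.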


Both error messages point to a missing keyring.
In fact, the keyring /var/lib/ceph/mds/ceph-admin/keyring does not exist.
But all keyrings are available in /etc/pve/priv:
root@ld3955:/var/log# ls -l /etc/pve/priv/
total 6
-rw------- 1 root www-data 1679 Sep 2 12:30 authkey.key
-rw------- 1 root www-data 5415 Aug 21 18:00 authorized_keys
-rw------- 1 root www-data 0 Aug 21 11:03 authorized_keys.tmp.1104659
drwx------ 2 root www-data 0 Jun 6 17:17 ceph
-rw------- 1 root www-data 63 May 28 14:32 ceph.client.admin.keyring
-rw------- 1 root www-data 61 Jul 8 11:34 ceph.mds.ld3955.keyring
-rw------- 1 root www-data 61 Jul 8 11:34 ceph.mds.ld3976.keyring
-rw------- 1 root www-data 236 May 28 14:32 ceph.mon.keyring
-rw------- 1 root www-data 4692 Aug 21 18:00 known_hosts
drwx------ 2 root www-data 0 May 20 17:24 lock
-rw------- 1 root www-data 3243 May 20 17:24 pve-root-ca.key
-rw------- 1 root www-data 3 May 22 14:53 pve-root-ca.srl


Can you please advise how to fix this issue?

THX
 
What does your ceph.conf look like?
 
root@ld3955:~# more /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.1.0/27
debug ms = 0/0
fsid = 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
mon allow pool delete = true
mon osd full ratio = .85
mon osd nearfull ratio = .75
osd crush update on start = false
osd pool default min size = 2
osd pool default size = 3
public network = 10.97.206.0/24
mon_host = 10.97.206.93,10.97.206.94,10.97.206.95
bluestore_block_db_size = 10737418240
osd deep scrub interval = 1209600
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub sleep = 0.1

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
osd journal size = 1024

[mds.ld3955]
host = ld3955
mds standby for name = pve
keyring = /etc/pve/priv/ceph.mds.ld3955.keyring

[mds.ld3976]
host = ld3976
mds standby for name = pve
keyring = /etc/pve/priv/ceph.mds.ld3976.keyring
 
keyring = /etc/pve/priv/ceph.mds.ld3955.keyring
These keyrings should be under /var/lib/ceph/mds/ceph-<ID>/keyring, as the ceph user has no access to /etc/pve/priv. Also, the keyring does not need to be shared, as each MDS uses its own.

It is best to place the keyring in that folder and remove the keyring line from the config.
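
Removing the line by hand in an editor is the simplest route. As a sketch of the same change, the hypothetical helper below (an assumption, not a Proxmox or Ceph tool) prints a ceph.conf with every keyring = ... line dropped from the [mds] and [mds.<id>] sections, leaving [client] and everything else untouched. It writes to stdout instead of editing in place because on Proxmox VE /etc/ceph/ceph.conf is normally a symlink to /etc/pve/ceph.conf, which is shared by all cluster nodes, so the result is worth reviewing first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given ceph.conf without "keyring = ..." lines
# inside [mds] / [mds.<id>] sections. All other lines pass through unchanged.
strip_mds_keyring() {
  awk '
    # A section header such as [mds.ld3955] starts a new section;
    # remember whether it belongs to the MDS daemons.
    /^[ \t]*\[/ {
      hdr = $0
      sub(/^[ \t]+/, "", hdr)
      in_mds = (substr(hdr, 1, 5) == "[mds]" || substr(hdr, 1, 5) == "[mds.")
    }
    # Drop keyring settings only while inside an MDS section.
    in_mds && /^[ \t]*keyring[ \t]*=/ { next }
    # Everything else is printed unchanged.
    { print }
  ' "$1"
}
```

For example, strip_mds_keyring /etc/pve/ceph.conf > /root/ceph.conf.new, then diff the two files before copying the new one over. If /var/lib/ceph/mds/ceph-<ID>/keyring does not exist yet, one way to create it (on the MDS node, with admin credentials, after creating the directory) is ceph auth get mds.<ID> -o /var/lib/ceph/mds/ceph-<ID>/keyring followed by chown ceph:ceph on that file.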
 
The file /var/lib/ceph/mds/ceph-<ID>/keyring already exists.
Therefore, I simply modified the config in /etc/ceph/ceph.conf, and now the MDS starts without errors.
 
I wonder if this is similar to my problem after updating. Now I get the following when running ceph-osd --check-wants-journal:
Code:
root@node2:/var/lib/ceph/osd/ceph-1# ceph-osd --check-wants-journal
2022-04-14T20:55:43.210-0500 7f71cd669f00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-admin/keyring: (2) No such file or directory
2022-04-14T20:55:43.214-0500 7f71cd669f00 -1 AuthRegistry(0x558eb8d39340) no keyring found at /var/lib/ceph/osd/ceph-admin/keyring, disabling cephx
2022-04-14T20:55:43.214-0500 7f71cd669f00 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-admin/keyring: (2) No such file or directory
2022-04-14T20:55:43.214-0500 7f71cd669f00 -1 AuthRegistry(0x7fff5ac25250) no keyring found at /var/lib/ceph/osd/ceph-admin/keyring, disabling cephx
failed to fetch mon config (--no-mon-config to skip)
root@node2:/var/lib/ceph/osd/ceph-1# cd ..
root@node2:/var/lib/ceph/osd# ls
ceph-1
root@node2:/var/lib/ceph/osd# cd ceph-1/
root@node2:/var/lib/ceph/osd/ceph-1# ls
block  ceph_fsid  fsid  keyring  ready  require_osd_release  type  whoami
root@node2:/var/lib/ceph/osd/ceph-1#
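
This looks like the same pattern as in the MDS case: ceph-osd was run without --id, so it looked for /var/lib/ceph/osd/ceph-admin/keyring, while the only OSD data directory on this node is ceph-1. A minimal sketch, assuming the default cluster name ceph and the <cluster>-<id> directory layout shown in the listing above: the hypothetical helper below lists the local OSD IDs from those directory names so that each one can be passed via --id.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list local OSD IDs by stripping the "<cluster>-" prefix
# from the data directories, e.g. /var/lib/ceph/osd/ceph-1 -> 1.
local_osd_ids() {
  local base="${1:-/var/lib/ceph/osd}" cluster="${2:-ceph}" dir
  for dir in "$base/$cluster"-*; do
    [ -d "$dir" ] || continue   # also skips the literal pattern if nothing matched
    printf '%s\n' "${dir##*/"$cluster"-}"
  done
}
```

On node2 this would print 1, so the check above becomes ceph-osd --cluster ceph --id 1 --check-wants-journal. It still needs reachable monitors unless --no-mon-config is added, as the error message itself suggests.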


My Ceph cluster is currently down.

It, too, complains that it cannot find the keyring at /var/lib/ceph/osd/ceph-admin/keyring.

However, there are two keyrings in /var/lib/ceph/osd/ceph-{1,5}.

And I have the following entries in my ceph.conf:

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

I *believe* the first entry stems from when I tried to make my Ceph cluster accessible from the outside; it did not work and I gave up (but never cleaned up afterwards).

So, how do I get my Ceph cluster back up and running?

Thanks
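
On the [client] entry: for a given daemon or client name, Ceph reads [global], then the section for its type, then the section for its full name, with the more specific section winning. A [client] keyring line is therefore only consulted by client.* names such as client.admin (used by the ceph CLI), not by osd.1 or osd.5. The same line also appears in the first ceph.conf in this thread, so it is probably a standard Proxmox VE entry rather than a leftover. The hypothetical helper below only spells out that section order for a name; it does not parse ceph.conf:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ceph.conf sections consulted for a daemon or
# client name such as "osd.1", from least to most specific.
conf_sections() {
  local name="$1"
  local type="${name%%.*}"   # "osd.1" -> "osd"
  printf '%s\n' global "$type" "$name"
}

conf_sections osd.1         # global, osd, osd.1 (one per line)
conf_sections client.admin  # global, client, client.admin
```

Because the OSDs never read [client], removing that line is unlikely to bring them back. Starting each OSD through its systemd unit (e.g. systemctl start ceph-osd@1, which passes the ID for you) and reading journalctl -u ceph-osd@1 is likely a more direct next step.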
 
