Unable to create Ceph OSD

Jul 30, 2019
Hello there, I recently upgraded from Proxmox 5 to 6, as well as Ceph Luminous to Nautilus. I wanted to go through and re-create the OSDs I have in my cluster. I ran into an issue with the second OSD I wanted to convert (the first went fine). Here's what I get after I zap the disk:

Code:
pveceph createosd /dev/nvme8n2
create OSD on /dev/nvme8n2 (bluestore)
wipe disk/partition: /dev/nvme8n2
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.417411 s, 502 MB/s
-->  OSError: [Errno 5] Input/output error: '/var/lib/ceph/osd/ceph-2'
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 7f015a78-6062-4026-9a8a-7895aecf2eda
Running command: /sbin/vgcreate -s 1G --force --yes ceph-a449d0c2-542e-42cb-a5c3-395070ab51ca /dev/nvme8n2
 stdout: Physical volume "/dev/nvme8n2" successfully created.
 stdout: Volume group "ceph-a449d0c2-542e-42cb-a5c3-395070ab51ca" successfully created
Running command: /sbin/lvcreate --yes -l 100%FREE -n osd-block-7f015a78-6062-4026-9a8a-7895aecf2eda ceph-a449d0c2-542e-42cb-a5c3-395070ab51ca
 stdout: Logical volume "osd-block-7f015a78-6062-4026-9a8a-7895aecf2eda" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.2 --yes-i-really-mean-it
 stderr: 2019-07-30 11:13:50.300 7f3e0f70b700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2019-07-30 11:13:50.300 7f3e0f70b700 -1 AuthRegistry(0x7f3e0807ed58) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: purged osd.2
command 'ceph-volume lvm create --cluster-fsid 0504a312-6d92-4443-b250-5e790079210e --data /dev/nvme8n2' failed: exit code 1

I can see it complaining about a missing keyring, but I'm not really sure where that key should be coming from.
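For reference, these are the two keyring locations that show up in the output, and a quick way to check whether either file actually exists:

Code:
ls -l /var/lib/ceph/bootstrap-osd/ceph.keyring
ls -l /etc/pve/priv/ceph.client.bootstrap-osd.keyring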
 
What does your ceph.conf look like?
 
Here's my ceph.conf
Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 172.29.11.0/24
     fsid = this-is-the-fsid
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 2
     osd pool default size = 3
     public network = 172.29.11.0/24
     mon_host = 172.29.11.11 172.29.11.12 172.29.11.13
[mds]

[osd]

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[mds.prox-ceph2]
     host = prox-ceph2
     mds standby for name = pve

[mds.prox-ceph1]
     host = prox-ceph1
     mds standby for name = pve

[mds.prox-ceph3]
     host = prox-ceph3
     mds standby for name = pve

[mon.prox-ceph2]
     host = prox-ceph2
     mon addr = 172.29.11.12:6789

[mon.prox-ceph1]
     host = prox-ceph1
     mon addr = 172.29.11.11:6789

[mon.prox-ceph3]
     host = prox-ceph3
     mon addr = 172.29.11.13:6789
 
stderr: 2019-07-30 11:13:50.300 7f3e0f70b700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
It seems ceph-volume is looking for the bootstrap keyring in the wrong location. Can you please comment out the keyring line in the client section and try again?
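That is, leave the section like this:

Code:
[client]
    # keyring = /etc/pve/priv/$cluster.$name.keyring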
 
So the problem seems to have been an OS issue. I ended up rebooting the server, and the drives came up in a different order under /dev/nvme*. After that reboot I was able to create the new OSD. I left the keyring line under the client section of my ceph.conf, and everything is working fine.
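One note for anyone who hits the same thing: /dev/nvmeXnY names are not guaranteed to be stable across reboots, so it can help to identify the disks by their stable IDs instead:

Code:
# by-id links are stable across reboots, unlike /dev/nvme* enumeration order:
ls -l /dev/disk/by-id/ | grep nvme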
 
Hello!

We have the same issue. Some of our OSDs are backfill-full, so we added more OSDs, and when we add an OSD we get a similar error message. The file it is looking for does not exist on any of the nodes; only the pool keyring is there.

We tried commenting out the client section, but then the Ceph GUI reports 'No such file or directory'.

Code:
stderr: 2020-09-12 16:03:16.675 7ff0bb288700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2020-09-12 16:03:16.675 7ff0bb288700 -1 AuthRegistry(0x7ff0b40817b8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: got monmap epoch 3

We need help quickly, as the pool is nearfull.
 
@szucs10, please open a new thread with the current ceph -s, ceph osd df tree, and pveversion -v.
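That is, the output of:

Code:
ceph -s
ceph osd df tree
pveversion -v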
 
I ran into the same problem and resolved it.
Code:
# ceph-volume lvm create --bluestore --data /dev/sda --block.wal /dev/nvme0n1p1 --block.db /dev/nvme0n1p7
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 17e6764c-c233-4a08-addf-3479bdab4671
stderr: 2021-01-13 11:10:12.695621 7f4304b52700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
stderr: 2021-01-13 11:10:12.695719 7f4304b52700 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
stderr: 2021-01-13 11:10:12.695720 7f4304b52700  0 librados: client.bootstrap-osd initialization error (2) No such file or directory
stderr: [errno 2] error connecting to the cluster
-->  RuntimeError: Unable to create a new OSD id

Then I compared the keyring file across the nodes:
Code:
ansible pve31 -uroot -m shell -a 'md5sum /var/lib/ceph/bootstrap-osd/ceph.keyring'
172.31.254.1 | CHANGED | rc=0 >>
8ff95ce9ce219ea35ab95de8dc4b89d4  /var/lib/ceph/bootstrap-osd/ceph.keyring

172.31.254.2 | CHANGED | rc=0 >>
8ff95ce9ce219ea35ab95de8dc4b89d4  /var/lib/ceph/bootstrap-osd/ceph.keyring

172.31.254.5 | CHANGED | rc=0 >>
1a37b4090b058a48fffad6da24e8538c  /var/lib/ceph/bootstrap-osd/ceph.keyring

172.31.254.3 | CHANGED | rc=0 >>
8ff95ce9ce219ea35ab95de8dc4b89d4  /var/lib/ceph/bootstrap-osd/ceph.keyring

172.31.254.4 | CHANGED | rc=0 >>
1a37b4090b058a48fffad6da24e8538c  /var/lib/ceph/bootstrap-osd/ceph.keyring

172.31.254.7 | FAILED | rc=1 >>
md5sum: /var/lib/ceph/bootstrap-osd/ceph.keyring: No such file or directory
non-zero return code

172.31.254.9 | FAILED | rc=1 >>
md5sum: /var/lib/ceph/bootstrap-osd/ceph.keyring: No such file or directory
non-zero return code

172.31.254.6 | CHANGED | rc=0 >>
1a37b4090b058a48fffad6da24e8538c  /var/lib/ceph/bootstrap-osd/ceph.keyring

172.31.254.8 | CHANGED | rc=0 >>
1a37b4090b058a48fffad6da24e8538c  /var/lib/ceph/bootstrap-osd/ceph.keyring

So I just copied the keyring to the node where it was missing:
Code:
scp /var/lib/ceph/bootstrap-osd/ceph.keyring 172.31.254.7:/var/lib/ceph/bootstrap-osd/ceph.keyring

After that, it worked.
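Note that in the output above there are actually two different checksums among the nodes that do have the file; the bootstrap-osd keyring is a cluster-wide key, so the file should normally be identical on every node, and the nodes with the other checksum may need the same fix. Re-running the check should then show one checksum everywhere:

Code:
ansible pve31 -uroot -m shell -a 'md5sum /var/lib/ceph/bootstrap-osd/ceph.keyring'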
 
I've been able to solve this just by creating the directory:

Code:
mkdir /var/lib/ceph/bootstrap-osd

Hope this helps.
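Combining this with the post above, a minimal recovery on a broken node might look like this (the source host is just a placeholder; use any node that still has the correct keyring):

Code:
# Create the directory ceph-volume expects, then pull the keyring from a healthy node:
mkdir -p /var/lib/ceph/bootstrap-osd
scp <healthy-node>:/var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-osd/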
 
In case it's of any use to anyone: I had a similar problem, and this thread helped me pinpoint the answer.
In my case, a newly added node that wasn't completely new had the wrong Ceph key in a few files. Replacing the key value in the following two files with the key value from an existing cluster member did the trick:

- /var/lib/ceph/bootstrap-osd/ceph.keyring
- /etc/ceph/ceph.client.admin.keyring

I suspect that some old residual config caused the Proxmox replication process to fail to update these keys when I added the node to the cluster.
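If no node with the correct files is available to copy from, the same keys can also be exported from the cluster itself. This is only a sketch, and it assumes at least one node still has a working admin keyring:

Code:
# Run on a node where 'ceph -s' still works, then copy the files to the broken node:
ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring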
 
