[SOLVED] PVE7 unable to create OSD

MoreDakka

Getting this error:

Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.543614 s, 386 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 39e13a26-d985-40b8-852b-291d8592de56
 stderr: 2021-11-10T14:49:55.396-0700 7fa8d2016700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2021-11-10T14:49:55.396-0700 7fa8d2016700 -1 AuthRegistry(0x7fa8cc05b128) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: 2021-11-10T14:49:55.404-0700 7fa8caffd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 84681487-a5e1-431f-8741-95694c39d8ac --data /dev/sdb' failed: exit code 1

pveversion -v
Code:
root@pve1-cpu2:/etc/pve/priv# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-10
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-3
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
root@pve1-cpu2:/etc/pve/priv#

Config
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.1.81/24
     fsid = 84681487-a5e1-431f-8741-95694c39d8ac
     mon_allow_pool_delete = true
     mon_host = 192.168.1.81 192.168.1.82 192.168.1.83 192.168.1.84
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.1.81/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve1-cpu1]
     public_addr = 192.168.1.81

[mon.pve1-cpu2]
     public_addr = 192.168.1.82

[mon.pve1-cpu3]
     public_addr = 192.168.1.83

[mon.pve1-cpu4]
     public_addr = 192.168.1.84

CrushMap
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pve1-cpu1 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    # weight 1.819
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.819
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    # weight 1.819
    alg straw2
    hash 0    # rjenkins1
    item pve1-cpu1 weight 1.819
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

lsblk

Code:
root@pve1-cpu2:/etc/pve/priv# lsblk
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                  8:0    0 238.5G  0 disk
├─sda1               8:1    0  1007K  0 part
├─sda2               8:2    0   512M  0 part
└─sda3               8:3    0   238G  0 part
  ├─pve-swap       253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root       253:1    0  59.3G  0 lvm  /
  ├─pve-data_tmeta 253:2    0   1.6G  0 lvm
  │ └─pve-data     253:4    0 151.6G  0 lvm
  └─pve-data_tdata 253:3    0 151.6G  0 lvm
    └─pve-data     253:4    0 151.6G  0 lvm
sdb                  8:16   0   1.8T  0 disk

Tried this:

Code:
root@pve1-cpu2:/etc/pve/priv# ceph auth get client.bootstrap-osd
[client.bootstrap-osd]
        key = ################################
        caps mon = "allow profile bootstrap-osd"
exported keyring for client.bootstrap-osd

Now I'm getting:

Code:
()
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.542573 s, 387 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 090c6346-0724-4dfa-86d4-ad5ef236415a
 stderr: 2021-11-10T15:02:30.272-0700 7f4f9fc94700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2021-11-10T15:02:30.272-0700 7f4f9fc94700 -1 AuthRegistry(0x7f4f9805b128) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 84681487-a5e1-431f-8741-95694c39d8ac --data /dev/sdb' failed: exit code 1

Not sure where to go from here. This is all very default; I've just been banging my head against a desk trying to get the InfiniBand 40Gb NICs faster than 7Gb... got them up to 26Gb, what a pain. Google hasn't helped much with this Ceph error, so hopefully the forums can.
 
Looks like something failed in a previous step and the client keyring was not created properly in /etc/pve/priv/ceph.client.bootstrap-osd.keyring.

Check that pve-cluster.service is running on every node to confirm that pmxcfs is working properly:

journalctl -u pve-cluster.service
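
Something along these lines on each node should confirm that pmxcfs and quorum are healthy (pvecm status and systemctl are the standard tools here; adjust to your setup):

Code:
# run on every node (sketch)
systemctl status pve-cluster.service   # pmxcfs should be "active (running)"
journalctl -u pve-cluster.service -b   # look for quorum/cpg initialization errors
pvecm status                           # the node should report quorum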

If it seems OK on all nodes, try creating /etc/pve/priv/ceph.client.bootstrap-osd.keyring manually. Get the key from the ceph auth get client.bootstrap-osd output and create the file like this:

Code:
[client.admin]
    key = KEY
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
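
Alternatively, the ceph CLI can write the exported keyring straight to a file with -o, which keeps the original [client.bootstrap-osd] header; something like this should work, assuming /etc/pve is writable (i.e. pmxcfs has quorum):

Code:
# sketch: export the bootstrap-osd keyring directly instead of writing it by hand
ceph auth get client.bootstrap-osd -o /etc/pve/priv/ceph.client.bootstrap-osd.keyring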

Try again to create the OSD.

If something else failed before the OSD creation, there may be other issues in your cluster that show up later. I have not set up a Ceph cluster on Proxmox 7 yet, so I'm not aware of any bugs or special considerations for this version.
 
I've got exactly the same problem. I've checked all my config against MoreDakka's above and it looks similar.

I had created a Ceph cluster yesterday, then deleted it, removed Ceph, and reinstalled. I think my problem is that something latent relating to the disk was left behind.
 
I wasn't in the office for a few days, but I got in to check the journal:

Code:
Nov 10 14:41:30 pve1-cpu2 systemd[1]: Starting The Proxmox VE cluster filesystem...
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [quorum] crit: quorum_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [quorum] crit: can't initialize service
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [confdb] crit: cmap_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [confdb] crit: can't initialize service
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [dcdb] crit: cpg_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [dcdb] crit: can't initialize service
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [status] crit: cpg_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [status] crit: can't initialize service
Nov 10 14:41:31 pve1-cpu2 systemd[1]: Started The Proxmox VE cluster filesystem.

That was on the 10th. Since then there have been success logs:

Code:
Nov 14 14:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 15:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 16:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 17:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 18:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 19:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 20:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 21:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 22:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 23:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 00:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 01:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 02:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 02:36:50 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 02:36:54 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 03:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 03:40:44 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 03:40:49 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 04:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 05:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 05:40:01 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 05:40:06 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 06:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 07:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 08:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 09:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 10:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful

It's all very repetitive, so here's just a snippet to show what's in there now.

I also followed your directions and created the ceph.client.bootstrap-osd.keyring:

Code:
root@pve1-cpu2:~# cat /etc/pve/priv/ceph.client.bootstrap-osd.keyring
[client.admin]
    key = ######################################
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
root@pve1-cpu2:~#

Here is the new error:

Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.546352 s, 384 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 7a803c94-f34c-4bbb-94ea-cc6d7dbd45cb
 stderr: 2021-11-15T11:24:06.239-0700 7f2b6affd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 84681487-a5e1-431f-8741-95694c39d8ac --data /dev/sdb' failed: exit code 1

2 steps forward 1 step back.
 
Well, after looking through logs and digging through posts, I ran into this one:

/usr/bin/ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

So I compared the keys in /etc/pve/priv/ceph.client.bootstrap-osd.keyring and /var/lib/ceph/bootstrap-osd/ceph.keyring. They didn't match.

The command that VictorSTS showed gave me the correct key for /etc/pve/priv/ceph.client.bootstrap-osd.keyring, but it seems the system had the wrong key in /var/lib/ceph/bootstrap-osd/ceph.keyring. Running the above command corrected the problem, and now I can add a new OSD.
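
For anyone else hitting this, a rough sketch of the check-and-fix sequence (paths exactly as used above):

Code:
# show the key the cluster expects for the bootstrap-osd client
ceph auth get client.bootstrap-osd
# compare against the two keyrings on the node
cat /etc/pve/priv/ceph.client.bootstrap-osd.keyring
cat /var/lib/ceph/bootstrap-osd/ceph.keyring
# if the keys differ, regenerate the bootstrap keyring from the cluster
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring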

 
Thank you guys @MoreDakka @VictorSTS
I'll try to summarize the steps for solving it.
First, get the key using ceph auth get client.bootstrap-osd
Then create the file with vi /etc/pve/priv/ceph.client.bootstrap-osd.keyring and write it in this format:
Code:
[client.admin]
    key = KEY
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
The KEY is in the output of ceph auth get client.bootstrap-osd
The last step is to write the other keyring with ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
Then you can create the OSD successfully!
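
Put together, the whole fix is roughly this (a sketch; pveceph osd create is the CLI equivalent of the GUI step, and /dev/sdb is just the disk from this thread):

Code:
# 1. note the key
ceph auth get client.bootstrap-osd
# 2. create /etc/pve/priv/ceph.client.bootstrap-osd.keyring with that key (format above)
# 3. sync the keyring that ceph-volume actually uses
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
# 4. retry the OSD creation (GUI or CLI), e.g.:
pveceph osd create /dev/sdb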
 