[SOLVED] PVE7 unable to create OSD

MoreDakka

Getting this error:

Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.543614 s, 386 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 39e13a26-d985-40b8-852b-291d8592de56
 stderr: 2021-11-10T14:49:55.396-0700 7fa8d2016700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2021-11-10T14:49:55.396-0700 7fa8d2016700 -1 AuthRegistry(0x7fa8cc05b128) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: 2021-11-10T14:49:55.404-0700 7fa8caffd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 84681487-a5e1-431f-8741-95694c39d8ac --data /dev/sdb' failed: exit code 1

pveversion -v
Code:
root@pve1-cpu2:/etc/pve/priv# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-10
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-3
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
root@pve1-cpu2:/etc/pve/priv#

Config
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.1.81/24
     fsid = 84681487-a5e1-431f-8741-95694c39d8ac
     mon_allow_pool_delete = true
     mon_host = 192.168.1.81 192.168.1.82 192.168.1.83 192.168.1.84
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.1.81/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve1-cpu1]
     public_addr = 192.168.1.81

[mon.pve1-cpu2]
     public_addr = 192.168.1.82

[mon.pve1-cpu3]
     public_addr = 192.168.1.83

[mon.pve1-cpu4]
     public_addr = 192.168.1.84

CrushMap
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pve1-cpu1 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    # weight 1.819
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.819
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    # weight 1.819
    alg straw2
    hash 0    # rjenkins1
    item pve1-cpu1 weight 1.819
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

lsblk

Code:
root@pve1-cpu2:/etc/pve/priv# lsblk
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                  8:0    0 238.5G  0 disk
├─sda1               8:1    0  1007K  0 part
├─sda2               8:2    0   512M  0 part
└─sda3               8:3    0   238G  0 part
  ├─pve-swap       253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root       253:1    0  59.3G  0 lvm  /
  ├─pve-data_tmeta 253:2    0   1.6G  0 lvm
  │ └─pve-data     253:4    0 151.6G  0 lvm
  └─pve-data_tdata 253:3    0 151.6G  0 lvm
    └─pve-data     253:4    0 151.6G  0 lvm
sdb                  8:16   0   1.8T  0 disk

Tried this:

Code:
root@pve1-cpu2:/etc/pve/priv# ceph auth get client.bootstrap-osd
[client.bootstrap-osd]
        key = ################################
        caps mon = "allow profile bootstrap-osd"
exported keyring for client.bootstrap-osd

Now I'm getting:

Code:
()
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.542573 s, 387 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 090c6346-0724-4dfa-86d4-ad5ef236415a
 stderr: 2021-11-10T15:02:30.272-0700 7f4f9fc94700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
 stderr: 2021-11-10T15:02:30.272-0700 7f4f9fc94700 -1 AuthRegistry(0x7f4f9805b128) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 84681487-a5e1-431f-8741-95694c39d8ac --data /dev/sdb' failed: exit code 1

Not sure where to go from here. This is all very default; I've just been banging my head against a desk trying to get the Infiniband 40Gb NICs faster than 7Gb... got them up to 26Gb, what a pain. Google hasn't helped much with this Ceph error, so hopefully the forums can.
 
Looks like something failed in a previous step and the client keyring was not created properly in /etc/pve/priv/ceph.client.bootstrap-osd.keyring.

Check that pve-cluster.service is running on every node to confirm that pmxcfs is working properly:

journalctl -u pve-cluster.service
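
A quick way to do that on each node is something like this (a minimal sketch; pve-cluster and corosync are the standard service names on a PVE node):

Code:
# Both services should report "active (running)" on every node
systemctl status pve-cluster.service corosync.service
# Show this boot's pmxcfs log (last 50 lines)
journalctl -u pve-cluster.service -b --no-pager | tail -n 50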

If it seems OK on all nodes, try creating /etc/pve/priv/ceph.client.bootstrap-osd.keyring manually. Get the key from the ceph auth get client.bootstrap-osd output and create the file like this:

Code:
[client.admin]
    key = KEY
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

Try again to create the OSD.
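
If you prefer the CLI over the GUI for the retry, something like this should be equivalent (a sketch; /dev/sdb is the device from the task output above):

Code:
# Recreate the OSD on the wiped disk
pveceph osd create /dev/sdb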

If something else failed before creating the OSD, there may be other issues in your cluster that show up later. I have not set up any Ceph cluster using Proxmox 7 yet, so I'm not aware of bugs or considerations for this version.
 
I've got exactly the same problem. I've checked all my config against MoreDakka's above and it is similar.

I created a Ceph cluster yesterday, then deleted it, removed Ceph and reinstalled. I think my problem is that something latent was left behind relating to the disk.
 
I wasn't in the office for a bunch of days, but got in to check the journal:

Code:
Nov 10 14:41:30 pve1-cpu2 systemd[1]: Starting The Proxmox VE cluster filesystem...
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [quorum] crit: quorum_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [quorum] crit: can't initialize service
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [confdb] crit: cmap_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [confdb] crit: can't initialize service
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [dcdb] crit: cpg_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [dcdb] crit: can't initialize service
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [status] crit: cpg_initialize failed: 2
Nov 10 14:41:30 pve1-cpu2 pmxcfs[1264]: [status] crit: can't initialize service
Nov 10 14:41:31 pve1-cpu2 systemd[1]: Started The Proxmox VE cluster filesystem.

That was on the 10th. Since then there have been success logs:

Code:
Nov 14 14:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 15:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 16:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 17:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 18:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 19:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 20:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 21:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 22:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 14 23:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 00:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 01:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 02:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 02:36:50 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 02:36:54 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 03:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 03:40:44 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 03:40:49 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 04:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 05:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 05:40:01 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 05:40:06 pve1-cpu2 pmxcfs[1264]: [status] notice: received log
Nov 15 06:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 07:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 08:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 09:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful
Nov 15 10:23:12 pve1-cpu2 pmxcfs[1264]: [dcdb] notice: data verification successful

It's all very repetitive, so this is just a snippet to show what's in there now.

I also followed your directions and created the ceph.client.bootstrap-osd.keyring:

Code:
root@pve1-cpu2:~# cat /etc/pve/priv/ceph.client.bootstrap-osd.keyring
[client.admin]
    key = ######################################
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
root@pve1-cpu2:~#

Here is the new error:

Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.546352 s, 384 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 7a803c94-f34c-4bbb-94ea-cc6d7dbd45cb
 stderr: 2021-11-15T11:24:06.239-0700 7f2b6affd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 84681487-a5e1-431f-8741-95694c39d8ac --data /dev/sdb' failed: exit code 1

2 steps forward 1 step back.
 
Well, after looking through logs and digging through posts I ran into this one:

/usr/bin/ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

So I compared the keys in /etc/pve/priv/ceph.client.bootstrap-osd.keyring and /var/lib/ceph/bootstrap-osd/ceph.keyring. They didn't match.

The command that VictorSTS showed gave me the correct key for /etc/pve/priv/ceph.client.bootstrap-osd.keyring, but it seems the system had the wrong key in /var/lib/ceph/bootstrap-osd/ceph.keyring. Running the above command corrected the problem and now I can add a new OSD.
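
For anyone hitting this later, a quick way to spot the mismatch is to print both keyrings and refresh the stale one (a small sketch based on the commands above):

Code:
# The key the cluster expects for bootstrap-osd
ceph auth get client.bootstrap-osd
# The key ceph-volume actually uses on this node
cat /var/lib/ceph/bootstrap-osd/ceph.keyring
# If they differ, refresh the local copy
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring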

 
Thank you guys @MoreDakka @VictorSTS
Let me summarize the steps for solving it:
First, get the key using ceph auth get client.bootstrap-osd.
Then create the file /etc/pve/priv/ceph.client.bootstrap-osd.keyring (e.g. with vi) with the following format:
Code:
[client.admin]
    key = KEY
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"
The KEY is in the output of ceph auth get client.bootstrap-osd.
The last step is to write the other keyring with ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring.
Then you can create the OSD successfully!
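
Putting the whole fix in one place, the sequence from the shell looks roughly like this (a sketch; I'm assuming pveceph osd create is used for the final step and /dev/sdb is the target disk):

Code:
# 1) Confirm the key the cluster has for bootstrap-osd
ceph auth get client.bootstrap-osd
# 2) Create the keyring Proxmox looks for in pmxcfs (format as shown above)
vi /etc/pve/priv/ceph.client.bootstrap-osd.keyring
# 3) Refresh the local bootstrap keyring so both match
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
# 4) Retry the OSD creation
pveceph osd create /dev/sdb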