Cannot create Ceph OSD due to keyring error

sheshman

Member
Jan 16, 2023
Hi,

Using PVE 8.0.3, clustered with two identical IBM servers. I'm trying to create a Ceph OSD, but the system returns an error.

pveversion -v output:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

ceph.conf content:
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.1.3/24
         fsid = 61680b34-604b-4cfe-a4a1-19a5e368d575
         mon_allow_pool_delete = true
         mon_host = 192.168.1.3 192.168.1.4
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 192.168.1.3/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring  # tried commenting out this line, but nothing changed

[mon.node01]
         public_addr = 192.168.1.3

[mon.node02]
         public_addr = 192.168.1.4

The error:
Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.275745 s, 761 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 85b65a60-ffb1-4454-b0e0-c5ca0c04561b
Running command: vgcreate --force --yes ceph-8374c6b7-9e60-403b-91f1-ac179befb601 /dev/sdb
 stdout: Physical volume "/dev/sdb" successfully created.
 stdout: Volume group "ceph-8374c6b7-9e60-403b-91f1-ac179befb601" successfully created
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: 2023-10-13T23:41:48.432+0300 7f3b26eff6c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-10-13T23:41:48.432+0300 7f3b26eff6c0 -1 AuthRegistry(0x7f3b20060610) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
 stderr: purged osd.0
-->  RuntimeError: Unable to find any LV for zapping OSD: 0
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 61680b34-604b-4cfe-a4a1-19a5e368d575 --data /dev/sdb' failed: exit code 1

Code:
/etc/ceph/ceph.client.bootstrap-osd.keyring
/etc/ceph/ceph.keyring
/etc/ceph/keyring
/etc/ceph/keyring.bin

None of these files exist at all.
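For reference, a sketch of how the bootstrap-osd keyring can be compared against what the cluster holds (assuming the monitors are reachable and the admin keyring under /etc/pve/priv/ still works; /var/lib/ceph/bootstrap-osd/ceph.keyring is the file ceph-volume passes with --keyring in the log above):
Bash:
# What the cluster has stored for the bootstrap-osd user
ceph auth get client.bootstrap-osd

# What ceph-volume actually reads on this node
cat /var/lib/ceph/bootstrap-osd/ceph.keyring

# If the file is missing or empty, re-export it from the cluster and fix ownership
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
chmod 600 /var/lib/ceph/bootstrap-osd/ceph.keyring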

I found this thread:
https://forum.proxmox.com/threads/pve7-unable-to-create-osd.99490/

and tried the solutions in it, but they didn't solve the issue.

Any ideas?
 
Hi,
Same problem here on PVE 8.0.3, failing on 4 identical IBM servers. Creating an OSD worked for directly attached disks (NVMe) but failed for external disks (an attached disk enclosure).
Did you solve your problem? Thanks.
 
No, I was not able to solve it and I'm still searching for a fix. It works on directly attached disks but not on the external storage.
 
Hi,
I installed Ubuntu 22.04 on my server and set up a new Ceph cluster. The problem was still there, but I noticed warnings like:
Bash:
stderr: Cannot update volume group ceph************* with duplicate PV devices.

A Google search led me to a Red Hat portal article about "Duplicate PV Warnings for Multipathed Devices", which got me thinking. A quick check on my Linux system showed device mappings for 16 HDDs, when in fact I only have 8. What can I say, I was so happy to have that many HDDs that I missed the problem. :)
So, removing one SAS cable (the enclosure has a redundant controller, so I had 2 SAS cables connected) SOLVED my problem. This was on my Ubuntu Ceph cluster, but I'm pretty sure it's the same problem, since Proxmox also showed me 16 drives, not 8.
I will try again tomorrow on Proxmox.
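A quick way to confirm this kind of duplicate-path situation, assuming lsblk and (optionally) multipath-tools are available: the same WWN or serial showing up under two device names means two paths to one physical disk.
Bash:
# Duplicate WWN/serial values = the same physical disk visible through more than one path
lsblk -d -o NAME,SIZE,WWN,SERIAL,TYPE

# With multipath-tools installed, this shows how the paths group together
multipath -ll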
 
This could be the issue in my case too. My nodes are connected to an IBM 7200 storage, which has 2 controllers for redundancy, and both are connected to Proxmox. So I have sda, sdb, sdc on the first controller and sdd, sde, sdf on the second controller, but all 6 drives are actually the same drives: Proxmox shows 6 drives because of the two controllers, while there are really only 3 LUNs. I'll disconnect the second controller and try again to see if that works.
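If it helps, a rough way to confirm that e.g. sda and sdd really are the same LUN before touching the cabling (device names taken from the post above):
Bash:
# Matching WWN/serial values mean sda and sdd are two paths to the same LUN
lsblk -d -o NAME,WWN,SERIAL /dev/sda /dev/sdd

# Alternative to disconnecting a controller: let dm-multipath collapse the paths,
# so each LUN shows up once as a /dev/mapper device
apt install multipath-tools
multipath -ll
Whether ceph-volume will then accept the /dev/mapper device directly may depend on the versions involved, so treat this as a starting point rather than a confirmed fix.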