Cannot create Ceph OSD due to keyring error

sheshman

Member
Jan 16, 2023
51
4
8
Hi,

I'm using PVE 8.0.3, clustered with two identical IBM servers. When I try to create a Ceph OSD, the system returns an error.

pveversion -v output:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

ceph.conf content:
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.1.3/24
         fsid = 61680b34-604b-4cfe-a4a1-19a5e368d575
         mon_allow_pool_delete = true
         mon_host = 192.168.1.3 192.168.1.4
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 192.168.1.3/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring  # tried commenting out this line, but nothing changed

[mon.node01]
         public_addr = 192.168.1.3

[mon.node02]
         public_addr = 192.168.1.4

Error:
Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.275745 s, 761 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 85b65a60-ffb1-4454-b0e0-c5ca0c04561b
Running command: vgcreate --force --yes ceph-8374c6b7-9e60-403b-91f1-ac179befb601 /dev/sdb
 stdout: Physical volume "/dev/sdb" successfully created.
 stdout: Volume group "ceph-8374c6b7-9e60-403b-91f1-ac179befb601" successfully created
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: 2023-10-13T23:41:48.432+0300 7f3b26eff6c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-10-13T23:41:48.432+0300 7f3b26eff6c0 -1 AuthRegistry(0x7f3b20060610) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
 stderr: purged osd.0
-->  RuntimeError: Unable to find any LV for zapping OSD: 0
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 61680b34-604b-4cfe-a4a1-19a5e368d575 --data /dev/sdb' failed: exit code 1

Code:
/etc/ceph/ceph.client.bootstrap-osd.keyring
/etc/ceph/ceph.keyring
/etc/ceph/keyring
/etc/ceph/keyring.bin

None of these files exist at all.
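
For reference, the "no keyring found" messages above are printed by the rollback command (osd purge-new); what ceph-volume itself uses is the bootstrap-osd keyring under /var/lib/ceph. A quick sanity check of that side, just a sketch assuming the default Proxmox/Ceph paths, would be:
Code:
# Verify the bootstrap-osd keyring exists and is non-empty
ls -l /var/lib/ceph/bootstrap-osd/ceph.keyring
cat /var/lib/ceph/bootstrap-osd/ceph.keyring

# If it is missing or stale, re-export the key stored in the cluster
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring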

I found this thread:
https://forum.proxmox.com/threads/pve7-unable-to-create-osd.99490/

and tried the solutions in it, but they didn't solve the issue.

Any ideas?
 
Hi,
Same problem here: PVE 8.0.3, failing on 4 identical IBM servers. Creating an OSD worked for directly attached disks (NVMe) but failed for external disks (an attached disk enclosure).
Did you solve your problem? Thanks.
 
No, I was not able to solve it and am still searching for a fix. It works on directly attached disks but not on the external storage.
 
Hi,
I have installed Ubuntu 22.04 on my server and reinitialized a new Ceph cluster. The problem was still there, but I observed some warnings like:
Bash:
stderr: Cannot update volume group ceph************* with duplicate PV devices.
A Google search turned up a Red Hat portal article about "Duplicate PV Warnings for Multipathed Devices", which got me thinking. A quick check on my Linux system showed device mappings for 16 HDDs, when in fact I only have 8. What can I say, I was so happy to have that many HDDs that I missed the problem. :)
So removing one SAS cable (the enclosure has a redundant controller, so I had two SAS cables connected) SOLVED my problem. This was on my Ubuntu Ceph cluster, but I am pretty sure it is the same problem, since Proxmox also showed me 16 drives, not 8.
I will try again tomorrow on Proxmox.
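
A quick way to confirm that kind of duplication, just a sketch assuming stock util-linux and the optional multipath-tools package, is to compare WWNs/serials across the sdX devices; a LUN visible over more than one path shows up several times with the same identifier:
Bash:
# Each physical LUN should have a unique WWN/serial; duplicates mean
# the same LUN is reachable over more than one path
lsblk -o NAME,SIZE,TYPE,WWN,SERIAL

# With multipath-tools installed, this groups the paths per LUN
multipath -ll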
 
This could be the issue in my case too. My nodes are connected to an IBM 7200 storage system that has 2 controllers for redundancy, and both are connected to Proxmox. So I have sda, sdb, sdc on the first controller and sdd, sde, sdf on the second, but all 6 devices are actually the same drives: Proxmox shows 6 drives because of the dual controllers, while there are really only 3 LUNs. I'll disconnect the second controller and try again to see if that works.
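
Instead of pulling one of the cables, an alternative would be to keep both paths and hand Ceph the multipath device rather than the raw sdX nodes. This is only a sketch, untested on this hardware, and the mpatha name below is a placeholder for whatever multipath -ll reports on your system:
Bash:
# Present each LUN once via device-mapper multipath instead of as sda + sdd etc.
apt install multipath-tools
systemctl enable --now multipathd
multipath -ll

# Create the OSD on the mapper device rather than on one of the duplicate paths
# (if pveceph refuses a mapper device, ceph-volume can be called directly)
pveceph osd create /dev/mapper/mpatha
# or: ceph-volume lvm create --data /dev/mapper/mpatha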
 
