Cannot create Ceph OSD due to keyring error

sheshman

Member
Jan 16, 2023
Hi,

Using PVE 8.0.3, clustered with two identical IBM servers. I'm trying to create a Ceph OSD, but the system returns an error.

pveversion -v output:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

ceph.conf content:
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.1.3/24
         fsid = 61680b34-604b-4cfe-a4a1-19a5e368d575
         mon_allow_pool_delete = true
         mon_host = 192.168.1.3 192.168.1.4
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 192.168.1.3/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring  # tried commenting out this line, but nothing changed

[mon.node01]
         public_addr = 192.168.1.3

[mon.node02]
         public_addr = 192.168.1.4

The error:
Code:
create OSD on /dev/sdb (bluestore)
wiping block device /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.275745 s, 761 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 85b65a60-ffb1-4454-b0e0-c5ca0c04561b
Running command: vgcreate --force --yes ceph-8374c6b7-9e60-403b-91f1-ac179befb601 /dev/sdb
 stdout: Physical volume "/dev/sdb" successfully created.
 stdout: Volume group "ceph-8374c6b7-9e60-403b-91f1-ac179befb601" successfully created
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 --yes-i-really-mean-it
 stderr: 2023-10-13T23:41:48.432+0300 7f3b26eff6c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-10-13T23:41:48.432+0300 7f3b26eff6c0 -1 AuthRegistry(0x7f3b20060610) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
 stderr: purged osd.0
-->  RuntimeError: Unable to find any LV for zapping OSD: 0
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 61680b34-604b-4cfe-a4a1-19a5e368d575 --data /dev/sdb' failed: exit code 1

Code:
/etc/ceph/ceph.client.bootstrap-osd.keyring
/etc/ceph/ceph.keyring
/etc/ceph/keyring
/etc/ceph/keyring.bin

None of these files exist at all.
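For reference, a sketch of how the bootstrap-osd keyring can be compared against what the cluster holds (assuming the monitors are reachable and the admin keyring under /etc/pve/priv/ still works; /var/lib/ceph/bootstrap-osd/ceph.keyring is the file ceph-volume passes with --keyring in the log above):
Bash:
# What the cluster has stored for the bootstrap-osd user
ceph auth get client.bootstrap-osd

# What ceph-volume actually reads on this node
cat /var/lib/ceph/bootstrap-osd/ceph.keyring

# If the file is missing or empty, re-export it from the cluster and fix ownership
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
chmod 600 /var/lib/ceph/bootstrap-osd/ceph.keyring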

I found this thread:
https://forum.proxmox.com/threads/pve7-unable-to-create-osd.99490/

and tried the solutions in it, but they didn't solve the issue.

Any ideas?
 
Hi,
Same problem here on PVE 8.0.3, failing on 4 identical IBM servers. Creating an OSD worked for directly attached disks (NVMe) but failed for external disks (an attached disk enclosure).
Did you solve your problem? Thanks.
 
No, I was not able to solve it and I'm still searching for a fix. It works on directly attached disks but not on the external storage.
 
Hi,
I installed Ubuntu 22.04 on my server and set up a new Ceph cluster. The problem was still there, but I noticed warnings like:
Bash:
stderr: Cannot update volume group ceph************* with duplicate PV devices.

A Google search led me to a Red Hat portal article about "Duplicate PV Warnings for Multipathed Devices", which got me thinking. A quick check on my Linux system showed device mappings for 16 HDDs, when in fact I only have 8. What can I say, I was so happy to have that many HDDs that I missed the problem. :)
So, removing one SAS cable (the enclosure has a redundant controller, so I had 2 SAS cables connected) SOLVED my problem. This was on my Ubuntu Ceph cluster, but I'm pretty sure it's the same problem, since Proxmox also showed me 16 drives, not 8.
I will try again tomorrow on Proxmox.
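A quick way to confirm this kind of duplicate-path situation, assuming lsblk and (optionally) multipath-tools are available: the same WWN or serial showing up under two device names means two paths to one physical disk.
Bash:
# Duplicate WWN/serial values = the same physical disk visible through more than one path
lsblk -d -o NAME,SIZE,WWN,SERIAL,TYPE

# With multipath-tools installed, this shows how the paths group together
multipath -ll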
 
This could be the issue in my case too. My nodes are connected to an IBM 7200 storage, which has 2 controllers for redundancy, and both are connected to Proxmox. So I have sda, sdb, sdc on the first controller and sdd, sde, sdf on the second controller, but all 6 drives are actually the same drives: Proxmox shows 6 drives because of the two controllers, while there are really only 3 LUNs. I'll disconnect the second controller and try again to see if that works.
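If it helps, a rough way to confirm that e.g. sda and sdd really are the same LUN before touching the cabling (device names taken from the post above):
Bash:
# Matching WWN/serial values mean sda and sdd are two paths to the same LUN
lsblk -d -o NAME,WWN,SERIAL /dev/sda /dev/sdd

# Alternative to disconnecting a controller: let dm-multipath collapse the paths,
# so each LUN shows up once as a /dev/mapper device
apt install multipath-tools
multipath -ll
Whether ceph-volume will then accept the /dev/mapper device directly may depend on the versions involved, so treat this as a starting point rather than a confirmed fix.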