ceph-osd-prestart.sh wants /var/lib/ceph/osd/ceph-$id/keyring, but it's missing

Romkus

Hello!
I'm using Proxmox with the no-subscription repository
pve-manager/4.4-12/e71b7a74 (running kernel: 4.4.40-1-pve)
and installed Ceph via "pveceph install -version jewel".
I'm having trouble with Ceph when creating OSDs. I'm using the pveceph utility, as described in https://pve.proxmox.com/wiki/Ceph_Server
When I create an OSD with the command
Code:
pveceph createosd <mydisk>
or via the GUI, the OSD appears in the GUI's Ceph\OSD list and even shows green. But after some time it goes down (the service "ceph-osd@<number>.service" stops). I haven't yet caught the moment it fails, but the last time it "worked" until the node was rebooted.
It fails because the ceph-osd-prestart.sh script can't find the keyring file for the OSD. It expects the keyring file in the "/var/lib/ceph/osd/ceph-$id/" directory, but I can't see any files there, and I don't know where this keyring file could be.
Here are the error messages in syslog:
Code:
Mar 15 12:30:36 TestFujitsuProx pvestatd[1253]: starting server
Mar 15 12:30:36 TestFujitsuProx systemd[1]: Started PVE Status Daemon.
Mar 15 12:30:36 TestFujitsuProx kernel: ip_set: protocol 6
Mar 15 12:30:36 TestFujitsuProx systemd[1]: Started PVE activate Ceph OSD disks.
Mar 15 12:30:36 TestFujitsuProx systemd[1]: Starting ceph target allowing to start/stop all ceph*@.service instances at once.
Mar 15 12:30:36 TestFujitsuProx systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
Mar 15 12:30:36 TestFujitsuProx systemd[1]: Starting ceph target allowing to start/stop all ceph-mon@.service instances at once.
Mar 15 12:30:36 TestFujitsuProx systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
Mar 15 12:30:37 TestFujitsuProx ceph-mon[1188]: starting mon.1 rank 1 at 192.168.100.212:6789/0 mon_data /var/lib/ceph/mon/ceph-1 fsid 5ca37427-2f48-4981-8bc4-190b7e851b99
Mar 15 12:30:37 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 25 26 27 28 2b 2c 2d 2f 30 31 32 33 34 3b 3c
Mar 15 12:30:37 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 27 28 2b 2c 2d 3b 3c 56 57 59 5a 5c 5d 5e
Mar 15 12:30:37 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 3c 56 57 59 5a 5c 5d 5e 7a 7b 7c 7d 7e
Mar 15 12:30:37 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 5d 5e 7a 7b 7c 7d 99 9c 9e 9f a0 a1 a3 a5 a6 a7 aa ab af b0
Mar 15 12:30:37 TestFujitsuProx pmxcfs[1119]: [dcdb] notice: update complete - trying to commit (got 3 inode updates)
Mar 15 12:30:37 TestFujitsuProx ceph-osd-prestart.sh[1189]: 2017-03-15 12:30:37.396546 7fb38596f700 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or direct
Mar 15 12:30:37 TestFujitsuProx ceph-osd-prestart.sh[1189]: 2017-03-15 12:30:37.396567 7fb38596f700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
Mar 15 12:30:37 TestFujitsuProx ceph-osd-prestart.sh[1189]: 2017-03-15 12:30:37.396570 7fb38596f700  0 librados: osd.1 initialization error (2) No such file or directory
Mar 15 12:30:37 TestFujitsuProx pmxcfs[1119]: [dcdb] notice: all data is up to date
Mar 15 12:30:37 TestFujitsuProx ceph-osd-prestart.sh[1189]: Error connecting to cluster: ObjectNotFound
Mar 15 12:30:37 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 9e a0 a1 a5 a6 a7 aa ba be bf c0 c1 c2 c3 c4 c5 c7 c8 c9 ca cb cc cd ce cf d0 d1 d2 d3 d4
Mar 15 12:30:37 TestFujitsuProx systemd[1]: Started Ceph object storage daemon.
Mar 15 12:30:37 TestFujitsuProx systemd[1]: Starting ceph target allowing to start/stop all ceph-osd@.service instances at once.
Mar 15 12:30:37 TestFujitsuProx systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.
Mar 15 12:30:37 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: be bf c0 c1 c3 c4 c8 c9 ca cb cc cd ce d4
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: cb cc cd ce e4 e5 e6 e8 e9 ea eb
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: e9 ea eb fd fe
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 110 112 113 115 118 119 11a 11b 11c 11d 11e 11f 120 121
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 11d 11f 120 121 126 127 128 12a 12c 12d
Mar 15 12:30:38 TestFujitsuProx pmxcfs[1119]: [status] notice: received all states
Mar 15 12:30:38 TestFujitsuProx pmxcfs[1119]: [status] notice: all data is up to date
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 12d 131 132 133
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 135 138 139
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 140 142 143 144 146
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 149 14a
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 14e 152
Mar 15 12:30:38 TestFujitsuProx corosync[1206]: [TOTEM ] Retransmit List: 159
Mar 15 12:30:39 TestFujitsuProx ceph-osd[1272]: 2017-03-15 12:30:39.350501 7fdcbcc4b800 -1  ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
Mar 15 12:30:39 TestFujitsuProx systemd[1]: ceph-osd@1.service: main process exited, code=exited, status=1/FAILURE
Mar 15 12:30:39 TestFujitsuProx systemd[1]: Unit ceph-osd@1.service entered failed state.
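Most of the lines above are corosync retransmit noise. As a rough sketch (the helper name is made up, and the log path in the comment is an assumption), the OSD-related lines can be pulled out like this:

```shell
# Hypothetical helper: keep only the lines logged by ceph-osd
# or by ceph-osd-prestart.sh. Reads the given files, or stdin if none.
filter_osd_log() {
    grep -E ' ceph-osd(-prestart\.sh)?\[[0-9]+\]: ' "$@"
}

# Example call (the path is an assumption, adjust for your system):
# filter_osd_log /var/log/syslog
```

Applied to the log above, this should leave only the keyring errors from ceph-osd-prestart.sh and the superblock error from ceph-osd.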
I have searched the forums and found only similar situations, but no clue for mine. There really is no keyring file in that folder, and I wonder whether it was generated at all.
Code:
ceph auth list
shows keys for some OSDs, but I can't find the files where they are stored.
My file "/etc/pve/ceph.conf" contains the path to that non-existent keyring:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.100.0/24
filestore xattr use omap = true
fsid = 5ca37427-2f48-4981-8bc4-190b7e851b99
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 192.168.100.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.2]
host = FujitsuNode5ProxTest
mon addr = 192.168.100.211:6789

[mon.0]
host = LocalNode3
mon addr = 192.168.100.216:6789

[mon.1]
host = TestFujitsuProx
mon addr = 192.168.100.212:6789

If somebody knows where the pveceph utility is supposed to generate the keyrings for OSDs, please tell me.
 
Hi,

Is the OSD mounted, and are the permissions on the OSD set correctly?
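A minimal sketch of how this could be checked from the shell. It assumes the OSD data partition is expected to be mounted at /var/lib/ceph/osd/ceph-$id and to hold the keyring, which would explain an empty directory when nothing is mounted. The function name and its optional second argument are invented for illustration.

```shell
# check_osd_dir DIR [MOUNTS_FILE]   (hypothetical helper)
# Prints one line per finding; returns 0 only if DIR is a mount point
# and contains a keyring file. MOUNTS_FILE defaults to /proc/mounts.
check_osd_dir() {
    dir=$1
    mounts=${2:-/proc/mounts}
    rc=0
    # The second field of each /proc/mounts line is the mount point.
    if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "mounted: $dir"
    else
        echo "NOT mounted: $dir"
        rc=1
    fi
    if [ -f "$dir/keyring" ]; then
        echo "keyring present: $dir/keyring"
    else
        echo "keyring MISSING: $dir/keyring"
        rc=1
    fi
    # Show owner and mode so the permissions can be checked by eye.
    if [ -d "$dir" ]; then
        ls -ld "$dir"
    fi
    return $rc
}

# Example call for the OSD from the log above:
# check_osd_dir /var/lib/ceph/osd/ceph-1
```

If the directory is reported as not mounted, the missing keyring is only a symptom, and the next thing to look at is whether the data partition's device node exists at all.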
 
No, it's not! I found that the partitions on my virtual loop device do not appear in the "/dev/" folder after pveceph creates them... Only the loop device itself appears... I think I have to find out how to get those partitions recognized at boot. Thank you!
 
The solution was simple... I just had to add the "--partscan" option to the losetup command that attaches the virtual disk at boot. It is run as a service defined in the "/etc/systemd/system" folder:
Code:
[Unit]
Description=Setup loop device /dev/loop0 after filesystems are mounted
Requires=mnt-virtdrive.mount
After=mnt-virtdrive.mount

[Service]
ExecStart=/sbin/losetup --partscan "/dev/loop0" "/mnt/virtdrive/virtdisk1"

[Install]
WantedBy=mnt-virtdrive.mount
Here "mnt-virtdrive.mount" is the name of the preceding unit, which mounts my real hard drive partitions via the "fstab" file.
It works as expected so far...
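To confirm after a reboot that the partition scan actually happened, something like the following sketch could be used. The function name is invented; it only checks whether partition nodes such as /dev/loop0p1 exist next to the loop device.

```shell
# list_loop_partitions LOOPDEV   (hypothetical helper)
# Prints the partition nodes (e.g. /dev/loop0p1) that exist for LOOPDEV;
# returns 1 if there are none.
list_loop_partitions() {
    found=1
    for part in "$1"p[0-9]*; do
        # When the glob matches nothing it stays literal, so test existence.
        [ -e "$part" ] || continue
        echo "$part"
        found=0
    done
    return $found
}

# Example call for the device used in the unit above:
# list_loop_partitions /dev/loop0 || echo "no partitions visible on /dev/loop0"
```

Running "lsblk /dev/loop0" should show the same partitions. For a loop device that is already attached without "--partscan", "partprobe /dev/loop0" is one way to ask the kernel to re-read the partition table without detaching it, although I have not tried that on this setup.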
 
