Ceph OSD issue

TwiX

Hi,

I'm upgrading a 5-node cluster from PVE 6.2.6 to 6.2.11.

Everything was OK until I rebooted the third node.

Two OSDs (8 and 10) didn't start, with the message:

unable to obtain rotating service keys; retrying

After rebooting again, OSDs 8 and 10 came up fine, but OSDs 9 and 11 failed with the same message.

I didn't find anything useful on Google about these messages, so I decided to remove OSD 9 via the GUI:

put osd 9 in the 'out' state
stop the osd 9 service
remove osd 9

I then recreated the OSD, but it doesn't appear in the OSD tab; it was marked as an obsolete OSD (weird).
So, via the command line:

ceph osd out 9
=> OK
systemctl stop ceph-osd@9
=> OK
ceph osd crush remove osd.9
=> Warning: osd.9 is not present in the crush map
ceph auth del osd.9
=> OK
ceph osd rm 9
=> OK

Then I rebooted the node.
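For reference, since Ceph Luminous the crush remove / auth del / osd rm steps can apparently be condensed into a single purge call; a sketch, adjusting the OSD id as needed:

# mark the OSD out and stop its daemon first
ceph osd out 9
systemctl stop ceph-osd@9
# 'purge' combines 'crush remove', 'auth del' and 'osd rm'
ceph osd purge 9 --yes-i-really-mean-it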

I stopped at that point, because it seems that removing an OSD via the GUI leaves behind some services that still start on boot (services related to osd.9, for example the LVM volumes):
Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
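To check which OSD LVs ceph-volume still detects on the node, something like this should do (a sketch):

ceph-volume lvm list                        # OSDs ceph-volume can see
lvs -o lv_name,vg_name,lv_tags | grep ceph  # raw LVM view of the ceph-tagged LVs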

What is the process to clean up everything and recreate an OSD?

Thanks in advance !

Antoine
 
I also still have a tmpfs mount for the deleted OSD:

root@dc-prox-11:~# df -h | grep tmpfs
tmpfs 13G 11M 13G 1% /run
tmpfs 63G 63M 63G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 63G 0 63G 0% /sys/fs/cgroup
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-8
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-11
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-9
tmpfs 63G 24K 63G 1% /var/lib/ceph/osd/ceph-10
tmpfs 13G 0 13G 0% /run/user/0

Is this normal?
 
Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
A leftover LV, perhaps? ceph-volume runs through all VGs at boot and tries to activate any OSDs it finds.

what is the process to cleanup everything and recreate an osd ?
See the link below; you may want to specify --cleanup on destroy to remove the LV as well:
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_ceph_maintenance
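Roughly along these lines (a sketch; /dev/sdX is a placeholder for the actual disk, and the OSD has to be stopped and out before destroying it):

pveceph osd destroy 9 --cleanup   # remove the OSD and wipe its LV / partition
pveceph osd create /dev/sdX       # recreate an OSD on the freed disk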

I still also have a tmpfs for the deleted osd
That is a leftover from the manual cleanup.
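Once the removed OSD id is really gone, the stale mount can be cleaned up by hand; a sketch (only for an id that no longer exists, never for a running OSD):

umount /var/lib/ceph/osd/ceph-9
rmdir /var/lib/ceph/osd/ceph-9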
 
Thanks!

I recreated osd.9 and it seems to be OK.

I still see these messages in the syslog:

Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
Running command: /usr/sbin/ceph-volume lvm trigger 9-9501cd08-984b-4ee9-aafe-c58ea0402dc4

These volumes don't exist anymore...
How do I get rid of them?

Antoine
 
These volumes don't exist anymore...
Either a reboot is still pending (they were only recently removed), or there is a leftover JSON file in /etc/ceph/ from when the OSDs were converted from ceph-disk to ceph-volume.
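You can check for such leftovers with something like this (a sketch; these scan files usually live under /etc/ceph/osd/ and are named <id>-<fsid>.json, with the fsid matching the one in the trigger messages):

ls -l /etc/ceph/osd/
# remove only a file belonging to an OSD that no longer exists, e.g.:
# rm /etc/ceph/osd/9-7d980d55-34bf-456b-9f68-839585aba395.json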
 
Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
Running command: /usr/sbin/ceph-volume lvm trigger 9-9501cd08-984b-4ee9-aafe-c58ea0402dc4
Then these may be from disks in the system that haven't been zapped yet?
 
Hi,

I always run ceph-volume lvm zap /dev/sdX --destroy before creating a new OSD.
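If a leftover OSD can't easily be mapped back to a device, zap can apparently also be targeted by OSD id in recent ceph-volume versions; a sketch:

ceph-volume lvm zap --destroy --osd-id 9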