Ceph OSD issue

TwiX

Hi,

I'm upgrading a 5-node cluster from PVE 6.2.6 to 6.2.11.

Everything was OK until I rebooted the third node.

Two OSDs (osd.8 and osd.10) didn't start, with this message:

unable to obtain rotating service keys; retrying

After another reboot, osd.8 and osd.10 came up, but osd.11 and osd.9 failed with the same message.

I didn't find anything useful on Google about this message, so I decided to remove OSD 9.
Via the GUI:
put osd.9 in the out state
stop the osd.9 service
remove osd.9

Then I recreated the OSD, but it didn't appear in the OSD tab; it was marked as an obsolete OSD (weird).
So, via the command line:

ceph osd out 9
=> OK
systemctl stop ceph-osd@9
=> OK
ceph osd crush remove osd.9
=> Warning: osd.9 is not present in the crush map
ceph auth del osd.9
=> OK
ceph osd rm 9
=> OK

Then I rebooted the node.

I stopped there, because it seems that removing an OSD via the GUI leaves some services that still run at boot (services related to osd.9, for example its LVM volumes):
Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395

What is the process to clean up everything and recreate an OSD?

Thanks in advance !

Antoine
 
I also still have a tmpfs mount for the deleted OSD:

root@dc-prox-11:~# df -h | grep tmpfs
tmpfs      13G   11M   13G   1% /run
tmpfs      63G   63M   63G   1% /dev/shm
tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs      63G     0   63G   0% /sys/fs/cgroup
tmpfs      63G   24K   63G   1% /var/lib/ceph/osd/ceph-8
tmpfs      63G   24K   63G   1% /var/lib/ceph/osd/ceph-11
tmpfs      63G   24K   63G   1% /var/lib/ceph/osd/ceph-9
tmpfs      63G   24K   63G   1% /var/lib/ceph/osd/ceph-10
tmpfs      13G     0   13G   0% /run/user/0

Is that normal?
 
Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
Leftover LV? ceph-volume runs through all VGs and tries to activate the OSDs.
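
If you want to check for leftovers, something like this should show any LVs that still carry Ceph metadata (a rough sketch; the exact tag names can vary between Ceph releases):
# list LVs together with their Ceph-related LV tags
lvs -o lv_name,vg_name,lv_tags | grep ceph.osd_id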

What is the process to clean up everything and recreate an OSD?
See the link below; you may want to specify --cleanup on destroy to remove the LV as well.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_ceph_maintenance
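
For reference, the full cycle with the Proxmox tooling would look roughly like this (osd.9 and /dev/sdX are placeholders, adapt them to your cluster):
# take the OSD out and stop its service
ceph osd out 9
systemctl stop ceph-osd@9
# destroy it; --cleanup also removes the backing LVs/partitions
pveceph osd destroy 9 --cleanup
# wipe the disk and create a fresh OSD on it
ceph-volume lvm zap /dev/sdX --destroy
pveceph osd create /dev/sdX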

I also still have a tmpfs mount for the deleted OSD
Leftover from the manual cleanup.
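
If the OSD stays removed, the stale mount can simply be dropped by hand, for example:
# unmount the leftover tmpfs of the removed OSD and remove the empty directory
umount /var/lib/ceph/osd/ceph-9
rmdir /var/lib/ceph/osd/ceph-9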
 
Thanks!

I recreated osd.9 and it seems to be OK.

I still see these messages in the syslog:

Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
Running command: /usr/sbin/ceph-volume lvm trigger 9-9501cd08-984b-4ee9-aafe-c58ea0402dc4

These volumes don't exist anymore...
How do I get rid of them?

Antoine
 
These volumes don't exist anymore...
Either a reboot is still missing (kind of 'recently removed'), or there is a leftover JSON file in /etc/ceph/ from when the OSDs were converted from ceph-disk to ceph-volume.
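
Those conversion files live under /etc/ceph/osd/ (written by ceph-volume simple scan), so you could check for stale ones like this:
# list any leftover ceph-volume 'simple' activation files
ls -l /etc/ceph/osd/
# a stale entry for the old osd.9 could then be removed, e.g.
# rm /etc/ceph/osd/9-<old-fsid>.json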
 
Running command: /usr/sbin/ceph-volume lvm trigger 9-7d980d55-34bf-456b-9f68-839585aba395
Running command: /usr/sbin/ceph-volume lvm trigger 9-9501cd08-984b-4ee9-aafe-c58ea0402dc4
Then these may be from disks in the system that haven't been zapped yet?
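
To see where those IDs still come from, something along these lines might help:
# show every OSD that ceph-volume can still discover from LVs on this node
ceph-volume lvm list
# report all disks and whether ceph-volume considers them available or in use
ceph-volume inventory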
 
Hi,

I always run ceph-volume lvm zap /dev/sdX --destroy before creating a new OSD.
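
In that case it might still be worth double-checking that the zap really removed every VG/LV from the disk, for example (replace /dev/sdX with the real device):
# should show no remaining partitions or LVs on the zapped disk
lsblk /dev/sdX
# should no longer list a physical volume on that disk
pvs | grep sdX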
 
