Cannot start OSD after jack-out and jack-in of the same HDD

Jayesh

Hi,

pveversion
pve-manager/6.2-12/b287dd27 (running kernel: 5.4.65-1-pve)

We have a 3-node PVE cluster with Ceph.

Recently, on one of the nodes (pxmx3), we did a jack-out and jack-in of the same HDD.

Before the jack-out, the drive letter was /dev/sdg.

After re-inserting the drive, the drive letter became /dev/sdi.

Because of this, osd.16 on host pxmx3 went down (down and out). We tried starting osd.16 from the Proxmox GUI, but no luck.
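For reference, the CLI equivalents we would use on pxmx3 to check and retry are roughly as follows (assuming the standard ceph-osd@<id> systemd unit name):

ceph osd tree | grep osd.16       # cluster view: shows osd.16 down/out
systemctl status ceph-osd@16      # local daemon state on pxmx3
systemctl start ceph-osd@16       # same as the start button in the GUI
journalctl -u ceph-osd@16 -n 50   # recent log lines if the start fails again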

We destroyed osd.16 from the GUI, but we can still see osd.16 when we run the ceph-volume command.

When we checked the task log of the destroy command for osd.16, we found:

destroy OSD osd.16
Remove osd.16 from the CRUSH map
Remove the osd.16 authentication key.
Remove OSD osd.16
--> Zapping: /dev/ceph-78396b57-965e-497b-9c03-49e5a4747435/osd-block-12445079-ad19-4157-b7e5-f3cbb4ca71f9
--> Unmounting /var/lib/ceph/osd/ceph-16
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-16
stderr: umount: /var/lib/ceph/osd/ceph-16 unmounted
Running command: /bin/dd if=/dev/zero of=/dev/ceph-78396b57-965e-497b-9c03-49e5a4747435/osd-block-12445079-ad19-4157-b7e5-f3cbb4ca71f9 bs=1M count=10 conv=fsync
stderr: /bin/dd: fsync failed for '/dev/ceph-78396b57-965e-497b-9c03-49e5a4747435/osd-block-12445079-ad19-4157-b7e5-f3cbb4ca71f9': Input/output error
stderr: 10+0 records in
10+0 records out
stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0150309 s, 698 MB/s
--> RuntimeError: command returned non-zero exit status: 1
command '/usr/sbin/ceph-volume lvm zap --osd-id 16 --destroy' failed: exit code 1
command '/sbin/pvremove /dev/sdi' failed: Insecure dependency in exec while running with -T switch at /usr/share/perl/5.28/IPC/Open3.pm line 178.
TASK OK
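From this log it looks like the zap step failed at the dd/fsync stage, so the LVM metadata for osd.16 was apparently never cleaned up. For reference, the leftover LVM and device-mapper state on pxmx3 can be inspected with standard tools (nothing Proxmox-specific assumed here):

pvs                          # physical volumes LVM still knows about
vgs                          # the ceph-78396b57-... volume group from the log should still be listed
lvs -o +devices              # logical volumes and the device each one maps to
dmsetup ls | grep 78396b57   # any leftover device-mapper mapping for the osd.16 LV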

We tried to create a new OSD from the GUI, but it shows "No Disk Unused".
When we run the pveceph osd create command, it gives an "already in use" error.
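What we have not done yet, and would like confirmation on before touching a production node, is cleaning up the leftover LVM/ceph-volume data so the disk shows up as unused again. Based on the ceph-volume documentation, we assume the cleanup would be roughly the following (the /dev/sdi path is taken from the output below, and the dmsetup step is only needed if a stale mapping is still present):

dmsetup ls | grep 78396b57               # find any leftover mapping for the old osd.16 volume group
dmsetup remove <name-from-above>         # placeholder: use the actual mapping name printed above
ceph-volume lvm zap /dev/sdi --destroy   # wipe the leftover LVM/Ceph metadata on the re-inserted disk
pveceph osd create /dev/sdi              # recreate the OSD once the disk is clean

Is this the recommended way, or is there a safer path via the GUI?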

Current Ceph OSD status: (see attached screenshot)

ceph-volume lvm list:

someuser@pxmx3:~$ sudo ceph-volume lvm list
[sudo] password for someuser:


====== osd.12 ======

[block] /dev/ceph-45ec37f5-20cf-43ec-9972-774efaac5fdd/osd-block-07e247d6-3432-4aef-b882-ceed3422a1dd

block device /dev/ceph-45ec37f5-20cf-43ec-9972-774efaac5fdd/osd-block-07e247d6-3432-4aef-b882-ceed3422a1dd
block uuid t4j0ea-9eXx-npcI-t0y6-JdxJ-Arkq-DzYKLI
cephx lockbox secret
cluster fsid 57ba4b78-40ba-41f6-848d-16b51ab7447f
cluster name ceph
crush device class None
encrypted 0
osd fsid 07e247d6-3432-4aef-b882-ceed3422a1dd
osd id 12
osdspec affinity
type block
vdo 0
devices /dev/sdb

====== osd.13 ======

[block] /dev/ceph-f36b0660-b413-48d9-a286-a871ce28f11c/osd-block-51d9c77f-7d66-497e-b343-7af05de6b9cd

block device /dev/ceph-f36b0660-b413-48d9-a286-a871ce28f11c/osd-block-51d9c77f-7d66-497e-b343-7af05de6b9cd
block uuid P3oC03-WRJT-2EMf-Tue6-xWi2-htR6-ueu2ZI
cephx lockbox secret
cluster fsid 57ba4b78-40ba-41f6-848d-16b51ab7447f
cluster name ceph
crush device class None
encrypted 0
osd fsid 51d9c77f-7d66-497e-b343-7af05de6b9cd
osd id 13
osdspec affinity
type block
vdo 0
devices /dev/sdc

====== osd.14 ======

[block] /dev/ceph-c069b553-6114-40ef-98b8-fcb2fdd7b33b/osd-block-6d2d0a5b-ecdf-4107-afbb-7e8c78a3b4d6

block device /dev/ceph-c069b553-6114-40ef-98b8-fcb2fdd7b33b/osd-block-6d2d0a5b-ecdf-4107-afbb-7e8c78a3b4d6
block uuid lFd782-dRVn-G5hM-V5eJ-DO40-6wiY-3Re2ZO
cephx lockbox secret
cluster fsid 57ba4b78-40ba-41f6-848d-16b51ab7447f
cluster name ceph
crush device class None
encrypted 0
osd fsid 6d2d0a5b-ecdf-4107-afbb-7e8c78a3b4d6
osd id 14
osdspec affinity
type block
vdo 0
devices /dev/sde

====== osd.16 ======

[block] /dev/ceph-78396b57-965e-497b-9c03-49e5a4747435/osd-block-12445079-ad19-4157-b7e5-f3cbb4ca71f9

block device /dev/ceph-78396b57-965e-497b-9c03-49e5a4747435/osd-block-12445079-ad19-4157-b7e5-f3cbb4ca71f9
block uuid WXP0D7-bj7r-MhJY-a7Jh-TQ08-pdEK-PNxpRY
cephx lockbox secret
cluster fsid 57ba4b78-40ba-41f6-848d-16b51ab7447f
cluster name ceph
crush device class None
encrypted 0
osd fsid 12445079-ad19-4157-b7e5-f3cbb4ca71f9
osd id 16
osdspec affinity
type block
vdo 0
devices /dev/sdi

====== osd.17 ======

[block] /dev/ceph-ba5d669b-2ccb-4866-bd9a-a381a59f708e/osd-block-24f8a30c-7b8d-4333-b11f-cef677474ae8

block device /dev/ceph-ba5d669b-2ccb-4866-bd9a-a381a59f708e/osd-block-24f8a30c-7b8d-4333-b11f-cef677474ae8
block uuid HUaw80-YmSs-qRUn-OYYE-5loR-fK4s-qHasb2
cephx lockbox secret
cluster fsid 57ba4b78-40ba-41f6-848d-16b51ab7447f
cluster name ceph
crush device class None
encrypted 0
osd fsid 24f8a30c-7b8d-4333-b11f-cef677474ae8
osd id 17
osdspec affinity
type block
vdo 0
devices /dev/sdh

lsblk:

(see attached screenshot)
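If the screenshot is hard to read, we can also paste the text output, e.g. from:

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT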

Please help us understand and solve this issue.

Thanks,
Jayesh
 
