Starting an OSD after unplugging and re-plugging its drive without rebooting the server

labynko

May 21, 2021
Hello.
If I accidentally remove a drive from the server and then plug it back in, what is the procedure to get its OSD working correctly again without rebooting the server?
I managed to find a solution:

1. Determine which OSD went down when the disk was removed (here the status column shows that osd.2 on host proxmox2 is down):
ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-7         0.03119      host proxmox1
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-5         0.03119      host proxmox2
 2    ssd  0.01559          osd.2        down   1.00000  1.00000
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
-3         0.03119      host proxmox3
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
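On a cluster with many OSDs, the down ones can be picked out of this tree with a small filter. This is a minimal sketch, assuming the whitespace-separated `ceph osd tree` layout shown above, where the STATUS column directly follows the `osd.N` name; the function name `find_down_osds` is hypothetical. It reads the tree on stdin, so it works on live output or on a saved copy.

```shell
# Hypothetical helper: print the name of every OSD whose STATUS is "down".
# Assumes the plain-text `ceph osd tree` layout: scans each row for a field
# like "osd.2" and checks the field immediately after it.
find_down_osds() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") print $i
    }
  }'
}

# Usage on a Ceph node:
#   ceph osd tree | find_down_osds
```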

2. On the proxmox2 host, list the LVM volumes used by the OSDs and find the entry for osd.2 (note its "block device" path and its "osd fsid"):
ceph-volume lvm list
====== osd.2 =======

  [block]       /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a

      block device              /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a
      block uuid                mwh3Wx-ayW8-iv9u-UGNM-lMZ2-E8Gp-1dgsh6
      cephx lockbox secret
      cluster fsid              fd69522b-8c80-4efe-b900-f2d8c7c00e43
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  c8e1aae8-707b-416f-b392-c6786ddd9a7a
      osd id                    2
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/sdb

====== osd.3 =======

  [block]       /dev/ceph-59a2d958-3256-4d6c-94fe-b7bc71f227fa/osd-block-93ad57d6-e882-4e54-84b1-a16ba7b56629

      block device              /dev/ceph-59a2d958-3256-4d6c-94fe-b7bc71f227fa/osd-block-93ad57d6-e882-4e54-84b1-a16ba7b56629
      block uuid                cItkqn-kLfZ-mLVs-XZQ2-Zd4e-RVsX-NstKsB
      cephx lockbox secret
      cluster fsid              fd69522b-8c80-4efe-b900-f2d8c7c00e43
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  93ad57d6-e882-4e54-84b1-a16ba7b56629
      osd id                    3
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/sdc
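Instead of copying the long LV path and fsid by hand, they can be extracted from this listing. This is a minimal sketch, assuming the plain-text `ceph-volume lvm list` layout shown above (a `====== osd.N =======` header followed by `block device` and `osd fsid` lines); the function name `osd_lv_info` is hypothetical. `ceph-volume lvm list` also accepts `--format json`, which is a sturdier input if `jq` is installed.

```shell
# Hypothetical helper: given an OSD id, print "<block device> <osd fsid>"
# from `ceph-volume lvm list` text on stdin. Exits 1 if the OSD is not found.
osd_lv_info() {
  awk -v want="osd.$1" '
    /^====== osd\./ { cur = $2; next }   # section header, e.g. "====== osd.2 ======="
    cur == want && $1 == "block" && $2 == "device" { dev = $3 }
    cur == want && $1 == "osd" && $2 == "fsid"     { fsid = $3 }
    END { if (dev != "" && fsid != "") print dev, fsid; else exit 1 }
  '
}

# Usage on the OSD host (bash):
#   read -r lv fsid < <(ceph-volume lvm list | osd_lv_info 2)
```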

3. Deactivate the osd.2 logical volume, so that the stale device-mapper mapping left over from the removed disk is dropped:
lvm lvchange -a n /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a

4. Activate the osd.2 logical volume again:
lvm lvchange -a y /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a

5. Activate and start osd.2, passing the "osd id" and "osd fsid" values obtained in step 2:
ceph-volume lvm activate 2 c8e1aae8-707b-416f-b392-c6786ddd9a7a
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a --path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable ceph-volume@lvm-2-c8e1aae8-707b-416f-b392-c6786ddd9a7a
Running command: /usr/bin/systemctl enable --runtime ceph-osd@2
Running command: /usr/bin/systemctl start ceph-osd@2
--> ceph-volume lvm activate successful for osd ID: 2
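Steps 3 to 5 can be wrapped into one function. This is a minimal sketch, assuming you already have the OSD id, LV path and osd fsid from step 2 and are running as root on the OSD host; the names `reactivate_osd`, `run_or_print` and the `DRY_RUN` variable are hypothetical. With `DRY_RUN=1` it only prints the commands, which is a sensible first pass on a production node.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper for steps 3-5.
# Arguments: OSD id, LV path ("block device"), osd fsid, as listed by
# `ceph-volume lvm list`. Stops at the first command that fails.
reactivate_osd() {
  [ $# -eq 3 ] || { echo "usage: reactivate_osd <id> <lv-path> <osd-fsid>" >&2; return 2; }
  local id=$1 lv=$2 fsid=$3
  run_or_print lvm lvchange -a n "$lv" || return 1    # step 3: deactivate the LV
  run_or_print lvm lvchange -a y "$lv" || return 1    # step 4: activate it again
  run_or_print ceph-volume lvm activate "$id" "$fsid" # step 5: prime the OSD dir, start ceph-osd@<id>
}
```

For the OSD in this thread, a dry run would be `DRY_RUN=1 reactivate_osd 2 /dev/ceph-e6316ef4-99bf-40d7-ad9d-673bd450ed97/osd-block-c8e1aae8-707b-416f-b392-c6786ddd9a7a c8e1aae8-707b-416f-b392-c6786ddd9a7a`; drop `DRY_RUN=1` to execute it.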

6. Check that all OSDs are up again:
ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default
-7         0.03119      host proxmox1
 4    ssd  0.01559          osd.4          up   1.00000  1.00000
 5    ssd  0.01559          osd.5          up   1.00000  1.00000
-5         0.03119      host proxmox2
 2    ssd  0.01559          osd.2          up   1.00000  1.00000
 3    ssd  0.01559          osd.3          up   1.00000  1.00000
-3         0.03119      host proxmox3
 0    ssd  0.01559          osd.0          up   1.00000  1.00000
 1    ssd  0.01559          osd.1          up   1.00000  1.00000
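The OSD may take a few seconds to report as up after activation, so a scripted version of this check can poll instead of reading the tree once. This is a minimal sketch, assuming the same `ceph osd tree` text layout as above; the function name `all_osds_up` and the polling interval are hypothetical choices, not Ceph defaults.

```shell
# Hypothetical check: succeed only if at least one OSD row is present and
# every OSD row in `ceph osd tree` text on stdin has STATUS "up".
all_osds_up() {
  awk '
    { for (i = 1; i < NF; i++) if ($i ~ /^osd\.[0-9]+$/) { n++; if ($(i + 1) != "up") bad++ } }
    END { exit (n == 0 || bad > 0) }
  '
}

# Usage on a Ceph node: poll every 5 s for up to about 2 minutes.
#   for t in $(seq 1 24); do
#     ceph osd tree | all_osds_up && { echo "all OSDs up"; break; }
#     sleep 5
#   done
```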
Labynko!

Thank you so much! This helped me restore two OSDs whose drives I pulled by mistake: the wrong serial was printed on the caddy/tray, and I only noticed about 2 minutes later. In my experience, if you run these commands within about 5 minutes, the OSD won't crash; one of mine crashed at around the 5 to 8 minute mark.

Tmanok