Ceph: hot refitting a disk


Apr 29, 2020
Hi all,

We needed to replace a drive caddy (long story) for a running drive on a Proxmox cluster running Ceph (15.2.17). The drives themselves are hot-swappable. First I stopped the OSD, pulled out the drive, changed the caddy, and refitted the (same) drive. The drive quickly showed up in Proxmox (although under a different /dev/sdx device), and it displayed the correct OSD number as well. When I tried to start the OSD, the command failed, and the logs showed this:

2023-09-20T14:17:18.284+0200 7fe2d0232d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2023-09-20T14:17:18.284+0200 7fe2d0232d80 0 ceph version 15.2.17 (542df8d06ef24dbddcf4994db16bcc4c89c9ec2d) octopus (stable), process ceph-osd, pid 2152557
2023-09-20T14:17:18.284+0200 7fe2d0232d80 0 pidfile_write: ignore empty --pid-file
2023-09-20T14:17:18.284+0200 7fe2d0232d80 -1 bluestore(/var/lib/ceph/osd/ceph-13/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-13/block: (5) Input/output error
2023-09-20T14:17:18.284+0200 7fe2d0232d80 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-13: (2) No such file or directory

I believe a reboot would have helped, but since it is a production system, I could not do so. I just destroyed and recreated the OSD.

My question: is there a way to fix this problem, without a reboot of the whole server?
I guess you did not remove the dangling logical volume that was still present for the drive after you pulled it?
Did you run an lvscan after inserting the drive again?

/var/lib/ceph/osd/ceph-13/block may have pointed to the wrong block device.
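
A quick way to test that suspicion is to look at what the block symlink resolves to. The following is only a minimal sketch: the function name check_osd_block is made up for illustration, and the default path is the one from the log above. It can catch a dangling or unreadable link, but it says nothing about why the link is stale.

```shell
#!/bin/sh
# Hypothetical helper: report where an OSD's "block" symlink points and
# whether the target can actually be read.
# $1: OSD data directory (defaults to the path seen in the log above)
check_osd_block() {
    osd_dir="${1:-/var/lib/ceph/osd/ceph-13}"
    link="$osd_dir/block"

    # The OSD directory should contain a symlink named "block".
    if [ ! -L "$link" ]; then
        echo "no-symlink"
        return 1
    fi
    target=$(readlink "$link")

    # -e follows the symlink, so it fails when the target no longer exists.
    if [ ! -e "$link" ]; then
        echo "dangling -> $target"
        return 2
    fi

    # The target should be a block device (normally an LVM logical volume).
    if [ ! -b "$link" ]; then
        echo "not-a-block-device -> $target"
        return 3
    fi

    # Try to read the first 4 KiB; a stale mapping would typically fail here
    # with an I/O error, like the _read_bdev_label failure in the log.
    if ! dd if="$link" of=/dev/null bs=4096 count=1 2>/dev/null; then
        echo "unreadable -> $target"
        return 4
    fi

    echo "ok -> $target"
}
```

Running check_osd_block with no argument (as root) checks ceph-13. An "unreadable" result would be consistent with the logical volume still being mapped to the old, now vanished /dev/sdx device.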

No, I did not, but what would the full command be? Won't it block on the currently linked block device?
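
For reference, one possible sequence is sketched below. It is untested on this cluster and assumes the OSD was created with ceph-volume on LVM, which is the usual case on Proxmox. The script only prints the commands so they can be reviewed first. The function name print_refit_plan and the STALE_DM_NAME placeholder are made up for illustration; the real device-mapper name has to be taken from the dmsetup ls output.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) a possible recovery sequence
# for an OSD whose disk was pulled and re-seated.
# $1: OSD id (13 in this thread)
# $2: device-mapper name of the stale logical volume, as listed by
#     `dmsetup ls`; STALE_DM_NAME is only a placeholder
print_refit_plan() {
    osd_id="$1"
    dm_name="${2:-STALE_DM_NAME}"
    # Steps: stop the OSD, list and drop the stale mapping, rescan LVM,
    # reactivate the volume groups, then let ceph-volume reactivate OSDs.
    cat <<EOF
systemctl stop ceph-osd@${osd_id}
dmsetup ls
dmsetup remove ${dm_name}
pvscan --cache
vgchange -ay
lvscan
ceph-volume lvm activate --all
EOF
}

print_refit_plan 13
```

As for blocking: dmsetup remove refuses to remove a mapping that is still open, so the OSD has to be stopped first. With the OSD stopped nothing should be holding the old mapping, but that is an assumption to verify on the host.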


