OSD move issue

In PVE 5 with Luminous and earlier, the OSDs were partitions and not on LVM. That made the move easier, but could cause issues with udev, since the OSD relied on udev to find the proper partitions.
 
Hello everybody.
I wanted to share instructions for moving an OSD to another node.
These instructions are based on the recommendations in this thread and were tested by me first in a test environment, then in a production environment.
Ceph version 14.2.9. Proxmox v.6.2-6 is deployed on all nodes, with Ceph Nautilus on top, so some actions were performed in the Proxmox web interface. If you have a plain OS with only the Ceph packages on the nodes, please look up the equivalent CLI commands.

1-a. in the web interface under Node->Disks: look up which /dev/sdX belongs to which OSD.Y
1-b. in the web interface under Node->Disks->LVM: look up which VG name (ceph-xxxxxxxxx...) belongs to which /dev/sdX
1-c. copy the contents of /var/lib/ceph/osd/ceph-Y; the content of the fsid file will be needed later

1 (alternative, CLI). ceph-volume lvm list shows all of the needed information

2-a. set flag noout
2-b. set OSD.Y out
2-c. stop OSD.Y service

3-a. in CLI: lvdisplay, in the output, look for <LV Path> for the desired VG name (ceph-xxxxxxxxx...)
3-b. deactivate LVM in CLI: lvchange -an <LV Path>
3-c. export the VG in CLI: vgexport <VG name>

4-a. remove disk from server
4-b. insert the disk into the other server

5-a. run pvscan to see if the disk is seen by LVM
5-b. import the VG in CLI: vgimport <VG name>
5-c. activate LVM in CLI: lvchange -ay <LV Path>
5-d. then activate the single osd in CLI: ceph-volume lvm activate <ID> <osd fsid>
<ID> is the OSD number only; <osd fsid> is taken from the fsid file, see 1-c.

6-a. set OSD.Y in
6-b. remove flag noout
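
For anyone working from the CLI only, the steps above might be sketched as the dry-run helper below. It only prints the commands and runs nothing, so the list can be reviewed first. The function name osd_move_plan and its arguments are my own (hypothetical), and I am assuming the OSD runs as the systemd unit ceph-osd@<ID>. The ceph osd commands are the CLI counterparts of the web interface actions in steps 2 and 6.

```shell
# Dry-run sketch: prints the commands for moving one OSD and runs nothing.
# osd_move_plan is a hypothetical helper name, not part of Ceph or Proxmox.
# Assumption: the OSD runs as the systemd unit ceph-osd@<ID>.
osd_move_plan() {
    if [ "$#" -ne 4 ]; then
        echo "usage: osd_move_plan <osd-id> <osd-fsid> <vg-name> <lv-path>" >&2
        return 1
    fi
    id=$1; fsid=$2; vg=$3; lv=$4

    printf '%s\n' \
        "# old node: steps 2 and 3" \
        "ceph osd set noout" \
        "ceph osd out $id" \
        "systemctl stop ceph-osd@$id" \
        "lvchange -an $lv" \
        "vgexport $vg" \
        "# step 4: move the disk, then on the new node: steps 5 and 6" \
        "pvscan" \
        "vgimport $vg" \
        "lvchange -ay $lv" \
        "ceph-volume lvm activate $id $fsid" \
        "ceph osd in $id" \
        "ceph osd unset noout"
}
```

Call it with the values collected in step 1, for example osd_move_plan Y <osd fsid> <VG name> <LV Path>, and run the printed lines by hand on the matching node.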
 
OK, I followed the steps and was not able to move the drive.

The first error came after this:
Code:
# vgimport ceph-da1b7ac2-64fc-47e0-8c21-3ba9507da14c
  Volume group "ceph-da1b7ac2-64fc-47e0-8c21-3ba9507da14c" successfully imported

Then the error appeared here:
Code:
# ceph-volume lvm activate --all
--> OSD ID 3 FSID fdcc37da-c93e-4161-a4c3-45e82f695292 process is active. Skipping activation
--> Activating OSD ID 16 FSID 10cb7d13-893d-45d4-a711-0bd0a76194e6
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-16
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-16
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-da1b7ac2-64fc-47e0-8c21-3ba9507da14c/osd-block-10cb7d13-893d-45d4-a711-0bd0a76194e6 --path /var/lib/ceph/osd/ceph-16 --no-mon-config
stderr: 2019-08-30 14:22:14.934 7f446e26e140 -1 bluestore(/dev/ceph-da1b7ac2-64fc-47e0-8c21-3ba9507da14c/osd-block-10cb7d13-893d-45d4-a711-0bd0a76194e6) _read_bdev_label failed to open /dev/ceph-da1b7ac2-64fc-47e0-8c21-3ba9507da14c/osd-block-10cb7d13-893d-45d4-a711-0bd0a76194e6: (2) No such file or directory
failed to read label for /dev/ceph-da1b7ac2-64fc-47e0-8c21-3ba9507da14c/osd-block-10cb7d13-893d-45d4-a711-0bd0a76194e6: (2) No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1

Under pve > ceph > osd, the OSD still shows on the original node as down and out.

I tried this process yesterday in my homelab and got the same error: "failed to read label for /dev/ceph-.../osd-block-...".

I did a little poking around and discovered that it was a permissions issue with device mapper for LVM.

Code:
root@epyc3251:~# ls -al /dev/ceph-37ac4e46-a222-48bb-88e8-755b2e61f0ca/osd-block-e723bb40-3a13-4139-8906-e89a7ec90dcd
lrwxrwxrwx 1 root root 7 Dec 13 14:19 /dev/ceph-37ac4e46-a222-48bb-88e8-755b2e61f0ca/osd-block-e723bb40-3a13-4139-8906-e89a7ec90dcd -> ../dm-1

root@epyc3251:~# ls -al /dev/dm-1
brw-rw---- 1 root disk 253, 1 Dec 14 16:43 /dev/dm-1

root@epyc3251:~# chown ceph:ceph /dev/dm-1

Making "ceph" the owner of the dm device that the LV was symlinked to fixed my problems and the "ceph-volume lvm activate --all" worked as advertised.
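
The same fix can be done without reading the dm number off the ls output by hand. This is only a sketch: lv_real_device is a hypothetical helper name, and it relies on readlink -f from GNU coreutils. As far as I know, ownership set by hand on a /dev node is not kept across a reboot, so treat it as a workaround rather than a permanent setting.

```shell
# lv_real_device (hypothetical helper): print the device node that an
# LV symlink such as /dev/<VG name>/osd-block-<osd fsid> resolves to.
lv_real_device() {
    readlink -f "$1"
}

# Usage as root on the new node, with the LV path taken from lvdisplay:
#   dev=$(lv_real_device "<LV Path>")
#   ls -l "$dev"              # check the current owner first
#   chown ceph:ceph "$dev"    # same change as the manual chown above
```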

-------------------------

Kernel Version: Linux 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100)
PVE Manager Version: pve-manager/6.3-3/eee5f901
Ceph Version: 14.2.15
 
I forgot to add the export/import steps:
  1. set OSD out
  2. stop OSD service
  3. deactivate LVM (if OSD made with ceph-volume) / unmount OSD partition
  4. export the VG (vgexport <VG-ID>)
  5. remove disk from server
  6. insert disk into other server
  7. run pvscan to see if the disk is seen by LVM
  8. import the VG (vgimport <VG-ID>)
  9. then activate the single OSD: ceph-volume lvm activate <ID> <osd fsid>
  10. last but not least, ceph osd in <ID>
Could you please tell me how to determine the VG-ID and the osd fsid?
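
On that question: in the activation log earlier in this thread, the LV path has the form /dev/<VG name>/osd-block-<osd fsid>, so both values can be read from the LV Path that lvdisplay reports (ceph-volume lvm list on the old node shows them as well). A small sketch that splits such a path, assuming that naming holds for your OSDs. The helper name osd_ids_from_lv_path is hypothetical.

```shell
# osd_ids_from_lv_path (hypothetical helper): split an LV path of the form
# /dev/<VG name>/osd-block-<osd fsid> into the VG name and the OSD fsid.
# Assumption: the LV was created by ceph-volume with this naming scheme.
osd_ids_from_lv_path() {
    path=$1
    lv=${path##*/}      # last path component: osd-block-<osd fsid>
    dir=${path%/*}      # /dev/<VG name>
    vg=${dir##*/}       # <VG name>
    case $lv in
        osd-block-*) ;;
        *) echo "unexpected LV name: $lv" >&2; return 1 ;;
    esac
    echo "vg=$vg"
    echo "fsid=${lv#osd-block-}"
}
```

The fsid can be cross-checked against the fsid file in /var/lib/ceph/osd/ceph-<ID> on the old node while the OSD is still mounted there.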
 
I got the solution:

1. Stop / Out the OSD you want to move
2. Physically move OSD to new node
3. In Proxmox on new node > Open a shell and type:

Bash:
pvscan
ceph-volume lvm activate --all

4. Have some patience, it can take at least 30 seconds. After some time, check Ceph > OSD (reload) and you should see the OSDs start to appear on the new node!
 
I ran into the same issue as @RobFantini. It looks like there's a missing step in:
I forgot to add the export/import steps:
  1. set OSD out
  2. stop OSD service
  3. deactivate LVM (if OSD made with ceph-volume) / unmount OSD partition
  4. export the VG (vgexport <VG-ID>)
  5. remove disk from server
  6. insert disk into other server
  7. run pvscan to see if the disk is seen by LVM
  8. import the VG (vgimport <VG-ID>)
The missing step between these is:
Code:
vgchange -a y
Without that, the volume groups don't seem to be available for ceph-volume to activate.
  9. then activate the single osd ceph-volume lvm activate <ID> <osd fsid>
  10. last but not least, ceph osd in <ID>
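
A plain vgchange -a y acts on every volume group on the node. If you would rather touch only the Ceph ones, something like this sketch could be used. It assumes the OSD volume groups are named ceph-..., as in the outputs earlier in this thread, and ceph_vg_activate_cmds is a hypothetical helper name. It only prints the commands.

```shell
# ceph_vg_activate_cmds (hypothetical helper): read VG names on stdin, one
# per line as printed by `vgs --noheadings -o vg_name`, and print a vgchange
# command for each VG whose name starts with "ceph-". Nothing is executed.
ceph_vg_activate_cmds() {
    while read -r vg; do
        case $vg in
            ceph-*) echo "vgchange -a y $vg" ;;
        esac
    done
}

# Usage on the new node, after vgimport:
#   vgs --noheadings -o vg_name | ceph_vg_activate_cmds         # review
#   vgs --noheadings -o vg_name | ceph_vg_activate_cmds | sh    # run
```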
 