[SOLVED] ceph nautilus move luminous created osd to another node.

Can you please post the 'ceph status' output?
And please write, point for point, what you did and what did not work...
 
Here is the status; I'll write up a complete report of the few hours I spent on this later on.
Code:
# ceph status
  cluster:
    id:     220b9a53-4556-48e3-a73c-28deff665e45
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum pve3,pve10,pve14 (age 18h)
    mgr: pve10(active, since 9d), standbys: pve3, pve14, sys8
    osd: 71 osds: 71 up (since 17h), 71 in (since 17h)
  data:
    pools:   4 pools, 2560 pgs
    objects: 1.58M objects, 5.8 TiB
    usage:   12 TiB used, 25 TiB / 38 TiB avail
    pgs:     2560 active+clean
  io:
    client:   694 KiB/s rd, 5.0 MiB/s wr, 57 op/s rd, 622 op/s wr
 
This is what I did that did not work:
Code:
on the node that has the OSDs: in the PVE web GUI, pressed Stop and Out for each OSD
shut down the node that had the OSDs
physically removed the OSDs

restarted the node that had the OSDs

put one of the OSDs into another node
restarted the other node
that did not work

At PVE > Ceph > OSD, the OSDs still show up as down and out on the original node.
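
For reference, the same state can be checked from the CLI as well; a quick sketch (XX is a placeholder for the OSD id, not a literal value):
Code:
# how the cluster currently sees the OSD (up/down, in/out, crush weight)
ceph osd tree | grep osd.XX
# per-OSD usage and state
ceph osd df tree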

This is what I did that seemed to work:
Code:
   == source node ==
   systemctl stop ceph-osd@XX
   ceph osd out XX
   umount /var/lib/ceph/osd/ceph-XX
   ceph osd purge XX --yes-i-really-mean-it

   == target node ==
   # physically move the disk to the new node,
   # then use dmesg to find its device letter
   ceph-volume lvm zap /dev/sdX
   ceph-volume lvm create --data /dev/sdX
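
A couple of checks that should help in between (again, XX is the OSD id; this is just how I'd verify each step, not an official procedure):
Code:
# on the source node, after the purge: the id should be gone from the crush map
ceph osd tree | grep osd.XX
# on the target node, after ceph-volume lvm create: watch the backfill settle
ceph -s
ceph osd df tree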
 
Was this an OSD created with Luminous (an 'old' one)?
If yes, did you read the upgrade guide? https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus

It states there:

Note that ceph-volume does not have the same hot-plug capability like ceph-disk had, where a newly attached disk is automatically detected via udev events.

You will need to scan the main data partition for each ceph-disk OSD explicitly, if

  • the OSD isn’t currently running when the above scan command is run,
  • a ceph-disk-based OSD is moved to a new host,
  • the host OSD is reinstalled,
  • or the /etc/ceph/osd directory is lost.
 
Was this an OSD created with Luminous (an 'old' one)?
If yes, did you read the upgrade guide? https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus

The OSDs that I tried to move were created with Luminous.



I read that a few times during the upgrade process, and did run these on every node, plus a reboot:
Code:
ceph-volume simple scan
ceph-volume simple activate --all
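
For the record, after the scan every ceph-disk OSD gets a metadata file under /etc/ceph/osd/ (see the scan output further down); a rough way to check that this step took effect:
Code:
# one <id>-<fsid>.json file per scanned ceph-disk OSD
ls /etc/ceph/osd/
# the OSDs themselves should be mounted and running
ceph osd tree
systemctl status ceph-osd@XX    # XX = OSD id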

I tried to closely follow the instructions during the upgrade and did not have an issue for a few weeks, until I attempted moving the OSDs. Now, I could have done something wrong...


Question: what should the process be for moving Luminous-created OSDs to another node?
 
I assume that from PVE: stop and out the OSD, then physically move it.

How would I do this:
You will need to scan the main data partition for each ceph-disk OSD explicitly.

Like, what is the command to run from the CLI? OK, I read the man page:
Code:
ceph-volume simple scan /dev/sdk1
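
Putting the pieces together, the full sequence for a Luminous/ceph-disk OSD move would look roughly like this (XX and /dev/sdX1 are placeholders; setting noout is my own assumption, to avoid rebalancing while the disk is in transit):
Code:
   == source node ==
   ceph osd set noout                 # assumption: keep data from rebalancing during the move
   systemctl stop ceph-osd@XX
   umount /var/lib/ceph/osd/ceph-XX
   # physically move the disk

   == target node ==
   ceph-volume simple scan /dev/sdX1  # writes /etc/ceph/osd/XX-<fsid>.json
   ceph-volume simple activate --all  # or: ceph-volume simple activate XX <fsid>
   ceph osd unset noout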

So I'll try moving another OSD later on today, using the PVE web GUI and one CLI command.
 
I just did a test move.

ceph -s shows all OK,
PVE > Ceph > OSD shows an issue,
PVE > Ceph shows OK.

Details:
Test:
Pick a Luminous-created OSD.

Stop and out the OSD, then physically move it. OK.

Run the scan: ceph-volume simple scan /dev/sdX1

scan:
Code:
# ceph-volume simple scan /dev/sdo1
Running command: /sbin/cryptsetup status /dev/sdo1
Running command: /bin/mount -v /dev/sdo1 /tmp/tmpia0XHr
 stdout: mount: /dev/sdo1 mounted on /tmp/tmpia0XHr.
Running command: /bin/umount -v /tmp/tmpia0XHr
 stderr: umount: /tmp/tmpia0XHr unmounted
--> OSD 30 got scanned and metadata persisted to file: /etc/ceph/osd/30-11930b97-2c45-490c-a722-4e034a0c5433.json
--> To take over management of this scanned OSD, and disable ceph-disk and udev, run:
-->     ceph-volume simple activate 30 11930b97-2c45-490c-a722-4e034a0c5433

Activate: note that I had to mkdir the mount point for it to work.
Code:
pve14  ~ # mkdir /var/lib/ceph/osd/ceph-30
pve14  ~ # 
pve14  ~ # ceph-volume simple activate 30 11930b97-2c45-490c-a722-4e034a0c5433
Running command: /bin/mount -v /dev/sdo1 /var/lib/ceph/osd/ceph-30
 stdout: mount: /dev/sdo1 mounted on /var/lib/ceph/osd/ceph-30.
Running command: /bin/ln -snf /dev/sdb2 /var/lib/ceph/osd/ceph-30/block
Running command: /bin/chown -R ceph:ceph /dev/sdb2
Running command: /bin/systemctl enable ceph-volume@simple-30-11930b97-2c45-490c-a722-4e034a0c5433
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@simple-30-11930b97-2c45-490c-a722-4e034a0c5433.service → /lib/systemd/system/ceph-volume@.service.
Running command: /bin/ln -sf /dev/null /etc/systemd/system/ceph-disk@.service
--> All ceph-disk systemd units have been disabled to prevent OSDs getting triggered by UDEV events
Running command: /bin/systemctl enable --runtime ceph-osd@30
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@30.service → /lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@30
--> Successfully activated OSD 30 with FSID 11930b97-2c45-490c-a722-4e034a0c5433
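
To double-check that the daemon really came back before trusting the GUI, something like this should do (OSD 30 from the activate above):
Code:
systemctl status ceph-osd@30
ceph osd tree | grep osd.30
mount | grep ceph-30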

status:
Code:
# ceph -s
  cluster:
    id:     220b9a53-4556-48e3-a73c-28deff665e45
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum pve3,pve10,pve14 (age 22h)
    mgr: pve10(active, since 9d), standbys: pve3, pve14, sys8
    osd: 71 osds: 70 up (since 8m), 70 in (since 8m)
  data:
    pools:   4 pools, 2560 pgs
    objects: 1.58M objects, 5.8 TiB
    usage:   13 TiB used, 25 TiB / 37 TiB avail
    pgs:     2560 active+clean
  io:
    client:   18 KiB/s rd, 2.0 MiB/s wr, 0 op/s rd, 150 op/s wr

* PVE > Ceph > OSD - see the attached pic.


So drives can be moved. I was missing the scan, and PVE > Ceph > OSD may have an issue.

thank you for the help!
 

Attachments

  • pve14   Proxmox Virtual Environment.png
