move osd to another node

RobFantini

I've a few more OSDs to move to other nodes.

I've been wiping them by using stop, out, remove [and deleting partitions], then adding from scratch on another node.


Is there a standard way to move an OSD while preserving data using PVE?
 
if you don't have a separate (shared) journal device, you can just down and out the OSD and physically move it from one host to another. (hot-)plugging it in should automatically start the OSD service on the new host, and you can mark the osd as "in" on the GUI.

if you have a separate journal device only used for the particular osd, you can move it together with the OSD and it should work as well.

if you have a shared journal device, you might be able to move all the OSDs using it and the journal, all at the same time - I have never tried this though.

if you only want to move one of several OSDs sharing a journal, this is AFAIK not possible, and you need to actually remove it and create a new OSD from scratch on the new host, using the moved disk.
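A sketch of that single-OSD move, assuming OSD id 5 with no separate journal (the id and the noout step are illustrative additions, not from the post above):

```shell
# on any cluster node: stop Ceph from rebalancing while the disk is in transit
ceph osd set noout

# on the old host: stop the daemon and mark the OSD out
systemctl stop ceph-osd@5
ceph osd out 5

# physically move the disk; on (hot-)plug the OSD service should
# start automatically on the new host

# bring it back in (the CLI equivalent of the GUI "in" button)
ceph osd in 5
ceph osd unset noout
```

Without noout set, Ceph starts backfilling as soon as the OSD is marked out, so the flag keeps the data movement to a minimum while the disk is in transit.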
 
So I tried doing this for 2 OSDs.

0- set noout
for each drive:
1- Stop [there is no 'down' option at pve > ceph > osd]
2- Out
3- Move to the other system; it auto-mounted.
4- In.

However I was surprised to see this:
Code:
# ceph -s
    cluster 75bc38f7-d42c-449b-88ed-488c7778a551
     health HEALTH_WARN
            215 pgs backfill_wait
            4 pgs backfilling
            110 pgs degraded
            52 pgs recovery_wait
            110 pgs stuck degraded
            271 pgs stuck unclean
            55 pgs stuck undersized
            55 pgs undersized
            recovery 31685/684182 objects degraded (4.631%)
            recovery 183828/684182 objects misplaced (26.868%)
            noout flag(s) set
     monmap e23: 3 mons at {1=10.11.12.13:6789/0,2=10.11.12.12:6789/0,3=10.11.12.11:6789/0}
            election epoch 2414, quorum 0,1,2 3,2,1
     osdmap e8233: 19 osds: 19 up, 19 in; 224 remapped pgs
            flags noout,sortbitwise,require_jewel_osds
      pgmap v2068147: 1024 pgs, 2 pools, 760 GB data, 195 kobjects
            2288 GB used, 6108 GB / 8396 GB avail
            31685/684182 objects degraded (4.631%)
            183828/684182 objects misplaced (26.868%)
                 753 active+clean
                 160 active+remapped+wait_backfill
                  52 active+undersized+degraded+remapped+wait_backfill
                  47 active+recovery_wait+degraded
                   5 active+recovery_wait+degraded+remapped
                   3 active+degraded+remapped+wait_backfill
                   3 active+undersized+degraded+remapped+backfilling
                   1 active+remapped+backfilling
recovery io 747 MB/s, 192 objects/s
  client io 7239 B/s wr, 0 op/s rd, 3 op/s wr

What did I do wrong?
 
did you move the OSD to a node which already had OSDs? if so, rebalancing is to be expected (because now two of the three copies of some PGs are on the same node, and Ceph tries to distribute the data according to the CRUSH map)
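A quick way to see why the rebalance happens is to check where the moved OSD now sits in the CRUSH hierarchy:

```shell
# the moved OSD appears under its new host bucket; with the default
# CRUSH rule of one replica per host, any PGs that now have two copies
# on the same host get remapped elsewhere
ceph osd tree
```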
 
OK, rather than open a new thread, here is another related question:

we need to replace a motherboard on a system that has 8 OSDs.

we have a spare system ready to accept the 8 OSDs.

my question:

is it better to move the drives one at a time or all at once?
 
well, I tried
- setting noout
- shutting down the node with the 8 OSDs,
- moving the OSDs to the spare system, then turning it on

the above method still resulted in a high % of Ceph objects misplaced
Code:
   pgs:     34850/893478 objects degraded (3.900%)
             341089/893478 objects misplaced (38.175%)

so is there a way to move all OSDs to a new node without leaving much work for Ceph?
 
if you don't have a separate (shared) journal device, you can just down and out the OSD and physically move it from one host to another. (hot-)plugging it in should automatically start the OSD service on the new host, and you can mark the osd as "in" on the GUI.
Hello

so soon I've got to completely replace a node that has 8 OSDs.

the new system has no OSDs. we will use the 8 from the old node.

We do not want to have 8 OSDs getting repaired [not sure that is the correct term] at the same time.

Is it possible to down and out all 8, move them, then set them to in and not have all 8 getting repaired?
 
If you replace only the board and the OS stays the same, why not just set 'noout' and leave the OSDs where they are? After the board is replaced, the system should come up and only the changed objects would need to be synced.

If you move all OSDs to a new node, then I suppose the node needs to exist in the CRUSH map, as I guess the OSDs will otherwise be placed in the root bucket and not under the host bucket. 'ceph osd tree' should show you the placement of the OSDs.
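If 'ceph osd tree' does show the moved OSDs sitting in the root bucket rather than under a host, they can be placed by hand; a sketch with placeholder host names:

```shell
# create a bucket for the new host if it is missing from the CRUSH map
ceph osd crush add-bucket newnode host
ceph osd crush move newnode root=default

# move each OSD under the new host bucket
ceph osd crush move osd.16 host=newnode
```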
 
If you replace only the board and the OS stays the same, why not just set 'noout' and leave the OSDs where they are? After the board is replaced, the system should come up and only the changed objects would need to be synced.

If you move all OSDs to a new node, then I suppose the node needs to exist in the CRUSH map, as I guess the OSDs will otherwise be placed in the root bucket and not under the host bucket. 'ceph osd tree' should show you the placement of the OSDs.

Alwin - thank you for the response. Note we decided to purchase a new chassis along with the motherboard and CPUs.

Since we are just replacing the motherboard and chassis [still using the same storage], we will try the noout option.

I'll report back in a couple of weeks how it goes.
 
Does this procedure still work?
I tried it but the OSDs won't 'automount' on the new host.

If I manually update the CRUSH map to move the OSDs from the old host to the new one using the command:
ceph osd crush move osd.16 host=pve3
then they show up under the correct host in the OSD tree, but they cannot start.
 
I could not get the procedure to work. I had 12 OSDs to test procedure variants with.
we ended up just doing out/stop and destroy, then move and create.

EDIT: the procedure worked on that version of PVE/Ceph, not the current one. there is a recent thread I am working on and I thought this was it.
 
Yes, that's what I ended up doing myself too.

It would be nice if there were a way to migrate OSDs from host to host without having to destroy and recreate them.
 
you still can - but with ceph-volume they don't automatically get added on the new host. You need to (re-)run "ceph-volume simple activate --all" or "ceph-volume lvm activate --all" after moving disks from one host to another (and it's probably a good idea to clean up leftover systemd units on the old host, unless you completely take it out of service)
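Putting that together, a sketch for one LVM-backed OSD (the id and fsid are placeholders; read the real unit name from 'systemctl list-units | grep ceph-volume'):

```shell
# old host: stop the OSD and remove the leftover ceph-volume unit
systemctl stop ceph-osd@5
systemctl disable ceph-volume@lvm-5-<osd-fsid>

# move the disk, then on the new host:
ceph-volume lvm activate --all   # or: ceph-volume simple activate --all

# the activate step enables and starts the OSD service on the new host
systemctl status ceph-osd@5
```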
 
Can someone post a 100% working workflow for this?

I tried today to move an OSD from one host to another, and it simply wouldn't get recognized by Ceph.
"ceph-volume lvm activate --all" was supposedly successful, but 'ceph osd tree' would not move the OSD from the old node to the new one.
Code:
# ceph-volume lvm activate --all
--> Activating OSD ID 0 FSID ad4d34b7-5f3d-4f99-bbda-da2e60ef0f04
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
--> Absolute path not found for executable: selinuxenabled
--> Ensure $PATH environment variable contains common executable locations
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-39c42c14-8d49-42aa-a887-1ae8bf01957b/osd-block-ad4d34b7-5f3d-4f99-bbda-da2e60ef0f04 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-39c42c14-8d49-42aa-a887-1ae8bf01957b/osd-block-ad4d34b7-5f3d-4f99-bbda-da2e60ef0f04 /var/lib/ceph/osd/ceph-0/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /bin/systemctl enable ceph-volume@lvm-0-ad4d34b7-5f3d-4f99-bbda-da2e60ef0f04
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-ad4d34b7-5f3d-4f99-bbda-da2e60ef0f04.service → /lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@0
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service → /lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0

In the end I destroyed the whole OSD and started from scratch, i.e. rebalancing kicked in and took hours...
 
I would prefer an official guide from Proxmox.
All the posts in the forum with "success stories" about moving OSDs are kind of anecdotal.
Half of the people say it didn't work for them (me included), and half say "Great, it worked" without so much as a detailed description of how exactly they made it work. If it works for some people, then there is obviously a way to do it, but some detail is preventing others from making it work.
That's why I asked for a 100% working procedure.

The procedure mentioned in that post boils down to "ceph-volume lvm activate --all", which, as mentioned, did not work for me.
 
