nvmedisk move to other slot

Gerhard W. Recher

Hi Followers,

We made a mistake when physically placing the NVMe disks in our machines.

Currently there are 7 NVMe disks per node, but all of them sit in slots tied to CPU #1.

I'm looking for a best practice to move half of them to disk slots tied to CPU #2.
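
Just to check where the drives currently sit: the NUMA node of each NVMe controller can be read from sysfs (a minimal sketch, assuming the standard /sys/class/nvme paths; a value of -1 means the platform exposes no NUMA info):

Code:
# show which NUMA node (i.e. which CPU socket) each NVMe controller hangs off
for dev in /sys/class/nvme/nvme*; do
    echo "$(basename "$dev"): NUMA node $(cat "$dev/device/numa_node")"
done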

My first approach, from theory :)

On each storage node:

move.sh takes two parameters: 1) the OSD ID to move, 2) the new device name.
Example: ./move.sh 6 /dev/nvmean1

Code:
#!/bin/bash
# move.sh <osd-id> <new-device> - recreate an OSD after physically moving its NVMe disk
ID=$1
DEVICE=$2
echo "wait for cluster ok"
while ! ceph health | grep -q HEALTH_OK ; do echo -n "."; sleep 10 ; done
echo "ceph osd out $ID"
ceph osd out $ID
sleep 10
# wait until the rebalancing triggered by the "out" has finished
while ! ceph health | grep -q HEALTH_OK ; do sleep 10 ; done
echo "systemctl stop ceph-osd@$ID.service"
systemctl stop ceph-osd@$ID.service
# wait for things to calm down...
sleep 60
umount /var/lib/ceph/osd/ceph-$ID
echo "move OSD $ID to its intended slot, waiting 600 seconds!"
sleep 600
# wipe the disk and recreate the OSD with the same ID on the new device
echo "ceph-disk zap $DEVICE"
ceph-disk zap $DEVICE
ceph osd destroy $ID --yes-i-really-mean-it
echo "ceph-disk prepare --bluestore $DEVICE --osd-id $ID"
ceph-disk prepare --bluestore $DEVICE --osd-id $ID
sleep 10
ceph osd metadata $ID
ceph -s
echo "wait for cluster ok"
while ! ceph health | grep -q HEALTH_OK ; do echo -n "."; sleep 10 ; done
ceph -s
echo "proceed with next"


Is there a better way to accomplish this task?

Regards

Gerhard
 
You don't need to zap/destroy/prepare an OSD just because you move its physical location on the host. Just mark it out, stop and unmount it, move it, and run ceph-disk activate-all.
 
thx Fabian,

Like this one?

Code:
#!/bin/bash
# move an OSD's disk to a new physical slot without recreating the OSD
ID=$1
echo "wait for cluster ok"
while ! ceph health | grep -q HEALTH_OK ; do echo -n "."; sleep 10 ; done
echo "ceph osd out $ID"
ceph osd out $ID
sleep 10
# wait until the rebalancing triggered by the "out" has finished
while ! ceph health | grep -q HEALTH_OK ; do sleep 10 ; done
echo "systemctl stop ceph-osd@$ID.service"
systemctl stop ceph-osd@$ID.service
sleep 60
umount /var/lib/ceph/osd/ceph-$ID
echo "plug OSD $ID into its new slot, waiting 5 minutes (the sleep could be replaced by a read prompt)..."
sleep 300
# re-detect and activate all prepared OSD disks, including the moved one
ceph-disk activate-all
echo "wait for cluster ok"
while ! ceph health | grep -q HEALTH_OK ; do echo -n "."; sleep 10 ; done
ceph -s
echo "proceed with next"
 
Hi Fabian, shutting down the whole cluster, moving the disks as intended, and powering it back up should also work like a charm?
 
I think I'd prefer only shutting down within a failure domain (likely host), and I'd set "noout" first to prevent unneeded rebalancing. You should be able to test all of it with a virtual Ceph cluster though (e.g., with scsiX, the X is translated into the virtual physical location by PVE code; with virtio-scsi-single you even get a move to a different SCSI controller, not just a different slot), which I would highly recommend before doing any such operations on a production cluster.
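
A minimal sketch of that per-host variant (assuming the OSDs are managed by systemd as in the scripts above; the first and last commands run on a monitor node, the middle ones on the host whose disks are being moved):

Code:
# on a monitor node: keep the stopped OSDs from being marked out and rebalanced
ceph osd set noout

# on the host being worked on: stop all local OSDs and power down
systemctl stop ceph-osd.target
shutdown -h now

# ... physically move the disks, boot the host, wait for its OSDs to rejoin ...

# back on a monitor node: re-enable normal out-marking
ceph osd unset noout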
 
