4.0 Cluster and Ceph Recovery

MACscr

One of my Proxmox nodes, which is also a Ceph cluster node with OSDs on it, is having major issues with its operating system and I am forced to replace the OS disk with a new one. I do still have access to the old disk, but I do not see a pve-cluster folder in /var/lib. Can I simply copy that folder from another host in the cluster? If not, what should I do instead?
 
Hi,
normally you do a new install and join the node to the cluster (with force).

If you don't want Ceph to rebuild/rebalance the data during the new installation and reconfiguration, run "ceph osd set noout" to prevent data movement (but of course you are running with one copy of the Ceph data less in the meantime).
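Something like this (untested sketch; the IP is a placeholder for any surviving cluster node, and the exact force option may vary between pvecm versions):

# on any surviving node, before the broken node goes down:
ceph osd set noout

# on the freshly reinstalled node, rejoin the existing cluster
# (force is needed because the cluster already knows this node name):
pvecm add <IP-of-a-working-node> --force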

Udo
 
just fyi:
  1. "ceph osd set noout"
  2. Shut down the node - pull the OSDs
  3. Remove the "broken node" from the cluster (on a working node) - see the wiki
  4. Reinstall the broken node
  5. Reinitialize Ceph
  6. Shut down the reinstalled node and move the OSDs back
  7. Start the node
  8. "ceph osd unset noout"
That's how we do it when we cannot recover a node's OS SSDs; a command-level sketch of the steps follows below.
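In command form, it boils down to roughly this (sketch only - the node name and IP are placeholders, adjust for your setup):

# step 1, on any healthy node:
ceph osd set noout

# step 3, on a working node, once the broken node is shut down
# ("node3" stands in for the broken node's name):
pvecm delnode node3

# steps 4/5, on the freshly reinstalled node:
pvecm add <IP-of-a-working-node> --force
pveceph install

# step 8, once the OSDs are back in place and up:
ceph osd unset noout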
 
Pull OSDs? You mean physically? Why? I don't have that option.

Also, what do you mean by reinitializing Ceph? Just installing the Proxmox Ceph packages again? Then what? Add it as a monitor again? Some specifics would really be nice instead of a general idea.

Thanks again for the help!
 

Pull OSDs means the following:
either remove them from their hot-swap bays, or disconnect their cables. The reason is so you don't overwrite their data by accident while reinstalling the node. You don't have to do it if you are reasonably sure you won't overwrite them by accident :p

Reinitialize Ceph basically means doing everything you need to do to your Ceph install so that the node works in the original cluster again. That includes reinstalling Ceph on the node, making sure the node sits in the same cluster, and giving it a monitor again (if you want it to have one). It also includes moving your custom hook scripts back in place (unless you keep them on the pve-cluster file system) and redoing every other manual modification you might have made to the "base" pveceph install that is not covered by the /etc/pve/ filesystem.
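On the reinstalled node, the Ceph part usually amounts to something like this (sketch; these are the pveceph commands of that era, and the monitor steps only apply if the node ran a monitor before):

# reinstall the Ceph packages via the Proxmox tooling:
pveceph install

# if the node previously ran a monitor: drop the stale monitor entry,
# then recreate it on the reinstalled node
ceph mon remove <old-mon-id>     # on a working node
pveceph createmon                # on the reinstalled node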

When you then put the OSDs back into the previously broken node, they come right back up and get put straight back into the CRUSH map.
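You can watch that happen with, for example:

ceph osd tree    # the node's OSDs should show up again and report "up"
ceph -s          # wait for health to settle, then unset noout as in step 8 above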

PS: we do have physical access to our nodes. I just pull each drive a cm out of its hot-swap bay and that is it. I should have been more specific about that.
 
