4.0 Cluster and Ceph Recovery

MACscr

One of my Proxmox nodes, which is also a Ceph cluster node with OSDs on it, is having major issues with its operating system and I am forced to replace the OS disk with a new one. I do still have access to the old disk, but I do not see a pve-cluster folder in /var/lib. Can I simply copy that folder from another host in the cluster? If not, what should I do instead?
 
Hi,
normally you do a new install and join the node to the cluster (with force).

If you don't want Ceph to rebuild/rebalance the data during the new installation and reconfiguration, run "ceph osd set noout" to prevent data movement (but of course you are running with one copy of the Ceph data less in the meantime).
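Something like this (untested sketch; the IP is a placeholder for any surviving cluster node, and the exact force option may vary between pvecm versions):

# on any surviving node, before the broken node goes down:
ceph osd set noout

# on the freshly reinstalled node, rejoin the existing cluster
# (force is needed because the cluster already knows this node name):
pvecm add <IP-of-a-working-node> --force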

Udo
 
just fyi:
  1. "ceph osd set noout"
  2. Shut down the node - pull the OSDs
  3. Remove the "broken node" from the cluster (on a working node) - see the wiki
  4. Reinstall the broken node
  5. Reinitialize Ceph
  6. Shut down the reinstalled node and move the OSDs back
  7. Start the node
  8. "ceph osd unset noout"
That's how we do it when we cannot recover a node's OS SSDs; a command-level sketch of the steps follows below.
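In command form, it boils down to roughly this (sketch only - the node name and IP are placeholders, adjust for your setup):

# step 1, on any healthy node:
ceph osd set noout

# step 3, on a working node, once the broken node is shut down
# ("node3" stands in for the broken node's name):
pvecm delnode node3

# steps 4/5, on the freshly reinstalled node:
pvecm add <IP-of-a-working-node> --force
pveceph install

# step 8, once the OSDs are back in place and up:
ceph osd unset noout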
 
Pull OSDs? You mean physically? Why? I don't have that option.

Also, what do you mean by reinitializing Ceph? Just installing the Proxmox Ceph packages again? Then what? Add it as a monitor again? Some specifics would really be nice instead of a general idea.

Thanks again for the help!
 

Pull OSDs means the following:
either remove them from their hot-swap bays, or disconnect their cables. The reason is so you don't overwrite their data by accident while reinstalling the node. You don't have to do it if you are reasonably sure you won't overwrite them by accident :p

Reinitialize Ceph basically means doing everything you need to do to your Ceph install so that the node works in the original cluster again. That includes reinstalling Ceph on the node, making sure the node sits in the same cluster, and giving it a monitor again (if you want it to have one). It also includes moving your custom hook scripts back in place (unless you keep them on the pve-cluster file system) and redoing every other manual modification you might have made to the "base" pveceph install that is not covered by the /etc/pve/ filesystem.
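On the reinstalled node, the Ceph part usually amounts to something like this (sketch; these are the pveceph commands of that era, and the monitor steps only apply if the node ran a monitor before):

# reinstall the Ceph packages via the Proxmox tooling:
pveceph install

# if the node previously ran a monitor: drop the stale monitor entry,
# then recreate it on the reinstalled node
ceph mon remove <old-mon-id>     # on a working node
pveceph createmon                # on the reinstalled node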

When you then put the OSDs back into the previously broken node, they come right back up and get put straight back into the CRUSH map.
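You can watch that happen with, for example:

ceph osd tree    # the node's OSDs should show up again and report "up"
ceph -s          # wait for health to settle, then unset noout as in step 8 above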

PS: we do have physical access to our nodes. I just pull each drive a cm out of its hot-swap bay and that is it. I should have been more specific about that.
 
