Replace orphaned PVE

Laser47

I have a home lab cluster with 3 PVE nodes running Proxmox 8.3.3 with Ceph 19.2.0. One of the nodes blew out its OS drive and now it won't boot. I've ordered new drives (WD Red) and need to start by rebuilding the lost node. I'd like to build it back "in place" - i.e. same name, IP, etc. - instead of calling it pve-04 and having a permanent skip in my naming scheme.

Before I rebuild, is there anything I should do on the current systems? Of course, Ceph is angry and shows the Monitor, Manager, and Metadata Server services down - there's no quorum. I'm OK with wiping the Ceph drive in the orphaned system as part of the rebuild process and letting Ceph re-copy the missing blocks.

Is there anything to do on the Proxmox side? Is there anything special needed to remove the node from the cluster?

Once I get the orphaned system fixed, what would be the process to rebuild the others on the WD Red drives? Is there something I should do to pull them from the cluster more gracefully before shutting them down?


TIA
 
So, to clean this up on the Proxmox VE side, remove the node from the cluster: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
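
For example, run from one of the surviving nodes (the node name pve-03 here is just a placeholder, substitute the name of the dead node), the removal would look roughly like this:

Code:
pvecm status                    # confirm the remaining nodes still have quorum
pvecm delnode pve-03            # remove the dead node from the cluster configuration
rm -rf /etc/pve/nodes/pve-03    # optional: clean up the leftover node directory afterwards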

For Ceph, remove the missing services so that Ceph forgets about them. 2 of the 3 MONs are still present, right?


Code:
ceph mon remove {mon-id}
ceph osd rm {osd-id}

https://docs.ceph.com/en/reef/rados/operations/add-or-rm-mons/#removing-monitors
https://docs.ceph.com/en/reef/rados/operations/add-or-rm-osds/#removing-osds-manual
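
If you want to do the OSD cleanup step by step rather than just ceph osd rm, a rough sketch following the linked manual procedure looks like this (the OSD ID 2 is only a placeholder, check ceph osd tree for the IDs that actually lived on the dead node):

Code:
ceph osd tree                  # find the OSD IDs that belonged to the dead node
ceph osd out 2                 # mark the OSD out, if it is not already
ceph osd crush remove osd.2    # remove it from the CRUSH map
ceph auth del osd.2            # delete its cephx key
ceph osd rm 2                  # finally remove the OSD entry itself

On recent releases, ceph osd purge {osd-id} --yes-i-really-mean-it combines the last three steps.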


Once the node is working again and the OSD disks are still intact, after it has joined the Proxmox VE cluster and the Ceph packages are installed, consider running
Code:
ceph-volume lvm activate --all
It will try to detect existing OSDs and make sure they are added back to the CRUSH map so they show up under the node. This could make the rebalance a lot faster.
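
Afterwards, a quick sanity check could look like this (generic status commands, nothing specific to your setup):

Code:
ceph-volume lvm list    # show the OSDs detected on the local disks
ceph osd tree           # the rebuilt node and its OSDs should show up again
ceph -s                 # watch recovery / rebalance progress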
 
Yes, 2/3 are still present.

Thank you so much. I'll start going through this.