Reinstall PVE node with Ceph

ozdjh

Oct 8, 2019
Hi

We're testing PVE and Ceph and trying out failure conditions. We're simulating a failed cluster node, where we want to reinstall the node and bring up the existing Ceph OSDs. There are a few forum threads and notes about reinstalling a node, but we haven't been able to find anything clear or complete. We want to simulate a failed boot device, or installing a fresh PVE node in a different chassis and moving the OSD drives to it.

Is there a documented process for bringing Ceph OSDs back online after reinstalling a node, with details of what to restore from backup and what to reinitialise? Our attempts so far have resulted in a PVE node back in the cluster, with the Ceph monitor and manager running and the OSDs visible in the UI, but the OSDs will not start.


Thanks

David

ozdjh said:
Is there a documented process for bringing Ceph OSDs back online after reinstalling a node, with details of what to restore from backup and what to reinitialise? Our attempts so far have resulted in a PVE node back in the cluster, with the Ceph monitor and manager running and the OSDs visible in the UI, but the OSDs will not start.
A failed node first needs to be removed from the Proxmox VE cluster [0]. The Ceph MON can then either be removed or re-installed [1].
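As a rough sketch of that removal sequence, assuming the failed node was named "nodeX" (a placeholder) and is already powered off; see [0] and [1] for the authoritative steps:

    # run on a surviving cluster member
    pvecm delnode nodeX    # remove the failed node from the PVE cluster
    ceph mon rm nodeX      # remove its dead monitor from the monmap
    # then delete the node's IP from mon_host in /etc/ceph/ceph.conf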

I roughly outlined the steps in my post [2]. The OSDs in that thread are Bluestore LVs. The ceph-osd.target usually takes care of mounting the OSDs (extracting the metadata from the LVs), so it might only need the initial step to import them; see the sketch after the links below.

[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_remove_a_cluster_node
[1] https://docs.ceph.com/docs/nautilus/rados/operations/add-or-rm-mons/
[2] https://forum.proxmox.com/threads/osd-move-issue.56932/#post-263918
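For the OSD side, a minimal sketch of that import step, assuming Bluestore OSDs on LVM as in the linked thread:

    # show the Bluestore OSD logical volumes ceph-volume can detect on this node
    ceph-volume lvm list
    # re-create the tmpfs mounts under /var/lib/ceph/osd/ and enable the OSD units
    ceph-volume lvm activate --all
    # ceph-osd.target then starts and supervises the individual ceph-osd@<id> services
    systemctl status ceph-osd.target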
 
Hi

We worked this out over the weekend. The issue with the OSDs not starting was the ownership of the device-mapper nodes: they were still owned by root, so running ceph-osd as the 'ceph' user failed with permission denied, as illustrated below.
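Roughly what we saw (the device names, numbers, and output below are illustrative placeholders):

    # before activation: the dm node backing the OSD LV is owned by root
    ls -l /dev/dm-0
    # brw-rw---- 1 root disk 253, 0 ... /dev/dm-0   -> ceph-osd gets "permission denied"
    # after 'ceph-volume lvm activate' it is chowned to ceph:ceph and the OSD starts
    # brw-rw---- 1 ceph ceph 253, 0 ... /dev/dm-0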

We resolved the problem and tested the process a couple of times to verify and simplify it. It works reliably in our environment, so I've documented it below in case it's of use to others; a consolidated command sketch of the cluster/Ceph steps follows the list. The OSDs are Bluestore LVs under PVE 6.0.

  1. Install Proxmox on new boot drive
  2. Restore basic server config from backup
    • /etc/network/interfaces
    • /etc/resolv.conf
    • /etc/apt/apt.conf
    • /etc/apt/sources.list (and /etc/apt/sources.list.d/)
    • /etc/systemd/timesyncd.conf
  3. Reboot
  4. Add node to PVE cluster
  5. Install Ceph on new node
  6. Remove old monitor
    • ceph mon rm NODENAME
    • remove IP address from mon_host in /etc/ceph/ceph.conf
  7. Create monitor for new node
  8. The OSDs will be visible but down, and unable to start due to the device ownership issue described above. Activate them to fix it.
    • ceph-volume lvm activate --all
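For reference, here's a consolidated sketch of the commands behind steps 4-8. The IP and NODENAME are placeholders for your environment, and the syntax assumes PVE 6.0's "pveceph <resource> create" form; run these on the reinstalled node:

    pvecm add 192.0.2.10            # 4. join the cluster via the IP of an existing member
    pveceph install                 # 5. install the Ceph packages on the new node
    ceph mon rm NODENAME            # 6. drop the old monitor (then remove its IP from
                                    #    mon_host in /etc/ceph/ceph.conf)
    pveceph mon create              # 7. create a monitor on the new node
    ceph-volume lvm activate --all  # 8. import the existing Bluestore OSD LVs; this also
                                    #    fixes the device-mapper ownership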

Thanks

David