Recover broken Ceph config possible?

daniel17n

New Member
Feb 13, 2026
Hi, I had a 3-node cluster with a working Ceph installation and some VMs running.

Today, I had to add a 4th and a 5th node to the cluster.

After installing Proxmox on the 4th node and joining it to the cluster, I tried installing Ceph on it. The installation failed (or so I assume, since out of nowhere I got a timeout (500) error), so I ran these commands on the new node:
systemctl stop ceph-*.target
apt purge ceph-mon ceph-osd ceph-mgr ceph-mds ceph-base ceph-mgr-modules-core
rm -rf /etc/ceph/*
rm -rf /var/lib/ceph/
rm -rf /etc/pve/ceph.conf
rm -rf /etc/pve/priv/ceph.*
rm -rf /etc/systemd/system/ceph*

I now realize this erased the config on all 4 nodes. In the GUI, Ceph no longer responds ("rados_connect failed - No such file or directory (500)").
However, the VMs still appear to be working perfectly.

Is there anything I can do to recover Ceph so that the OSDs and the VM disks do not disappear?
Thanks for reading!
 
well, since the cluster is still up, not all is lost (or at least there is still a chance of recovery).

start by logging into a node with a working monitor:
Code:
ceph config dump | tee -a /etc/pve/ceph.conf.rebuild

alternatively, if that doesn't work:
Code:
ceph config show-with-defaults mon.$(hostname) > /etc/pve/ceph.conf.rebuild

if this doesn't work either, verify your monitor name with ps ax | grep ceph-mon
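if you'd rather script that lookup, here's a small sketch. it assumes ceph-mon was started with an "--id <name>" flag (which is how the Proxmox systemd unit launches it); adjust the pattern if your process line looks different:

```shell
#!/bin/sh
# Sketch: pull the monitor ID out of a running ceph-mon command line.
# Assumes the process was started with "--id <name>".
mon_id_from_ps_line() {
    sed -n 's/.*--id \([^ ]*\).*/\1/p'
}

# Usage: mon_id=$(ps ax | grep '[c]eph-mon' | mon_id_from_ps_line | head -n1)
```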

this will create a very large and messy config file, since it will contain EVERY running variable, most of which will be defaults. you'll need to edit it down; you can find example configs elsewhere in the forum to use as a template. Once done:
Code:
cp /etc/pve/ceph.conf.rebuild /etc/pve/ceph.conf
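for reference, a trimmed-down Proxmox ceph.conf usually boils down to something like the sketch below. every value here is a placeholder: take the real fsid from `ceph fsid`, and use your own monitor names and network addresses.

```
[global]
    # placeholders - fsid comes from `ceph fsid`, addresses from your network
    fsid = 00000000-0000-0000-0000-000000000000
    mon_host = 10.0.0.1 10.0.0.2 10.0.0.3
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    public_network = 10.0.0.0/24

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve1]
    # one section per monitor; "pve1" and the address are placeholders
    public_addr = 10.0.0.1
```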

NEXT, you'll need to recover your keyring:
Code:
ceph auth get client.admin -o /etc/pve/priv/ceph.client.admin.keyring
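before moving on, it's worth a quick sanity check that the restored file actually looks like a keyring. a tiny sketch (the default path is just the one used above):

```shell
#!/bin/sh
# Sketch: sanity-check a restored Ceph admin keyring. A valid keyring has
# a [client.admin] section containing a "key = ..." line.
check_keyring() {
    f="${1:-/etc/pve/priv/ceph.client.admin.keyring}"
    grep -q '^\[client\.admin\]' "$f" && grep -q 'key = ' "$f"
}

# Usage: check_keyring && echo "keyring looks sane"
```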

THEN you'll need to rebuild the symlinks on ALL NODES:
Code:
for node in $(pvecm nodes | awk 'NR>3 {print $3}'); do ssh $node "ln -sf /etc/pve/ceph.conf /etc/ceph/ceph.conf && ln -sf /etc/pve/priv/ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring"; done
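note that the awk in that one-liner depends on how many header lines your version of `pvecm nodes` prints. a slightly more robust sketch waits for the "Nodeid ... Name" column header instead of counting lines:

```shell
#!/bin/sh
# Sketch: extract node names from `pvecm nodes` output without counting
# header lines - start printing only after the "Nodeid ... Name" header.
list_nodes() {
    awk 'seen {print $3} /Nodeid/ {seen=1}'
}

# Usage: for node in $(pvecm nodes | list_nodes); do ssh "$node" ...; done
```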

If all goes well, you're back in action.

If not... back up all your VMs, rebuild Ceph from scratch, and you'll know not to do it again next time. root access is dangerous.
 