How to clean up a bad ceph config and start from scratch?

BobMccapherey

Member
Apr 25, 2020
What is the best way to clean up a bad Ceph config and start from scratch without rebuilding the Proxmox server (everything else works fine)? This is Proxmox VE 6.1.

I configured a Ceph cluster that was working, although for some reason the monitors were showing up twice in the Proxmox GUI: one entry with an OK status and one with a '?'.

I tried deleting everything in /etc/pve/ceph* and /var/lib/ceph, but it still seems like configuration elements are being retained somewhere.

Afterwards I tried rebuilding the Ceph cluster from scratch, but I keep getting a 500 timeout. A new monitor I created won't start either.
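
In case it helps anyone hitting the same thing, the state of the new monitor can be checked via its systemd unit and journal. A quick sketch, assuming the monitor ID matches the node name (the Proxmox default):

Code:
# Check the new monitor's unit and its recent log messages
systemctl status ceph-mon@$(hostname -s).service
journalctl -b -u ceph-mon@$(hostname -s).service --no-pager | tail -n 50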
 
Just an update: I tried completely removing the Ceph packages, including an 'apt purge' to remove any lingering configs. I also removed all of the keyrings and Ceph configuration files from /etc/ceph, as well as /etc/pve/ceph.conf and /etc/pve/priv/ceph*.keyring.

Then I also removed all of the Ceph services from /usr/lib/systemd, as well as any references to ceph in /var/lib/rrdcached/db/pve2-storage.

I then grepped for any occurrences of ceph in /var/lib/pve* and can't find anything. However, a reinstall and going through the wizard leads me to the same issues as this poster: https://forum.proxmox.com/threads/ceph-reinstallation-issues.56580/
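
Concretely, that cleanup boils down to something like the following sketch (the package list and the rrd pattern are only examples; adjust them to what is actually installed and to your storage IDs):

Code:
# Purge the Ceph packages and their configs
apt purge ceph ceph-base ceph-mon ceph-mgr ceph-mds ceph-osd ceph-common
apt autoremove --purge

# Remove keyrings and Ceph configuration files
rm -f /etc/ceph/*
rm -f /etc/pve/ceph.conf
rm -f /etc/pve/priv/ceph*.keyring

# Remove the leftover systemd units and the old storage RRD files
rm -rf /usr/lib/systemd/system/ceph*
systemctl daemon-reload
rm -f /var/lib/rrdcached/db/pve2-storage/*/*ceph*

# Check for anything Ceph-related that survived
grep -ri ceph /var/lib/pve* 2>/dev/null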

After cancelling the configuration wizard, I click on "Monitors" and still see an old monitor lingering from a previous attempt. I have no idea how it is still there after clearing every trace of Ceph on both hosts I have right now (I'm making sure everything is healthy before buying a third host from my provider). It's extremely frustrating, and it seems there is some other, undocumented place where Ceph configuration is being held. Is this being stored with the cluster info somewhere?
 
When purging Ceph, you will also need to remove /var/lib/ceph/, and it is best to reboot so that the systemd units are removed as well.
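
A minimal sketch of that cleanup:

Code:
systemctl stop ceph.target   # stop any Ceph daemons that are still running
rm -rf /var/lib/ceph/
reboot                       # clears the generated systemd units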
 

I've already tried this, but it seems something may be lingering in the Proxmox configuration database. Either that, or there is some other place keeping a monmap from a previous Ceph installation that's being shared in the cluster filesystem, as something is populating the UI with monitors I've long since gotten rid of. I also suspect this rogue monmap is preventing future installations from starting properly.
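
For what it's worth, the two places that could still hold this are the clustered ceph.conf (which, as far as I know, is what the GUI builds its monitor list from) and the pmxcfs backing database itself. A sketch of how to check; the sqlite query assumes pmxcfs' usual 'tree' table and that sqlite3 is installed:

Code:
# Any [mon.X] sections left in the clustered config?
grep -A3 '^\[mon\.' /etc/pve/ceph.conf

# Read-only peek into the pmxcfs database (do not edit it by hand)
sqlite3 /var/lib/pve-cluster/config.db "SELECT name FROM tree WHERE name LIKE '%ceph%';"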
 

I don't remember in which post I found it, but this is what I use to purge Ceph:

Code:
# Remove leftover unit files and stop any running Ceph daemons
rm -rf /etc/systemd/system/ceph*
killall -9 ceph-mon ceph-mgr ceph-mds

# Wipe the local monitor, manager and MDS state
rm -rf /var/lib/ceph/mon/  /var/lib/ceph/mgr/  /var/lib/ceph/mds/

# Let Proxmox remove its Ceph configuration, then purge the packages
pveceph purge
apt -y purge ceph-mon ceph-osd ceph-mgr ceph-mds
rm /etc/init.d/ceph

# Reinstall the remaining Ceph packages and reset their configuration
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse

# Run the reinstall loop once more after reconfiguring
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done

After this, you can reinstall Ceph the classic way.
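
The CLI variant of the "classic way" on PVE 6.x would be roughly the following; the --network value is just a placeholder for your own cluster network:

Code:
pveceph install --version nautilus   # or use the GUI wizard under Node -> Ceph
pveceph init --network 10.0.0.0/24   # placeholder network, adjust to your setup
pveceph mon create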
 
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse
These should not be needed as long as the service's subdirectory (e.g. mon/mgr/osd under /var/lib/ceph/) stays in place.
 
Thank you so much! That purge script is a great solution.

Code:
pve7to8
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
WARN: updates for the following packages are available:
pve-qemu-kvm

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 7.4-1

Checking running kernel version..
PASS: running kernel '5.15.143-1-pve' is considered suitable for upgrade.

= CHECKING CLUSTER HEALTH/SETTINGS =

SKIP: standalone node.

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
FAIL: failed to get 'noout' flag status - got timeout

FAIL: unable to determine Ceph status!
INFO: checking local Ceph version..
FAIL: Hyper-converged Ceph 16 Pacific is to old for upgrade!
Upgrade Ceph first to Quincy following our how-to:
<https://pve.proxmox.com/wiki/Category:Ceph_Upgrade>
INFO: getting Ceph daemon versions..
FAIL: unable to determine Ceph daemon versions!
WARN: 'noout' flag not set - recommended to prevent rebalancing during upgrades.
INFO: checking Ceph config..

After running the script, my 4 Ceph errors were solved.
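
For reference, the 'noout' warning from pve7to8 is handled with the standard OSD flag commands around the upgrade:

Code:
ceph osd set noout     # avoid rebalancing while daemons restart during the upgrade
# ... upgrade Ceph Pacific -> Quincy, then PVE 7 -> 8 ...
ceph osd unset noout   # back to normal once everything is up again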
 
