How to clean up a bad ceph config and start from scratch?

BobMccapherey

Member
Apr 25, 2020
What is the best way to clean up a bad Ceph config and start from scratch without rebuilding the Proxmox server (everything else works fine)? This is Proxmox VE 6.1.

I configured a Ceph cluster that was working, although for some reason the monitors were showing up twice in the Proxmox GUI: one entry with an OK status and one with a '?'.

I tried deleting everything in /etc/pve/ceph* and /var/lib/ceph, but it still seems like configuration elements are being retained somewhere.

Afterwards I tried rebuilding the Ceph cluster from scratch, but I keep getting a 500 timeout. A new monitor I created won't start either.
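
In case it helps anyone hitting the same thing, the state of the new monitor can be checked via its systemd unit and journal. A quick sketch, assuming the monitor ID matches the node name (the Proxmox default):

Code:
# Check the new monitor's unit and its recent log messages
systemctl status ceph-mon@$(hostname -s).service
journalctl -b -u ceph-mon@$(hostname -s).service --no-pager | tail -n 50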
 
Just an update: I tried completely removing the Ceph packages, including an 'apt purge' to remove any lingering configs. I also removed all of the keyrings and Ceph configuration files from /etc/ceph, as well as /etc/pve/ceph.conf and /etc/pve/priv/ceph*.keyring.

Then I also removed all of the Ceph services from /usr/lib/systemd, as well as any references to ceph in /var/lib/rrdcached/db/pve2-storage.

I then grepped for any occurrences of ceph in /var/lib/pve* and can't find anything. However, a reinstall and going through the wizard leads me to the same issues as this poster: https://forum.proxmox.com/threads/ceph-reinstallation-issues.56580/
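
Concretely, that cleanup boils down to something like the following sketch (the package list and the rrd pattern are only examples; adjust them to what is actually installed and to your storage IDs):

Code:
# Purge the Ceph packages and their configs
apt purge ceph ceph-base ceph-mon ceph-mgr ceph-mds ceph-osd ceph-common
apt autoremove --purge

# Remove keyrings and Ceph configuration files
rm -f /etc/ceph/*
rm -f /etc/pve/ceph.conf
rm -f /etc/pve/priv/ceph*.keyring

# Remove the leftover systemd units and the old storage RRD files
rm -rf /usr/lib/systemd/system/ceph*
systemctl daemon-reload
rm -f /var/lib/rrdcached/db/pve2-storage/*/*ceph*

# Check for anything Ceph-related that survived
grep -ri ceph /var/lib/pve* 2>/dev/null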

After cancelling the configuration wizard, I click on "Monitors" and still see an old monitor lingering from a previous attempt. I have no idea how it is still there after clearing every trace of Ceph on both hosts I have right now (I'm making sure everything is healthy before buying a third host from my provider). It's extremely frustrating, and it seems there is some other, undocumented place where Ceph configuration is being held. Is this being stored with the cluster info somewhere?
 
When purging Ceph, you will also need to remove /var/lib/ceph/, and it is best to reboot so that the systemd units are removed as well.
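
A minimal sketch of that cleanup:

Code:
systemctl stop ceph.target   # stop any Ceph daemons that are still running
rm -rf /var/lib/ceph/
reboot                       # clears the generated systemd units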
 

I've already tried this, but it seems something may be lingering in the Proxmox configuration database. Either that, or there is some other place keeping a monmap from a previous Ceph installation that's being shared in the cluster filesystem, as something is populating the UI with monitors I've long since gotten rid of. I also suspect this rogue monmap is preventing future installations from starting properly.
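
For what it's worth, the two places that could still hold this are the clustered ceph.conf (which, as far as I know, is what the GUI builds its monitor list from) and the pmxcfs backing database itself. A sketch of how to check; the sqlite query assumes pmxcfs' usual 'tree' table and that sqlite3 is installed:

Code:
# Any [mon.X] sections left in the clustered config?
grep -A3 '^\[mon\.' /etc/pve/ceph.conf

# Read-only peek into the pmxcfs database (do not edit it by hand)
sqlite3 /var/lib/pve-cluster/config.db "SELECT name FROM tree WHERE name LIKE '%ceph%';"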
 

I don't remember in which post I found it, but this is what I use to purge Ceph:

Code:
# Remove leftover unit files and stop any running Ceph daemons
rm -rf /etc/systemd/system/ceph*
killall -9 ceph-mon ceph-mgr ceph-mds

# Wipe the local monitor, manager and MDS state
rm -rf /var/lib/ceph/mon/  /var/lib/ceph/mgr/  /var/lib/ceph/mds/

# Let Proxmox remove its Ceph configuration, then purge the packages
pveceph purge
apt -y purge ceph-mon ceph-osd ceph-mgr ceph-mds
rm /etc/init.d/ceph

# Reinstall the remaining Ceph packages and reset their configuration
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse

# Run the reinstall loop once more after reconfiguring
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done

After this, you can reinstall Ceph the classic way.
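
The CLI variant of the "classic way" on PVE 6.x would be roughly the following; the --network value is just a placeholder for your own cluster network:

Code:
pveceph install --version nautilus   # or use the GUI wizard under Node -> Ceph
pveceph init --network 10.0.0.0/24   # placeholder network, adjust to your setup
pveceph mon create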
 
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse
These should not be needed as long as the service's subdirectory (e.g. mon/mgr/osd under /var/lib/ceph/) stays in place.
 
Thank you so much! That purge script is a great solution.

Code:
pve7to8
= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
WARN: updates for the following packages are available:
pve-qemu-kvm

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 7.4-1

Checking running kernel version..
PASS: running kernel '5.15.143-1-pve' is considered suitable for upgrade.

= CHECKING CLUSTER HEALTH/SETTINGS =

SKIP: standalone node.

= CHECKING HYPER-CONVERGED CEPH STATUS =

INFO: hyper-converged ceph setup detected!
INFO: getting Ceph status/health information..
FAIL: failed to get 'noout' flag status - got timeout

FAIL: unable to determine Ceph status!
INFO: checking local Ceph version..
FAIL: Hyper-converged Ceph 16 Pacific is to old for upgrade!
Upgrade Ceph first to Quincy following our how-to:
<https://pve.proxmox.com/wiki/Category:Ceph_Upgrade>
INFO: getting Ceph daemon versions..
FAIL: unable to determine Ceph daemon versions!
WARN: 'noout' flag not set - recommended to prevent rebalancing during upgrades.
INFO: checking Ceph config..

After running the script, my 4 Ceph errors were solved.
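
For reference, the 'noout' warning from pve7to8 is handled with the standard OSD flag commands around the upgrade:

Code:
ceph osd set noout     # avoid rebalancing while daemons restart during the upgrade
# ... upgrade Ceph Pacific -> Quincy, then PVE 7 -> 8 ...
ceph osd unset noout   # back to normal once everything is up again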
 
