PVE8.2 how to properly reinstall Ceph?

_--James--_

Running through a few DR scenarios, and the scripts we used to reinstall Ceph (both at the cluster level and per node) are not working under 8. While we can pull Ceph off the node/cluster, when we go to add the second node back in (on either an existing or a new install), Ceph blows up at the cluster level and all Ceph members revert to an unconfigured state.

Below is the purge scripting we use; what are we missing?
Mind you, we want to be able to do both a full Ceph rebuild across the cluster if needed and a reinstall on a single PVE node without pulling Ceph off the rest of the cluster.


ceph osd down 0 && ceph osd destroy 0 --force
ceph osd down 1 && ceph osd destroy 1 --force
ceph osd down 2 && ceph osd destroy 2 --force
rm /etc/ceph/ceph.conf
systemctl stop ceph-mon@pve1
systemctl stop ceph-mon@pve2
systemctl stop ceph-mon@pve3
systemctl disable ceph-mon@pve1
systemctl disable ceph-mon@pve2
systemctl disable ceph-mon@pve3
umount /var/lib/ceph/osd/ceph-0
umount /var/lib/ceph/osd/ceph-1
umount /var/lib/ceph/osd/ceph-2
rm -rf /var/lib/ceph
killall -9 ceph-mon ceph-mgr ceph-mds
rm -rf /etc/systemd/system/ceph*
reboot

pveceph purge
apt purge ceph-mon ceph-osd ceph-mgr ceph-mds
apt purge ceph-base ceph-mgr-modules-core
rm -rf /var/lib/ceph/mon/ /var/lib/ceph/mgr/ /var/lib/ceph/mds/
rm -f /etc/pve/ceph/*
rm -rf /etc/pve/ceph
rm -r /etc/pve/ceph.conf
rm -r /etc/ceph
rm -rf /etc/pve/priv/ceph.*
reboot
 
So, that worked fine up to 8.1-4 but no longer completely works with 8.2.2. It still uninstalls Ceph, but components are left behind that prevent the reinstall and reconfiguration. We are trying to find out what changed between the builds, and it seems to be in corosync.
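
For anyone comparing notes, a quick way to see what is actually left behind after a purge is to check the paths and units the scripts in this thread touch (just a diagnostic sketch, adjust the paths to your layout):

systemctl list-unit-files 'ceph*'
ls -ld /var/lib/ceph /etc/ceph /etc/pve/ceph /etc/pve/ceph.conf /etc/pve/priv/ceph.* 2>/dev/null
dpkg -l | grep -i ceph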

So far, it seems that we have to completely nuke and pave the target host we want to rebuild Ceph on, then pull the host from the cluster and the Ceph config, etc. That works as expected. But if we want to do a Ceph reinstall on an existing target host, it will destroy Ceph for the existing cluster unless we nuke and pave.
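
For reference, the "pull the host from the cluster" part is just the standard PVE node removal, run from one of the remaining cluster members with the target node powered off (node name is an example):

pvecm delnode pve3
pvecm status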
 

Came across your thread having a similar issue. Did you try this?
https://forum.proxmox.com/threads/h...fig-and-start-from-scratch.68949/#post-310629

rm -rf /etc/systemd/system/ceph*
killall -9 ceph-mon ceph-mgr ceph-mds
rm -rf /var/lib/ceph/mon/ /var/lib/ceph/mgr/ /var/lib/ceph/mds/
pveceph purge
apt -y purge ceph-mon ceph-osd ceph-mgr ceph-mds
rm /etc/init.d/ceph
rm -rf /etc/ceph

for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done

(I added the rm -rf /etc/ceph.) It got me to the point of being able to reinstall Ceph from the GUI, but it's still having issues.
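
If the GUI install keeps stalling, the CLI route may show more detail; the repository/version flags depend on your setup, so treat this as a sketch:

pveceph install --repository no-subscription
pveceph status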
 
Thanks for the reply; I forgot to update this thread with our findings!

Here is what we are now doing to pull Ceph off a node so it can be reconfigured back into the cluster. The entire goal was to be able to DR Ceph if we needed to reinstall it on the cluster and/or any given host. Prior to 8.2 we had Ceph upgrade failures that sometimes required purging hosts to reinstall Ceph at the desired version.

(all of this was tested with and without the corosync host purge process)

#on host to be purged
#mark OSDs down, edit for all target OSDs
ceph osd down 4 && ceph osd destroy 4 --force
ceph osd down 5 && ceph osd destroy 5 --force

#stop services and delete config/key files - add ceph-mon@hostid as required
systemctl stop ceph-mon@pve3
systemctl disable ceph-mon@pve3
rm -rf /etc/pve/ceph.conf
rm /etc/ceph/ceph.conf
rm -rf /etc/systemd/system/ceph*
rm -rf /var/lib/ceph
killall -9 ceph-mon ceph-mgr ceph-mds

## Look for - Removed /etc/systemd/system/ceph-mon.target.wants/ceph-mon@labnode1.service.
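
#(optional sanity check) confirm no ceph units are still loaded or enabled on this host
systemctl list-units --all 'ceph*'
systemctl list-unit-files 'ceph*'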

#purge ceph from node
pveceph purge

#clean up OSD LVMs - add OSD IDs as needed (ceph-#), add /dev/nvme* as needed.
umount /var/lib/ceph/osd/ceph-4
umount /var/lib/ceph/osd/ceph-5
ceph-volume lvm zap /dev/nvme4n1 --destroy && ceph-volume lvm zap /dev/nvme5n1 --destroy
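#(optional) confirm the zap removed the ceph LVs from the devices; 'ceph-volume lvm list' run before the
#zap also shows which device backs which OSD ID (device paths here are our lab examples)
ceph-volume lvm list
lsblk /dev/nvme4n1 /dev/nvme5n1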

#remove Ceph install
apt purge ceph-mon ceph-osd ceph-mgr ceph-mds
apt purge ceph-base ceph-mgr-modules-core

#clean up left over ceph data
rm -rf /var/lib/ceph/mon/ /var/lib/ceph/mgr/ /var/lib/ceph/mds/
rm -f /etc/pve/ceph/*
rm -rf /etc/pve/ceph
rm -r /etc/pve/ceph.conf
rm -r /etc/ceph
rm -rf /etc/pve/priv/ceph.*

#reboot was not necessary during testing, but in a production environment I would reboot purged hosts


And here is what we are now doing to pull the desired node and its OSDs out of the Ceph cluster:

#on a clustered ceph node
#destroy the purged host's OSDs, repeat for all target OSDs
ceph osd purge 4 --yes-i-really-mean-it
ceph osd purge 5 --yes-i-really-mean-it
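#note: 'ceph osd purge' already removes each osd from the crush map, deletes its auth key and drops it
#from the osd map, so the 'auth del' / 'osd rm' steps below are belt-and-braces (removing the host bucket
#via 'osd crush remove' is still a separate step); with many osds a loop helps, e.g.:
#for id in 4 5; do ceph osd purge "$id" --yes-i-really-mean-it; done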

#remove purged node from the ceph crush map
ceph osd crush remove pve3

#remove purged osd auth keys
ceph auth del osd.4
ceph auth del osd.5

#remove purged osd from ceph_db
ceph osd rm 4
ceph osd rm 5

#remove purged monitors
ceph mon remove pve3
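
#(sanity check) the purged node should now be gone from the mon map and the crush tree
ceph mon dump
ceph osd tree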

#validate global config, if still listed remove references to purged host/osds/monitors
cat /etc/pve/ceph.conf

#validate ceph config, should match global config
cat /etc/ceph/ceph.conf


#reinstall Ceph on purged node(s)
#if errors during reinstall
mkdir /var/lib/ceph
mkdir /var/lib/ceph/bootstrap-osd
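#on a stock install /var/lib/ceph and bootstrap-osd are owned by ceph:ceph, so if the reinstall
#complains about permissions, fixing ownership may help (sketch, verify against a healthy node first)
chown ceph:ceph /var/lib/ceph /var/lib/ceph/bootstrap-osd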

#if errors during configuration
mkdir /etc/pve/ceph

#add monitors/osds back in
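#re-adding can be done from the GUI or with pveceph, roughly like this (device paths are examples,
#match them to your own layout)
pveceph mon create
pveceph mgr create
pveceph osd create /dev/nvme4n1
pveceph osd create /dev/nvme5n1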
 
This worked for me on 8.2.7. The only difference is I couldn't unmount and destroy the volumes; I had to destroy the OSDs after reinstallation and then add them back.
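
For anyone else hitting that, the CLI equivalent of the destroy/re-add dance is roughly the following (OSD ID and device path are examples; check which disk backs which OSD first):

pveceph osd destroy 4 --cleanup
pveceph osd create /dev/nvme4n1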
 
