Reinstall CEPH on Proxmox 6

There seem to be some leftovers. After you purged Ceph, is /var/lib/ceph/ empty? And is there no ceph.conf anymore?
I tried to purge, and there is no ceph.conf anymore, but it still didn't work!
I tried many ways and finally found out that without this parameter it works fine!
 
Yup, the ceph install on the latest 6.1.

I have some good news though. I re-installed the base 6.1 ISO (version 6.1-3) and followed along the simple installation steps for ceph:

1. Install PVE
2. Install Ceph from the UI (rough CLI equivalent below)
3. Configure with default network settings
4. Create PVE cluster
5. Install PVE on second node
6. Install Ceph from the UI
7. Add OSDs from all nodes
8. Create second monitor on second node
9. Create standby mgr on second node
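
(For the CLI-inclined, steps 2, 3, and 6 are roughly the same as the pveceph commands that come up later in this thread; just a sketch, with the network left as a placeholder:)

Code:
# rough CLI equivalent of the UI steps above (run per node as needed)
pveceph install                        # step 2 / 6: install the Ceph packages
pveceph init --network <public-cidr>   # step 3: write the initial ceph.conf (first node only)
pveceph mon create                     # create the first monitor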


The steps I dropped compared to my previous attempts were (adding these back would cause the steps above to fail):
1. Add the no-subscription repo (repo line shown below)
2. apt update && apt dist-upgrade
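
For reference, the no-sub repo on PVE 6 / Debian Buster is the usual pve-no-subscription line (the file name below is arbitrary):

Code:
# standard pve-no-subscription repository for Proxmox VE 6 (Debian Buster)
echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" \
  > /etc/apt/sources.list.d/pve-no-subscription.list
apt update && apt dist-upgrade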

Something must be off with the latest 6.1-7 (?) (at least on my hardware)...

My next step will be to somehow change the Ceph cluster network over to my dual 10G Intel cards + 10G switch, but I'm still researching how to change the cluster network for Ceph safely on PVE. (If anyone has a guide or experience, please message me!)
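
From what I've read so far, the networks are just the cluster_network / public_network entries in /etc/pve/ceph.conf; a rough, untested sketch of the change (10.10.10.0/24 is only a placeholder for the 10G subnet):

Code:
# rough sketch only -- edit the [global] section of /etc/pve/ceph.conf, e.g.
#   cluster_network = 10.10.10.0/24   # placeholder for the 10G subnet (OSD replication traffic)
# then restart the OSDs one node at a time:
systemctl restart ceph-osd.target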
 
First:
All nodes have been reinstalled and upgraded to the latest! (pve-manager: 6.1-7)

Second:
I want to use the parameter '-disable_cephx 1', so I set up the Ceph cluster with the pveceph CLI, not in the UI (see the sketch after the steps below):

1. pveceph install
2. pveceph init --network 10.0.115.0/24 -disable_cephx 1
3. pveceph mon create
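
(For reference, as far as I can tell '-disable_cephx 1' just writes the auth settings as 'none' into the generated config, roughly:)

Code:
# roughly what --disable_cephx 1 puts into the [global] section of /etc/pve/ceph.conf
# (cephx authentication switched off cluster-wide)
auth_client_required = none
auth_cluster_required = none
auth_service_required = none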

After creating the first mon, the next steps are all done from the UI, like:

Create second monitor on second node
Create standby mgr on second node
Add OSDs from all nodes
...

Note: this was the first time creating a CEPH cluster on the freshly installed PVE 6, with all nodes at the latest version.

All of the above works fine.

---
But I wanted to recreate this CEPH cluster, so I deleted all the previous configuration and purged all the Ceph configs and directories,

and then went to create the new CEPH cluster again.

Still from the CLI:

Code:
pveceph init --network 10.0.115.0/24 -disable_cephx 1
pveceph mon create

And then it didn't work, as described in #16.

Before that, I tried the advice above, like #7 #8 #9 #10, but it still fails!

So I think this parameter is the key to the problem.
Of course, this is my environment, my test :)
 
Did someone manage to fix it? I have the same issue:
Code:
# pveceph init  --disable_cephx 1 --network 172.22.255.0/24
# pveceph mon create
unable to get monitor info from DNS SRV with service name: ceph-mon
Could not connect to ceph cluster despite configured monitors
# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
[errno 2] error connecting to the cluster
I just reinstalled the PVE nodes but cannot set up Ceph...

Can someone give a hint on how to debug this??
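
The error itself means no monitor address could be found in ceph.conf, so the client falls back to a DNS SRV lookup; a few generic things worth checking (<node> is just a placeholder for the node name):

Code:
cat /etc/pve/ceph.conf                     # any [mon.<node>] section or mon_host entry at all?
ls -l /etc/ceph/ceph.conf                  # should be a symlink to /etc/pve/ceph.conf
systemctl status ceph-mon@<node>.service   # did the monitor actually start?
journalctl -b -u ceph-mon@<node>.service   # monitor log since boot
ls /var/lib/ceph/mon/                      # leftover mon data dirs from an old cluster?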
 
I can confirm this problem. Everything was working, but after going from 5 to 6 and following the upgrade instructions to the latest Ceph, I can't get Ceph working; I hit the same problem that was described in the initial post here.

I deleted all config directories and services, checked the hosts file, ... but it still complains with the same message users report here.

Very sad, we have almost 10 licenses active and can't use Ceph.
 
To manually remove all Ceph services throughout the cluster (be aware, all data on Ceph will be lost):

Code:
pveceph mds destroy <mdsid>
pveceph osd destroy <osdid>
pveceph mgr destroy <mgrid>
pveceph mon destroy <monid>
rm /etc/ceph/ceph.conf #removes the link
Run the above commands (depending on service) on every node in the cluster that hosts Ceph services. The corresponding directories should then be gone as well.

Code:
systemctl stop ceph-mon@<monid>.service
systemctl disable ceph-mon@<monid>.service
rm /etc/pve/ceph.conf
rm -r /var/lib/ceph/mon/ceph-<monid>/
ATM, this needs to be done for the last MON.
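
To double-check that a node is really clean before reinstalling, the default locations can be inspected (quick sanity check, default paths assumed):

Code:
# these should be empty or gone after the removal above
ls -la /var/lib/ceph/mon/ /var/lib/ceph/mgr/ /var/lib/ceph/mds/ /var/lib/ceph/osd/ 2>/dev/null
ls -l /etc/ceph/ceph.conf /etc/pve/ceph.conf 2>/dev/null
systemctl list-units 'ceph*' --all    # no ceph units should be active anymore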
 
Unfortunately I did the upgrade to Ceph 14 and I couldn't roll back. Can I send some logs to you? Is there a way to contribute to debugging?
 
I fixed it for my situation.

My final solution to reinstall CEPH on all nodes:

Code:
rm -rf /etc/systemd/system/ceph*                 # drop leftover systemd unit links
killall -9 ceph-mon ceph-mgr ceph-mds            # stop any daemons still running
rm -rf /var/lib/ceph/mon/  /var/lib/ceph/mgr/  /var/lib/ceph/mds/   # wipe mon/mgr/mds state
pveceph purge
apt purge ceph-mon ceph-osd ceph-mgr ceph-mds
rm /etc/init.d/ceph
# re-download and reinstall every ceph package still marked as installed
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done
dpkg-reconfigure ceph-base
dpkg-reconfigure ceph-mds
dpkg-reconfigure ceph-common
dpkg-reconfigure ceph-fuse
for i in $(apt search ceph | grep installed | awk -F/ '{print $1}'); do apt reinstall $i; done
pveceph install
 
Thank you for that detailed script; you seem to be the first person to recover a failed Ceph installation.
I tried your recipe on a failed Ceph node in my otherwise smoothly running PVE cluster.

After running your script, my next step was "pveceph init" (no error messages), but then "pveceph mon create" produced the message
"Could not connect to ceph cluster despite configured monitors".

Any operations in the GUI end with timeouts.
A CLI check of the Ceph status with "ceph -s" just hangs with no output at all.
A CLI check with "systemctl status ceph-mon@pve54" does not find a running monitor:

Code:
root@pve54:~# systemctl status ceph-mon@pve54
ceph-mon@pve54.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; disabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: inactive (dead)

Enabling and starting the ceph-mon ended in a restart loop "ceph-mon@pve54.service: Start request repeated too quickly."
Any idea what else might need to be removed before recovery of CEPH?

Thanks for your patience.
 
@mbstein, in Proxmox VE 6.2 there is pveceph purge available that helps in removing Ceph prior to re-installation.

After running your script, my next step was "pveceph init" (no error messages), but then "pveceph mon create" produced the message
"Could not connect to ceph cluster despite configured monitors".
The ceph.conf is still a leftover from the previous install.
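
Roughly, clearing that leftover (and the rate-limited unit) before retrying would look like this; a sketch only, with the mon id pve54 taken from the post above and default PVE paths assumed:

Code:
rm /etc/pve/ceph.conf                             # drop the old cluster definition
rm -r /var/lib/ceph/mon/ceph-pve54/ 2>/dev/null   # old monitor data, if any is left
systemctl reset-failed ceph-mon@pve54.service     # clear the "start request repeated too quickly" state
pveceph init --network <public-cidr>
pveceph mon create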
 
@Alwin, the pveceph purge command is part of the script provided by Younex.

Just "pveceph purge" applied to two nodes of my cluster (without any else removed) allowed for a fresh installation of Ceph,
the third node did need some directories to be (re)created, some ownerships to be restored, etc. It's running now o.k.
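
(For anyone hitting the same thing, the repair is presumably along these lines; ceph:ceph is the ownership the Ceph daemons expect under /var/lib/ceph, and the exact directories may differ:)

Code:
# recreate the default state directories and give them back to the ceph user
mkdir -p /var/lib/ceph/mon /var/lib/ceph/mgr /var/lib/ceph/mds /var/lib/ceph/osd
chown -R ceph:ceph /var/lib/ceph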

Thank you.
 
@Alwin, the pveceph purge command is part of the script provided by Younex.
True. I meant that an improved version of pveceph purge is available in Proxmox VE 6.2. ;)
 
I fixed it for my situation.

My final solution to reinstall CEPH on all nodes:
...
This helps me a lot!
 
