No OSDs anymore after recovering monitors using OSDs

gehkn

Jun 23, 2025
Hello everyone,

Like you, I am passionate about open source, Proxmox and Ceph.
I recently tried to change the subnet of all my MONs at once on my Ceph cluster.
It broke my cluster. I know it was stupid.
The Ceph version is Reef (18).

My new MONs could no longer see the cluster, and my OSDs were no longer tracked by the MONs.

After several attempts to restore the initial state, I decided to follow Ceph's recommendations:
https://docs.ceph.com/en/quincy/rad...leshooting-mon/#mon-store-recovery-using-osds

More specifically, I applied this script, which I adapted to my setup:
https://github.com/sofmeright/PVE_Ceph-Disaster_Recovery

The script ran without a hitch.
Unfortunately, I still cannot access the cluster.

Yesterday, the 'ceph -s' command returned this message:
Bash:
2025-06-23T20:55:03.630+0200 77b0f4e006c0 -1 monclient(hunting): handle_auth_bad_method server allowed_methods but i only support
[errno 13] RADOS permission denied (error connecting to the cluster)


Today, the same command returns this after a long wait:
Bash:
2025-06-24T20:58:24.760+0200 7158566006c0 0 monclient(hunting): authenticate timed out after 300
[errno 110] RADOS timed out (error connecting to the cluster)


My cluster consists of 3 nodes, 17 OSDs and 3 MONs.
Since running the script, only 2 monitors are running.


- On pve01, pve02 and pve03, every OSD directory (/var/lib/ceph/osd/ceph-pve0x) contains the following files:
block
fsid
ready
type
ceph_fsid
keyring
require_osd_release
whoami


- The ceph_fsid of every OSD is identical and matches the fsid in ceph.conf.

- The '/etc/pve/priv/ceph.client.admin.keyring' file is present:
Bash:
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"

- pve01 and pve02 have the same keyring in '/var/lib/ceph/mon/ceph-{host}'.
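The file and fsid checks listed above can be repeated on each node with a small script. This is a minimal sketch, not part of Ceph, Proxmox or the recovery script: the function name check_osd_dirs is hypothetical, and it assumes the OSD data directories match '/var/lib/ceph/osd/ceph-*' and that ceph.conf (by default the Proxmox path '/etc/pve/ceph.conf') contains a "fsid = ..." line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph or the recovery script):
# checks that each OSD data dir has the expected metadata files and that
# its ceph_fsid matches the fsid in ceph.conf.
# Assumed defaults: OSD dirs under /var/lib/ceph/osd, Proxmox ceph.conf path.
check_osd_dirs() {
  local osd_root conf required status conf_fsid dir f
  osd_root="${1:-/var/lib/ceph/osd}"
  conf="${2:-/etc/pve/ceph.conf}"
  required="block fsid ready type ceph_fsid keyring require_osd_release whoami"
  status=0

  # Value of the first "fsid = <uuid>" line, with whitespace stripped.
  conf_fsid=$(awk -F '=' '/^[ \t]*fsid[ \t]*=/ { gsub(/[ \t\r]/, "", $2); print $2; exit }' "$conf")
  if [ -z "$conf_fsid" ]; then
    echo "no fsid found in $conf" >&2
    return 2
  fi

  for dir in "$osd_root"/ceph-*; do
    [ -d "$dir" ] || continue   # skip if the glob matched nothing
    for f in $required; do
      # -e follows symlinks, so a dangling "block" link is reported as missing.
      if [ ! -e "$dir/$f" ]; then
        echo "$dir: missing $f"
        status=1
      fi
    done
    if [ -r "$dir/ceph_fsid" ] && [ "$(cat "$dir/ceph_fsid")" != "$conf_fsid" ]; then
      echo "$dir: ceph_fsid differs from $conf"
      status=1
    fi
  done
  return "$status"
}
```

Source the file and run check_osd_dirs on pve01, pve02 and pve03 (or pass other paths, e.g. check_osd_dirs /var/lib/ceph/osd /etc/ceph/ceph.conf). It prints one line per problem and returns 0 when every OSD directory looks consistent, 1 on a problem, and 2 if no fsid is found in ceph.conf.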

The only question I have at this stage:

Is my cluster officially dead, or do you think there is still hope?

And if there is hope, do you have any ideas? I have run out of them.

Thank you ;)
 
Last edited: