Ceph - broken configuration - rados_connect failed - No such file or directory (500)

Casual

Well, I've messed up pretty big this time.

I successfully removed and re-added a node (PVE1) in the cluster due to physical setup problems - https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
But then I started to see that the new OSDs on PVE1 wouldn't show up in Ceph - I could add them, but they stayed down and never appeared in the OSD list, only as "ghost" OSDs in the control panel.

So I thought I needed to reinstall Ceph on PVE1.
Following (again) this guide for reinstalling Ceph - https://dannyda.com/2021/04/10/how-...ph-and-its-configuration-from-proxmox-ve-pve/

I stopped at step 1.16, `rm -r /etc/pve/ceph.conf` (while following the steps only on my PVE1 node), and noticed that I could configure Ceph again in the GUI on PVE1.

"Wow, that was easier than I thought".

I configured it on PVE1. It showed the error "rados_connect failed - No such file or directory (500)", and all my other nodes went into the same error. It turned out that, since /etc/pve is shared across the cluster, I had deleted ceph.conf on all my nodes and re-"installed" Ceph on all of them.
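For context, /etc/pve is the pmxcfs cluster filesystem and is replicated to every node, which is why deleting ceph.conf once removed it everywhere. A quick sanity check (standard tools, nothing specific to this setup):

Bash:
# /etc/pve is not a local directory but the cluster-wide pmxcfs FUSE mount
findmnt /etc/pve
# show the cluster nodes the filesystem is synchronized across
pvecm status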

"Well, I fu**ked."

Thankfully, CephFS still works as before, so I'm dumping everything to my PC. Only the RBD storage is in a "?" state.

I've managed to piece together what my ceph.conf should look like, but it still shows the error. Do I need to recreate the Ceph crush map?

Any ideas how I can revive Ceph?

Code:
# ceph service status
failed to get an address for mon.pve: error -2
failed to get an address for mon.pve02: error -2
unable to get monitor info from DNS SRV with service name: ceph-mon
2023-07-23T22:02:37.240+0300 7f0dad1b06c0 -1 failed for service _ceph-mon._tcp
2023-07-23T22:02:37.240+0300 7f0dad1b06c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 2] RADOS object not found (error connecting to the cluster)
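Those "failed to get an address for mon.*" lines just mean the client can't resolve any monitor address from ceph.conf (and there is no ceph-mon DNS SRV record). A rough way to double-check, assuming the monitor data under /var/lib/ceph/mon/ is still intact (mon id "pve" is taken from the config below):

Bash:
# print the monitor addresses the client config resolves to; empty output is exactly the problem here
ceph-conf --lookup mon_host
# dump the monmap the local monitor daemon still has on disk
systemctl stop ceph-mon@pve
ceph-mon -i pve --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
systemctl start ceph-mon@pve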


ceph.conf
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster network = 192.168.0.100/24
     cluster_network = 192.168.0.100/24
     fsid = 7fhb4fea-aa7c-4908-981b-3a84aabf8123
     mon_allow_pool_delete = true
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public network = 192.168.0.100/24
     public_network = 192.168.0.100/24


[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve]
         host = pve
         mds_standby_for_name = pve

[mon.pve]
         public_addr = 192.168.0.100

# [mon.pve01]
         # public_addr = 192.168.0.101

[mon.pve02]
         public_addr = 192.168.0.102
 
Update: adding host and mon_addr to the monitor sections:

Code:
[mon.pve]
         host = pve
         public_addr = 192.168.0.100
         mon_addr = 192.168.0.100:6789

# [mon.pve01]
         # public_addr = 192.168.0.101

[mon.pve02]
         host = pve02
         public_addr = 192.168.0.102
         mon_addr = 192.168.0.102:6789
and `systemctl restart ceph.target`

gives me a reasonably working GUI again.
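For reference, whether the monitors actually re-formed a quorum after the restart can be checked with the usual status commands (nothing here is specific to this setup):

Bash:
# overall cluster state; should no longer complain about identifying monitors
ceph -s
# list monitors and show which of them are in quorum
ceph mon stat
ceph quorum_status --format json-pretty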
 
Yay! After a minute of terror with the GUI showing unknown PGs, it reverted to its previous state! But I'm still unable to add OSDs from PVE1. Any hints?
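Not a definitive answer, but the usual way to clear "ghost" OSD entries so the disks can be re-created. <id> and /dev/sdX are placeholders for the ghost OSD id and the new disk on PVE1, and purging permanently removes that OSD from the cluster, so only use it on entries that really are dead:

Bash:
# on a node with a working monitor quorum: find the ghost/down OSD ids
ceph osd tree
# remove a ghost OSD completely (CRUSH entry, auth key, OSD id); <id> is a placeholder
ceph osd purge <id> --yes-i-really-mean-it
# on PVE1: wipe the disk the old OSD lived on and create a fresh OSD
ceph-volume lvm zap /dev/sdX --destroy
pveceph osd create /dev/sdX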
 
For anyone else looking at this thread: all you have to do is run this in the Proxmox shell:

Bash:
ceph mgr module enable prometheus

The amount of time I spent trying to figure this out :(

By default the module will accept HTTP requests on port 9283 on all IPv4 and IPv6 addresses on the host. The port and listen address are both configurable with ceph config set, with keys mgr/prometheus/server_addr and mgr/prometheus/server_port. This port is registered with Prometheus's registry.

Bash:
ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph config set mgr mgr/prometheus/server_port 9283

info: https://docs.ceph.com/en/latest/mgr/prometheus/
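To confirm the exporter is actually answering after enabling the module (localhost and the default port 9283 are assumptions; adjust if you changed server_addr/server_port):

Bash:
# should return plain-text Prometheus metrics from the active mgr
curl -s http://localhost:9283/metrics | head -n 20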
 