Could not connect to ceph cluster despite configured monitors (500)

Magneto

Well-Known Member
Jul 30, 2017
Please help.

I tried to change my Ceph IPs from 192.168.10.0/24 to 192.168.11.0/24, but it went horribly wrong. My initial setup is / was as follows:
3 servers:
SRV1 - 192.168.10.241
SRV2 - 192.168.10.242
SRV3 - 192.168.10.243

I wanted to move Ceph to a second IP subnet, on different network cards:
Storage1 - 192.168.11.241
Storage2 - 192.168.11.242
Storage3 - 192.168.11.243

Both IP subnets work and all hosts are in /etc/hosts.



So I ran these commands on each node, with the hostname and IP changed as necessary.

Code:
cd /home
mkdir tmp
ceph auth get mon. -o tmp/key-ceph-Storage1
ceph mon getmap -o tmp/map-ceph-Storage1
ceph-mon -i Storage1 --mkfs --monmap tmp/map-ceph-Storage1 --keyring tmp/key-ceph-Storage1
chown ceph:ceph -Rf /var/lib/ceph/mon/ceph-Storage1/
ceph-mon -i Storage1 --public-addr 192.168.11.241:6789



All 6 monitors showed up, so I started removing the SRV1-SRV3 monitors, and then they all disappeared. Now I am trying to re-add some monitors but keep getting this error:

Could not connect to ceph cluster despite configured monitors (500)
 
Here's the ceph.conf file:

Bash:
root@SRV1:/home# more /etc/ceph/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster network = 192.168.10.0/24
         cluster_network = 192.168.10.0/24
         fsid = 1d0f6b10-587b-46b1-9978-4ddf4a61b8fe
         mon_allow_pool_delete = true
         mon_host = 192.168.10.241 192.168.10.242 192.168.10.243
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public network = 192.168.10.0/24
         public_network = 192.168.10.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.SRV3]
         host = SRV3
         mds_standby_for_name = pve

[mon.SRV1]
         public_addr = 192.168.10.241

[mon.SRV2]
         public_addr = 192.168.10.242

[mon.SRV3]
         public_addr = 192.168.10.243
 
All 6 monitors showed up, so I started removing the SRV1-SRV3 monitors, and then they all disappeared. Now I am trying to re-add some monitors but keep getting this error:
Is there still an old monitor left? If the new MONs didn't connect to the old ones and you removed the old MONs, then the other maps (OSD, MDS, PG, CRUSH) are most likely lost.
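
One way to check whether an old monitor store is still present on a node is to look for its data directory. This is only a minimal sketch: the function name list_mon_stores is hypothetical, and it assumes the default layout /var/lib/ceph/mon/ceph-<id> that also appears in the chown command in the first post.

```shell
#!/bin/sh
# list_mon_stores [DIR]: print the IDs of monitors whose data directory exists under DIR.
# Assumption: stores follow the default layout <DIR>/ceph-<id>,
# e.g. /var/lib/ceph/mon/ceph-SRV1.
list_mon_stores() {
    dir="${1:-/var/lib/ceph/mon}"
    for path in "$dir"/ceph-*; do
        # When nothing matches, the glob stays unexpanded; skip it.
        [ -d "$path" ] || continue
        name="${path##*/}"      # strip the leading directories
        echo "${name#ceph-}"    # strip the "ceph-" prefix, leaving the monitor ID
    done
}
```

If an old ID such as SRV1 is still listed, the monmap can usually be pulled out of that store while the monitor is stopped with `ceph-mon -i SRV1 --extract-monmap /tmp/monmap` and inspected with `monmaptool --print /tmp/monmap`. Whether the store survived depends on how the monitor was removed.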

cluster network = 192.168.10.0/24
cluster_network = 192.168.10.0/24
public network = 192.168.10.0/24
public_network = 192.168.10.0/24
Each network option only needs to be set once; the space and underscore spellings are the same option. A single option can also list multiple networks.
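
To spot such repeated entries, a small check like the following may help. It is a sketch, not an official tool: the function name find_duplicate_options is hypothetical, and it assumes that spaces, dashes and underscores in option names are interchangeable, which is how Ceph reads its config file.

```shell
#!/bin/sh
# find_duplicate_options FILE: print "[section] option" for each option that is
# set more than once in the same section of a ceph.conf-style file.
find_duplicate_options() {
    awk '
        /^[ \t]*\[/ {                      # section header such as [global]
            section = $0
            gsub(/^[ \t]+|[ \t]+$/, "", section)
            next
        }
        /^[ \t]*[#;]/ { next }             # skip comment lines
        /=/ {
            key = substr($0, 1, index($0, "=") - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/[ -]/, "_", key)         # "public network" becomes "public_network"
            # Report an option once, at its second occurrence in this section.
            if (++seen[section SUBSEP key] == 2)
                print section " " key
        }
    ' "$1"
}
```

Run as `find_duplicate_options /etc/ceph/ceph.conf`, it should report cluster_network and public_network under [global] for the file posted above. After cleanup, one line per option is enough, and two subnets can share an option as a comma-separated list, for example `public_network = 192.168.10.0/24, 192.168.11.0/24`.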

In general, to change the public_network IP range, the new network needs to be routed to the existing subnet. Then the MONs can be replaced one at a time, each with a new MON from the new subnet.
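
As a rough illustration of that order of operations, the sketch below only prints the steps for swapping one monitor; it does not run anything against the cluster. The function name plan_mon_swap is hypothetical, and how the new monitor is created (step 1) depends on the Proxmox VE and Ceph versions in use, so that step is left as a comment.

```shell
#!/bin/sh
# plan_mon_swap OLD_ID NEW_ID NEW_IP: print (do not execute) the steps for
# replacing one monitor, so that quorum is checked before anything is removed.
plan_mon_swap() {
    old_id="$1"
    new_id="$2"
    new_ip="$3"
    echo "# 1. On the new host, create monitor $new_id with public address $new_ip"
    echo "ceph quorum_status --format json-pretty   # 2. check that $new_id is in the quorum"
    echo "ceph mon remove $old_id                   # 3. only then remove the old monitor"
    echo "ceph mon stat                             # 4. confirm quorum before the next swap"
}
```

For the hosts in this thread, that would mean calling `plan_mon_swap SRV1 Storage1 192.168.11.241` first and moving on to SRV2 and SRV3 only after the quorum checks pass. Removing an old monitor before its replacement has joined the quorum is what risks losing the cluster maps.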
 
Hello,

I have the same problem with my Proxmox Ceph cluster: Could not connect to ceph cluster despite configured monitors (500). For the moment I'm unable to retrieve data from my OSDs.

Have you found a solution?

Kind regards.
 
@vlazarus, please open up a new thread and elaborate on your cluster in more detail.