Monitor address already in use (500)

ivusi

Active Member
Jul 3, 2019
I have a 3-node Ceph cluster. I was updating one of the nodes via 'apt upgrade' when it crashed. It rebooted fine and everything came back, however the monitor service on that node had stopped.
I tried restarting it, etc., to no avail.
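For context, the restart attempt looked roughly like this (the unit name assumes the default ceph-mon@<hostname> naming on prox03):

Code:
# restart the monitor daemon on prox03 and check why it keeps stopping
sudo systemctl restart ceph-mon@prox03
sudo systemctl status ceph-mon@prox03
# recent log lines from the monitor unit
sudo journalctl -u ceph-mon@prox03 --since "1 hour ago"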
I also checked that the other two monitors were still in quorum, which was ok:


Code:
wallis@prox01:~$ sudo ceph -s
  cluster:
    id:     68dd8e32-f7b1-479b-a406-dd8db83fd50d
    health: HEALTH_WARN
            1/3 mons down, quorum prox01,prox02
 
  services:
    mon: 3 daemons, quorum prox01,prox02 (age 4h), out of quorum: prox03
    mgr: prox02(active, since 5M), standbys: prox01, prox03
    mds: 1/1 daemons up, 2 standby
    osd: 18 osds: 18 up (since 4h), 18 in (since 11w)
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 705 pgs
    objects: 941.62k objects, 3.5 TiB
    usage:   11 TiB used, 5.7 TiB / 16 TiB avail
    pgs:     705 active+clean
 
  io:
    client:   231 KiB/s rd, 1.2 MiB/s wr, 5 op/s rd, 113 op/s wr
 
wallis@prox01:~$


I then deleted the mon on prox03 via the GUI and tried to recreate it, at which point I got the message "monitor address 10.0.31.20 already in use (500)".
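For reference, the CLI equivalent of that delete/recreate (pveceph mon destroy / pveceph mon create are the command-line counterparts of the GUI buttons, as far as I understand) would be roughly:

Code:
# remove the old monitor for prox03, then recreate it (run on prox03)
sudo pveceph mon destroy prox03
sudo pveceph mon create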

I checked with 'ceph mon stat':

Code:
sudo ceph mon stat
e5: 2 mons at {prox01=[v2:10.0.31.18:3300/0,v1:10.0.31.18:6789/0],prox02=[v2:10.0.31.19:3300/0,v1:10.0.31.19:6789/0]} removed_ranks: {2}, election epoch 2204, leader 0 prox01, quorum 0,1 prox01,prox02
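
For completeness, a couple of related checks (assuming admin keyring access from the node) that show the full monmap and quorum details:

Code:
# full monmap, including the address of each monitor
sudo ceph mon dump
# quorum details in JSON
sudo ceph quorum_status --format json-pretty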

I saw that prox03 was not referenced in the monmap, but I did see its address (10.0.31.20) referenced in the mon_host line of the [global] section:

Code:
wallis@prox03:~$ sudo cat /etc/pve/ceph.conf
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 10.0.31.16/28
    fsid = 68dd8e32-f7b1-479b-a406-dd8db83fd50d
    mon_allow_pool_delete = true
    mon_host = 10.0.31.18 10.0.31.19 10.0.31.20
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 10.0.31.16/28

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
    keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.prox01]
    host = prox01
    mds_standby_for_name = pve

[mds.prox02]
    host = prox02
    mds_standby_for_name = pve

[mds.prox03]
    host = prox03
    mds_standby_for_name = pve

[mon.prox01]
    public_addr = 10.0.31.18

[mon.prox02]
    public_addr = 10.0.31.19

wallis@prox03:~$


Can I manually remove 10.0.31.20 from the ceph.conf file on each cluster node?
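If manual cleanup is the way to go, I assume the steps on prox03 would look roughly like this (the path assumes the default monitor data directory, and /etc/pve/ceph.conf is shared cluster-wide via pmxcfs, so the mon_host line should only need editing once):

Code:
# make sure no stale monitor daemon is running on prox03
sudo systemctl stop ceph-mon@prox03
sudo systemctl disable ceph-mon@prox03
# remove any leftover monitor data directory on prox03
sudo rm -rf /var/lib/ceph/mon/ceph-prox03
# edit the shared config and drop 10.0.31.20 from mon_host
sudo nano /etc/pve/ceph.conf
# then recreate the monitor from prox03
sudo pveceph mon create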
Many thanks