I have a 3-node Ceph cluster. I was updating one of the nodes via 'apt upgrade' when it crashed. It rebooted fine and everything came back up, except that the monitor service on that node had stopped.
I tried restarting the mon service and so on, to no avail.
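For reference, the restart attempts were along these lines (the unit name follows the standard ceph-mon@<hostname> scheme, so I'm assuming that's what applies here):
Code:
# restart the monitor daemon on prox03 and check its state
sudo systemctl restart ceph-mon@prox03
sudo systemctl status ceph-mon@prox03
# recent mon log entries, in case there's a clue
sudo journalctl -u ceph-mon@prox03 --since "1 hour ago"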
I checked that the other two were still in quorum, which was OK:
Code:
wallis@prox01:~$ sudo ceph -s
  cluster:
    id:     68dd8e32-f7b1-479b-a406-dd8db83fd50d
    health: HEALTH_WARN
            1/3 mons down, quorum prox01,prox02

  services:
    mon: 3 daemons, quorum prox01,prox02 (age 4h), out of quorum: prox03
    mgr: prox02(active, since 5M), standbys: prox01, prox03
    mds: 1/1 daemons up, 2 standby
    osd: 18 osds: 18 up (since 4h), 18 in (since 11w)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 705 pgs
    objects: 941.62k objects, 3.5 TiB
    usage:   11 TiB used, 5.7 TiB / 16 TiB avail
    pgs:     705 active+clean

  io:
    client: 231 KiB/s rd, 1.2 MiB/s wr, 5 op/s rd, 113 op/s wr

wallis@prox01:~$
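I also cross-checked quorum directly; as far as I know this is the standard command for it:
Code:
# detailed quorum view (names, ranks, monmap) as JSON
sudo ceph quorum_status --format json-pretty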
I then deleted the mon on prox03 via the GUI and recreated it, at which point I got the message "monitor address 10.0.31.20 already in use (500)".
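(For reference, I believe the GUI delete/recreate is equivalent to these pveceph commands, in case the CLI behaves any differently:)
Code:
# destroy and recreate the monitor on prox03, run on that node
sudo pveceph mon destroy prox03
sudo pveceph mon create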
I checked with ceph mon stat:
Code:
sudo ceph mon stat
e5: 2 mons at {prox01=[v2:10.0.31.18:3300/0,v1:10.0.31.18:6789/0],prox02=[v2:10.0.31.19:3300/0,v1:10.0.31.19:6789/0]} removed_ranks: {2}, election epoch 2204, leader 0 prox01, quorum 0,1 prox01,prox02
and I saw that prox03 was not referenced there, but its address 10.0.31.20 does still appear in the [global] section, on the mon_host line:
Code:
wallis@prox03:~$ sudo cat /etc/pve/ceph.conf
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 10.0.31.16/28
    fsid = 68dd8e32-f7b1-479b-a406-dd8db83fd50d
    mon_allow_pool_delete = true
    mon_host = 10.0.31.18 10.0.31.19 10.0.31.20
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 10.0.31.16/28

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
    keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.prox01]
    host = prox01
    mds_standby_for_name = pve

[mds.prox02]
    host = prox02
    mds_standby_for_name = pve

[mds.prox03]
    host = prox03
    mds_standby_for_name = pve

[mon.prox01]
    public_addr = 10.0.31.18

[mon.prox02]
    public_addr = 10.0.31.19

wallis@prox03:~$
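Before recreating the mon again I also want to rule out something on prox03 still binding the mon ports, or a stale mon data directory being left behind; I'd check roughly like this (standard tools; the data path is the stock Ceph default):
Code:
# anything still listening on the mon ports (v2: 3300, v1: 6789)?
sudo ss -tlnp | grep -E ':3300|:6789'
# any leftover mon data directory from the deleted monitor?
ls -l /var/lib/ceph/mon/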
Can I manually remove 10.0.31.20 from the ceph.conf file on each node in the cluster?
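Concretely, the edit I have in mind is just dropping the third IP from the mon_host line (a sketch only; since /etc/pve is the shared pmxcfs, I assume one edit propagates to all nodes rather than needing a per-node change):
Code:
# mon_host = 10.0.31.18 10.0.31.19 10.0.31.20  ->  mon_host = 10.0.31.18 10.0.31.19
# " 10.0.31.20" only occurs on the mon_host line in this file
sudo sed -i 's/ 10\.0\.31\.20//' /etc/pve/ceph.conf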
Many thanks