ceph mon failing to start after node failure

driici · Mar 28, 2022

Hi, I have a 3-node Ceph cluster (managed via Proxmox). A single node suffered a fatal failure and I replaced it. The OS boots correctly, but the monitor on the replaced node does not start successfully. The other 2 monitors are OK and ceph status is healthy:

ceph -s
  cluster:
    id:     845868a1-9902-4b61-aa06-0767cb09f1c2
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum pxmx1,pxmx3 (age 2h)
    mgr: pxmx1(active, since 56m), standbys: pxmx3
    osd: 18 osds: 18 up (since 111m), 18 in (since 3h)

  data:
    pools:   1 pools, 256 pgs
    objects: 2.12M objects, 8.1 TiB
    usage:   24 TiB used, 21 TiB / 45 TiB avail
    pgs:     256 active+clean

Content of ceph.conf:

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.60.10.1/24
fsid = 845868a1-9902-4b61-aa06-0767cb09f1c2
mon_allow_pool_delete = true
mon_host = 10.60.10.1 10.60.10.3 10.60.10.2
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.60.10.1/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

The monitor is failing (at least as I understand the problem) with the following logged error:

mon.pxmx2@-1(probing) e4 handle_auth_request failed to assign global_id

The whole mon log is attached.

I have tried to scrap the dead monitor and recreate it via the Proxmox GUI and the shell, and I have even created the content of /var/lib/ceph/mon/ manually and tried to run the monitor from a terminal. It starts and listens for connections on ports 3300 and 6789, but does not communicate properly with the two remaining mons.
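
For reference, the manual route generally follows the upstream Ceph "adding a monitor (manual)" procedure. Below is a minimal sketch of it, not necessarily the exact commands run here. It assumes the monitor ID pxmx2 (taken from the log line above), the default cluster name "ceph" and data path /var/lib/ceph/mon/ceph-pxmx2, and the usual ceph-mon@ systemd unit; the /tmp file names and the run/readd_mon helpers are hypothetical. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: re-create a monitor's data directory by hand on the rebuilt node.
# Assumptions: mon ID "pxmx2", default cluster name "ceph",
# temp file names under /tmp are arbitrary placeholders.

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

readd_mon() {
    id="$1"
    # Drop the stale entry from the monmap (harmless if already removed).
    run ceph mon remove "$id"
    # Fetch the mon. keyring and the current monmap from the surviving quorum.
    run ceph auth get mon. -o /tmp/mon.keyring
    run ceph mon getmap -o /tmp/monmap
    # Build a fresh monitor store from that map and key.
    run ceph-mon -i "$id" --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    # The daemon runs as user "ceph", so it must own its store.
    run chown -R ceph:ceph "/var/lib/ceph/mon/ceph-$id"
    run systemctl start "ceph-mon@$id"
}

readd_mon pxmx2
```

On Proxmox the same steps are normally wrapped by the GUI or pveceph, so this is only meant to show what the manual attempt amounts to.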

Thanks for any info.

Tomas Hodek
