PVE Ceph monitor doesn't start

Senin

Hi!

I've got a 3-node cluster.
Today the pve2 node suddenly restarted, and since then its Ceph monitor doesn't start, although all services are OK.

The event log is full of the following error:
2024-02-01T18:22:24.830+0300 7f5395c3b700 1 mon.pve2@1(probing) e3 handle_auth_request failed to assign global_id
2024-02-01T18:22:24.834+0300 7f5395c3b700 1 mon.pve2@1(probing) e3 handle_auth_request failed to assign global_id
2024-02-01T18:22:24.842+0300 7f5395c3b700 1 mon.pve2@1(probing) e3 handle_auth_request failed to assign global_id

ceph-mon@pve2.service is up and running

Ceph v17.2.6

Any way to fix this?
Is it safe to destroy mon and create it again?
 
I tried to destroy and re-create the problematic monitor, but now it's unable to join the cluster:

2024-02-01T19:04:37.259+0300 7ff169c11700 1 mon.pve2@-1(probing) e4 handle_auth_request failed to assign global_id
2024-02-01T19:04:39.907+0300 7ff169c11700 1 mon.pve2@-1(probing) e4 handle_auth_request failed to assign global_id
2024-02-01T19:04:43.367+0300 7ff169c11700 1 mon.pve2@-1(probing) e4 handle_auth_request failed to assign global_id

It also reports "no such monitor id 'pve2' (500)" if I try to destroy the monitor now.
 
OK, that was very helpful, but my case turned out to be much easier.

I queried the monitor's admin socket on each node with "ceph --admin-daemon /var/run/ceph/ceph-mon.pve3.asok mon_status" (using that node's own monitor name in the socket path) and noticed that pve3 has this record:
"extra_probe_peers": [
    {
        "addrvec": [
            {
                "type": "v2",
                "addr": "192.168.100.220:3300",
                "nonce": 0
            },
            {
                "type": "v1",
                "addr": "192.168.100.220:6789",
                "nonce": 0
            }
        ]
    }
]

and pve1 has it empty
"extra_probe_peers": [],

So I just restarted the monitor on pve1, and the monitor on pve2 started immediately.
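
For anyone who wants to repeat the comparison without reading the whole mon_status dump by eye, here is a minimal Python sketch. It assumes the "extra_probe_peers" field has the same shape as in the excerpt above (a list of objects, each with an "addrvec" list); other Ceph releases may format it differently. The function names extra_probe_addrs and query_local_mon are my own, not part of Ceph, and query_local_mon simply wraps the same "ceph --admin-daemon ... mon_status" call quoted above.

```python
import json
import subprocess


def extra_probe_addrs(mon_status_json: str) -> list[str]:
    """Return the addresses listed under extra_probe_peers in mon_status output.

    Assumes the layout seen in this thread: a list of peers, each holding an
    "addrvec" list of {"type", "addr", "nonce"} entries.
    """
    status = json.loads(mon_status_json)
    addrs = []
    for peer in status.get("extra_probe_peers", []):
        for entry in peer.get("addrvec", []):
            addrs.append(f'{entry["type"]}:{entry["addr"]}')
    return addrs


def query_local_mon(mon_name: str) -> list[str]:
    """Query the admin socket of the monitor running on THIS node.

    mon_name is the local monitor's name (for example "pve3"); the socket
    path follows the one used in the command above.
    """
    sock = f"/var/run/ceph/ceph-mon.{mon_name}.asok"
    result = subprocess.run(
        ["ceph", "--admin-daemon", sock, "mon_status"],
        check=True,
        capture_output=True,
        text=True,
    )
    return extra_probe_addrs(result.stdout)
```

Calling query_local_mon("pve3") on pve3 and query_local_mon("pve1") on pve1 and comparing the two lists should show the same difference I found by hand: a populated list on one monitor and an empty one on the other.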

Thanks for the help.