Issue with ceph-mon

xtriglavx

Hi, can somebody please help me with an issue on Proxmox?

I have a 3-node cluster with Ceph RBD, and one ceph-mon node is offline. The log on this PVE node shows these errors:

Code:
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 288514051259236352, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 288514051259236352, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 3314933000852226048, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 288514051259236352, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  1 mon.pve3@-1(???).paxosservice(auth 251..334) refresh upgraded, format 0 -> 3
2020-11-30 10:23:04.319 7f2033275400  0 mon.pve3@-1(probing) e3  my rank is now 2 (was -1)
2020-11-30 10:23:12.315 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:12.543 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:12.551 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:22.991 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:24.175 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:24.179 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:24.179 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:32.320 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:32.533 7f2029d8a700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:32.533 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:34.370 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:42.594 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:42.598 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:44.552 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:44.560 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:44.560 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:52.726 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:52.734 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id

root@pve3:~# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve3:~# ceph -v
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)

root@pve2:~# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve2:~# ceph -v
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)

root@pve1:~# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve1:~# ceph -v
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)
 
What does ceph -s show?
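For reference, something along these lines shows the monitor state. The ceph -s and ceph mon stat calls need to run from a node that still has quorum; the mon ID pve3 is taken from your logs above, so adjust it to your setup:

Code:
# On a node that is still in quorum: overall cluster health and monitor map
ceph -s
ceph mon stat

# On the affected node: is the mon daemon itself up, and what does it log?
systemctl status ceph-mon@pve3.service
journalctl -u ceph-mon@pve3.service -n 50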
 
I don't know, because I rebooted the pve3 node, and after the restart the Ceph and PVE cluster ran without any errors. But I don't understand why this Ceph node was in the UNKNOWN state (in the GUI).
 
Then I suppose it was not Ceph, but rather that the pve-cluster service wasn't running. Please upgrade to the latest Nautilus release; there has been a fix for systemd service ordering. The node could have been hit by this cyclic dependency, which kills a service at random.
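A minimal sketch of both steps, assuming the standard service names on a PVE 6 node (adjust as needed):

Code:
# Check whether pve-cluster and the status daemon behind the GUI are running
systemctl status pve-cluster.service pvestatd.service

# Pull in the latest packaged Nautilus / PVE updates, then restart or reboot
apt update
apt full-upgrade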
 
Thanks for your answer, I will try it.