Issue with ceph-mon

xtriglavx
New Member
Aug 3, 2020

Hi, can somebody please help me with an issue on Proxmox?

I have 3 nodes in a cluster with Ceph RBD, and one ceph-mon node is offline. The log on this PVE node shows these errors:

Code:
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 288514051259236352, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 288514051259236352, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 3314933000852226048, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  0 mon.pve3@-1(???).osd e29 crush map has features 288514051259236352, adjusting msgr requires
2020-11-30 10:23:04.311 7f2033275400  1 mon.pve3@-1(???).paxosservice(auth 251..334) refresh upgraded, format 0 -> 3
2020-11-30 10:23:04.319 7f2033275400  0 mon.pve3@-1(probing) e3  my rank is now 2 (was -1)
2020-11-30 10:23:12.315 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:12.543 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:12.551 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:22.991 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:24.175 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:24.179 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:24.179 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:32.320 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:32.533 7f2029d8a700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:32.533 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:34.370 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:42.594 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:42.598 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:44.552 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:44.560 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:44.560 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:52.726 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id
2020-11-30 10:23:52.734 7f202ed94700  1 mon.pve3@2(probing) e3 handle_auth_request failed to assign global_id

root@pve3:~# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve3:~# ceph -v
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)

root@pve2:~# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve2:~# ceph -v
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)

root@pve1:~# pveversion
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
root@pve1:~# ceph -v
ceph version 14.2.11 (21626754f4563baadc6ba5d50b9cbc48a5730a94) nautilus (stable)
 
What does ceph -s show?
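For reference, a quick way to check mon quorum from one of the working nodes looks something like this (hostnames are the ones from this thread; the exact output will differ):

Code:
# Run on a node whose monitor is healthy, e.g. pve1:
ceph -s                                  # overall health; the 'mon:' line under 'services' shows quorum
ceph quorum_status --format json-pretty  # which monitors are inside/outside the quorum

# On the affected node (pve3), check whether the mon daemon is running at all:
systemctl status ceph-mon@pve3

The repeated "handle_auth_request failed to assign global_id" lines in your log typically just mean the mon on pve3 is still probing and has not joined the quorum, so it cannot authenticate clients yet.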
 
I don't know, because I tried rebooting the pve3 node, and after that restart the Ceph and PVE cluster ran without any errors. But I don't understand why this Ceph node was in an UNKNOWN state (in the GUI).
 
Then I suppose it was not Ceph, but rather that the pve-cluster service wasn't running. Please upgrade to the latest Nautilus release; there has been a fix for systemd service ordering. It could be that the node was hit by this cyclic dependency, which kills a random service.
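A minimal sketch of those steps, assuming the Proxmox Ceph repositories are already configured on each node:

Code:
# One node at a time, pull the latest Nautilus packages:
apt update && apt full-upgrade

# Check that the cluster filesystem service is running:
systemctl status pve-cluster

# systemd logs a "Found ordering cycle" message when it has to break one:
journalctl -b | grep -i "ordering cycle"

# If the mon on pve3 is down again after a reboot, restart it manually:
systemctl restart ceph-mon@pve3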
 
Thanks for your answer, I will try it.
 
