Error message floods MGR log: connect got BADAUTHORIZER

cmonty14 · Nov 7, 2019

Hi,
my cluster is not healthy: there were many slow requests and unknown PGs.
Then I noticed an error message that was flooding the MGR log:
2019-11-06 11:37:39.977 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ee7f600 0x56480eece000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.981 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ec36400 0x56480d654000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.981 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ee7f600 0x56480eece000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.985 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ec36400 0x56480d654000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.985 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ee7f600 0x56480eece000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
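
To see which peer the failures point at, the repeated lines can be tallied per peer address. Below is a minimal Python sketch. It assumes the v1 messenger line format shown above. The script name and log path in the usage comment are hypothetical; MGR logs usually live under /var/log/ceph/, but check your setup. In this excerpt every failure targets 10.97.206.93:6918, which is presumably an OSD daemon. `ceph osd dump` lists each OSD's addresses and can be used to map that address to an OSD id.

```python
import re
import sys
from collections import Counter

# Assumed format (taken from the log excerpt above):
# "... <local addr> >> v1:<peer ip>:<port>/<nonce> conn(...) ... connect got BADAUTHORIZER"
LINE_RE = re.compile(r">> v\d:(?P<peer>[\d.]+:\d+)/\d+ .*connect got BADAUTHORIZER")


def count_badauthorizer(lines):
    """Count BADAUTHORIZER connect failures per peer address (ip:port)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 badauth_peers.py /var/log/ceph/ceph-mgr.<id>.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for peer, n in count_badauthorizer(log).most_common():
            print(f"{peer}\t{n}")
```

If one peer dominates the count, that daemon (and its clock and keyring) is the first place to look.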

I immediately stopped the MGR on the affected node.
The next standby MGR then took over.
However, the same error message flooded the log of this newly active MGR, too.

Then I decided to stop all other MGR services and all MON services.

When I restart the MON services one by one on the 4 nodes, there is no problem.
However, as soon as I start just one MGR service, the log is flooded again with the same error.

I could now stop all OSD services; however, this would make the cluster's health problems worse again.

What is causing the BADAUTHORIZER error messages?

THX

t.lamprecht · Nov 8, 2019

Are all nodes time-synced?

Otherwise, this could also be related to this: https://forum.proxmox.com/threads/attention-potential-bug-in-ceph-identified.59904/#post-276246

cmonty14 · Nov 8, 2019

Hi Thomas,

the nodes are time-synced.

Currently I assume that this error is related to the potential MGR bug.
To resolve it, I installed updated Ceph packages, including ceph-mgr, provided by a developer.
Since then, my cluster has recovered from the unhealthy state and is back to normal operation.

Regards
