Hi,
My cluster is unhealthy: there were many slow requests and unknown PGs.
Then I noticed an error message flooding the MGR log:
2019-11-06 11:37:39.977 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ee7f600 0x56480eece000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.981 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ec36400 0x56480d654000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.981 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ee7f600 0x56480eece000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.985 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ec36400 0x56480d654000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-11-06 11:37:39.985 7f90028d7700 0 --1- 10.97.206.96:0/3948014004 >> v1:10.97.206.93:6918/101424 conn(0x56480ee7f600 0x56480eece000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
I immediately stopped the MGR on the affected node.
The next standby MGR then took over.
However, the same error message flooded the log of this newly active node, too.
Then I decided to stop all other MGR services and all MON services.
When I restarted the MON service sequentially on 4 nodes, there was no problem.
However, as soon as I start just one MGR service, the log is flooded again with the same error.
I could now stop all OSD services, but this would make the cluster's health problems worse again.
What is causing the BADAUTHORIZER error messages?
THX