I have a Ceph cluster on 3 nodes that has been working for a year.
I get a HEALTH_WARN about:
2 OSD(s) have spurious read errors
1/3 mons down, quorum ceph01,ceph03
I tried to start the mon on ceph02, but it's not working.
Code:
xxxxxxx@ceph02:~# systemctl status ceph-mon@ceph02
● ceph-mon@ceph02.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: active (running) since Sat 2024-02-03 12:27:49 CST; 5 months 12 days ago
Main PID: 1450 (ceph-mon)
Tasks: 24
Memory: 3.4G
CPU: 2w 4d 14h 10min 5.925s
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@ceph02.service
└─1450 /usr/bin/ceph-mon -f --cluster ceph --id ceph02 --setuser ceph --setgroup ceph
Jul 17 12:17:16 ceph02 ceph-mon[1450]: 2024-07-17T12:17:16.574+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:17:31 ceph02 ceph-mon[1450]: 2024-07-17T12:17:31.590+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:17:46 ceph02 ceph-mon[1450]: 2024-07-17T12:17:46.603+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:18:01 ceph02 ceph-mon[1450]: 2024-07-17T12:18:01.615+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:18:16 ceph02 ceph-mon[1450]: 2024-07-17T12:18:16.627+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:18:31 ceph02 ceph-mon[1450]: 2024-07-17T12:18:31.644+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:18:46 ceph02 ceph-mon[1450]: 2024-07-17T12:18:46.660+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:19:01 ceph02 ceph-mon[1450]: 2024-07-17T12:19:01.672+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:19:16 ceph02 ceph-mon[1450]: 2024-07-17T12:19:16.685+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Jul 17 12:19:31 ceph02 ceph-mon[1450]: 2024-07-17T12:19:31.697+0800 7f1ccdd33700 -1 mon.ceph02@1(peon) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
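The repeated handle_auth_bad_method ... Permission denied lines usually point at an auth/keyring mismatch rather than the whole store being broken. A minimal check you could run to compare keys (standard Ceph commands; the paths assume the default store layout shown in your `ls` output):

```shell
# from a node that is still in quorum (ceph01 or ceph03):
ceph auth get mon.                          # the mon key the cluster expects

# on ceph02, compare with what the local mon store actually holds:
cat /var/lib/ceph/mon/ceph-ceph02/keyring

# also confirm the store is owned and readable by the ceph user:
ls -ln /var/lib/ceph/mon/ceph-ceph02/
```

If the two keys differ, that alone would explain the Permission denied loop.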
And I did some googling about how to debug it.
Code:
xxxxxx@ceph02:~# ceph tell mon.1 mon_status
Error ENXIO: problem getting command descriptions from mon.1
And tried:
Code:
sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph02.asok mon_status
ceph-mon -i ceph02 --debug_mon 10
ls /var/lib/ceph/mon/ceph-ceph02/
None of them produced any output or response.
My system disk still has free space and its health is OK, no errors.
It looks like the mon store folder on this node has an issue.
Should I rm it, or just reboot the node?
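When the admin-socket command hangs with no response, it is worth checking whether the .asok file even exists, and watching the live log while restarting the service. A sketch using the standard paths (assumed from the default Ceph layout on your node):

```shell
# does the mon's admin socket actually exist?
ls -l /var/run/ceph/ceph-mon.ceph02.asok

# restart the mon and follow its journal in real time:
systemctl restart ceph-mon@ceph02
journalctl -u ceph-mon@ceph02 -f

# or follow the on-disk log instead:
tail -f /var/log/ceph/ceph-mon.ceph02.log
```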
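A reboot is unlikely to help here, and I would not plainly rm the store. Since ceph01/ceph03 still hold quorum, a commonly suggested approach is to remove this mon and re-create it fresh, keeping the old store as a backup. A possible sequence, run on ceph02 (the ceph-after-pve-cluster.conf drop-in suggests this is Proxmox, so the pveceph commands apply; the plain-Ceph equivalent is commented below):

```shell
# assumes ceph01/ceph03 remain in quorum throughout
pveceph mon destroy ceph02                  # stops the mon and removes it from the monmap

# keep the old store as a backup instead of deleting it, if anything is left behind
mv /var/lib/ceph/mon/ceph-ceph02 /var/lib/ceph/mon/ceph-ceph02.bak

pveceph mon create                          # re-creates a fresh monitor on this node

# plain Ceph equivalent (non-PVE):
# systemctl stop ceph-mon@ceph02
# ceph mon remove ceph02                    # run from a quorum node
# mv /var/lib/ceph/mon/ceph-ceph02 /var/lib/ceph/mon/ceph-ceph02.bak
```

The new mon then syncs its store from the surviving quorum, so nothing is lost as long as the other two mons stay healthy.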