One of the Ceph monitors (ceph-mon) is down

Ayush · Thursday at 13:47

Hi Team,

We have a cluster of 3 hosts, and one of the ceph-mon daemons is down. It gives the following error:

systemctl status ceph-mon@ge172
× ceph-mon@ge172.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Thu 2024-12-19 12:04:46 IST; 5h 51min ago
Duration: 57ms
Process: 1954266 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id ge172 --setuser ceph --setgroup ceph (code=exited, status=28)
Main PID: 1954266 (code=exited, status=28)
CPU: 56ms

Dec 19 12:04:46 ge172 systemd[1]: ceph-mon@ge172.service: Scheduled restart job, restart counter is at 6.
Dec 19 12:04:46 ge172 systemd[1]: Stopped ceph-mon@ge172.service - Ceph cluster monitor daemon.
Dec 19 12:04:46 ge172 systemd[1]: ceph-mon@ge172.service: Start request repeated too quickly.
Dec 19 12:04:46 ge172 systemd[1]: ceph-mon@ge172.service: Failed with result 'exit-code'.
Dec 19 12:04:46 ge172 systemd[1]: Failed to start ceph-mon@ge172.service - Ceph cluster monitor daemon.

ceph status
cluster:
id: 47061c54-d430-47c6-afa6-952da8e88877
health: HEALTH_WARN
1/3 mons down, quorum ge171,ge173

services:
mon: 3 daemons, quorum ge171,ge173 (age 5h), out of quorum: ge172
mgr: ge172(active, since 9M), standbys:ge173, ge171
osd: 9 osds: 9 up (since 8M), 9 in (since 8M)

data:
pools: 3 pools, 161 pgs
objects: 984.80k objects, 3.7 TiB
usage: 11 TiB used, 16 TiB / 27 TiB avail
pgs: 160 active+clean
1 active+clean+scrubbing+deep

io:
client: 182 KiB/s rd, 13 MiB/s wr, 12 op/s rd, 191 op/s wr

After restarting the ceph-mon process, I get the following messages in systemctl status ceph-mon:

ge172 ceph-mon[2450970]: 2024-12-19T17:56:30.740+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:31.140+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:31.940+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:33.540+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:36.740+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
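
For reference, here is a minimal sketch of how the two numeric codes in the output above could be read, and which basic checks might narrow the problem down. It assumes that the ceph-mon exit status (28) follows Linux errno numbering, which the output does not confirm, and that the monitor store is in the default location /var/lib/ceph/mon/ceph-ge172. The names errno_name, diagnose_mon and MON_DIR are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: map the two numeric codes from the output above to
# Linux errno names. Assumption: the exit status 28 is an errno value.
errno_name() {
    case "$1" in
        13) echo "EACCES (Permission denied)" ;;
        28) echo "ENOSPC (No space left on device)" ;;
        *)  echo "unknown" ;;
    esac
}

# Assumed default monitor data directory for cluster "ceph" and id "ge172";
# override MON_DIR if the store is located elsewhere.
MON_DIR="${MON_DIR:-/var/lib/ceph/mon/ceph-ge172}"

# Hypothetical wrapper around two standard checks; it is only defined here
# and must be called explicitly on the affected node.
diagnose_mon() {
    # Free space on the filesystem that holds the monitor store.
    df -h "$MON_DIR"
    # Last 50 journal lines of the failed unit.
    journalctl -u ceph-mon@ge172 -n 50 --no-pager
}

errno_name 28
errno_name 13
```

Running diagnose_mon as root on ge172 would show whether the filesystem under the monitor store is full and what the daemon logged before exiting. Once the cause is fixed, systemctl reset-failed ceph-mon@ge172 clears the "Start request repeated too quickly" state so that the unit can be started again.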
