One of the Ceph monitors (ceph-mon) is down

Ayush

Member
Oct 27, 2023
Hi Team,

We have a cluster of 3 hosts, and one of the ceph-mon daemons is down. Its service status shows the following error:

systemctl status ceph-mon@ge172
× ceph-mon@ge172.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: exit-code) since Thu 2024-12-19 12:04:46 IST; 5h 51min ago
   Duration: 57ms
    Process: 1954266 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id ge172 --setuser ceph --setgroup ceph (code=exited, status=28)
   Main PID: 1954266 (code=exited, status=28)
        CPU: 56ms

Dec 19 12:04:46 ge172 systemd[1]: ceph-mon@ge172.service: Scheduled restart job, restart counter is at 6.
Dec 19 12:04:46 ge172 systemd[1]: Stopped ceph-mon@ge172.service - Ceph cluster monitor daemon.
Dec 19 12:04:46 ge172 systemd[1]: ceph-mon@ge172.service: Start request repeated too quickly.
Dec 19 12:04:46 ge172 systemd[1]: ceph-mon@ge172.service: Failed with result 'exit-code'.
Dec 19 12:04:46 ge172 systemd[1]: Failed to start ceph-mon@ge172.service - Ceph cluster monitor daemon.
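
A possible first check, offered as a hedged sketch rather than a diagnosis: the exit status 28 above has the same number as Linux errno 28 (ENOSPC, "No space left on device"). Whether ceph-mon is really passing an errno through as its exit code here is an assumption, so the sketch only decodes the number and looks at free space and the unit's journal. The path in `MON_DIR` assumes the default cluster name `ceph`, and `errno_name` and `diagnose_mon` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only. MON_DIR assumes the default cluster name "ceph";
# adjust it if the monitor store lives elsewhere.
MON_ID="ge172"
MON_DIR="/var/lib/ceph/mon/ceph-${MON_ID}"

# Hypothetical helper: name the two Linux errno values that appear in this thread.
errno_name() {
    case "$1" in
        13) echo "EACCES (Permission denied)" ;;
        28) echo "ENOSPC (No space left on device)" ;;
        *)  echo "unknown" ;;
    esac
}

# Hypothetical helper: show free space, free inodes, and recent unit logs.
diagnose_mon() {
    echo "exit status 28 would correspond to: $(errno_name 28)"
    # Fall back to / if the assumed mon directory does not exist.
    df -h "$MON_DIR" 2>/dev/null || df -h /
    df -i "$MON_DIR" 2>/dev/null || df -i /
    # The lines logged just before the exit usually name the failing operation.
    journalctl -u "ceph-mon@${MON_ID}" -n 50 --no-pager
}
```

Running `diagnose_mon` as root on ge172 prints the information; if the filesystem turns out to be full, systemd's restart limit can be cleared with `systemctl reset-failed ceph-mon@ge172` once space has been freed.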


ceph status
  cluster:
    id:     47061c54-d430-47c6-afa6-952da8e88877
    health: HEALTH_WARN
            1/3 mons down, quorum ge171,ge173

  services:
    mon: 3 daemons, quorum ge171,ge173 (age 5h), out of quorum: ge172
    mgr: ge172(active, since 9M), standbys:ge173, ge171
    osd: 9 osds: 9 up (since 8M), 9 in (since 8M)

  data:
    pools:   3 pools, 161 pgs
    objects: 984.80k objects, 3.7 TiB
    usage:   11 TiB used, 16 TiB / 27 TiB avail
    pgs:     160 active+clean
             1   active+clean+scrubbing+deep

  io:
    client:   182 KiB/s rd, 13 MiB/s wr, 12 op/s rd, 191 op/s wr


After restarting the ceph-mon process, I get the following messages in systemctl status ceph-mon:

ge172 ceph-mon[2450970]: 2024-12-19T17:56:30.740+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:31.140+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:31.940+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:33.540+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
ge172 ceph-mon[2450970]: 2024-12-19T17:56:36.740+0530 7fb1a702a6c0 -1 mon.ge172@0(probing) e3 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
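
The repeated "(13) Permission denied" while probing suggests the other monitors are rejecting this monitor's cephx authentication. One thing that could be compared, as a sketch under assumptions, is the `mon.` key stored on ge172 against the key held by the quorum. The keyring path assumes the default cluster name `ceph`; `keyring_key` and `compare_mon_keys` are hypothetical helper names; and a key mismatch is only one candidate cause (clock differences between hosts are another commonly mentioned one).

```shell
#!/bin/sh
# Sketch only. MON_KEYRING assumes the default cluster name "ceph".
MON_ID="ge172"
MON_KEYRING="/var/lib/ceph/mon/ceph-${MON_ID}/keyring"

# Hypothetical helper: print the first "key = ..." value of a keyring read from stdin.
keyring_key() {
    awk '$1 == "key" && $2 == "=" { print $3; exit }'
}

# Hypothetical helper: compare the local mon. key with the one the quorum reports.
# Needs read access to the keyring and a working admin keyring for the ceph CLI.
compare_mon_keys() {
    local_key=$(keyring_key < "$MON_KEYRING")
    cluster_key=$(ceph auth get mon. 2>/dev/null | keyring_key)
    if [ -n "$local_key" ] && [ "$local_key" = "$cluster_key" ]; then
        echo "mon. keys match"
    else
        echo "mon. keys differ or could not be read"
    fi
}
```

Calling `compare_mon_keys` as root on ge172 prints one of the two messages without echoing the secret itself. A "differ" result only shows that the first key in each source is not identical; it does not by itself prove the cause of the failure.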