CEPH MON issue on one node

mojsiuk

Active Member
Feb 10, 2019
Poland
Hi, I have an issue with the Ceph MON on one node that started a few weeks after updating to version 17.2.7. Below are the MON logs from syslog.
Every few days the service stops and I have to start it manually. It then works fine for a few days before the errors come back.
The system disk has plenty of free space. It is on hardware RAID 1, and SMART on the disks reports no errors.

Logs
Aug 20 01:09:03 pve13.xxx systemd[1]: Started Ceph cluster monitor daemon.
Aug 20 01:09:03 pve13.xxx ceph-mon[594532]: problem writing to /var/log/ceph/ceph-mon.pve13.log: (28) No space left on device
Aug 20 01:09:03 pve13.xxx ceph-mon[594532]: 2024-08-20T01:09:03.966+0200 7f1c89e2da00 -1 error: monitor data filesystem reached concerning levels of available storage space (available: 0% 0 B)
Aug 20 01:09:03 pve13.xxx ceph-mon[594532]: you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%)
Aug 20 01:09:03 pve13.xxx systemd[1]: ceph-mon@pve13.service: Main process exited, code=exited, status=28/n/a
Aug 20 01:09:03 pve13.xxx systemd[1]: ceph-mon@pve13.service: Failed with result 'exit-code'.
Aug 20 01:09:14 pve13.xxx systemd[1]: ceph-mon@pve13.service: Scheduled restart job, restart counter is at 6.
Aug 20 01:09:14 pve13.xxx systemd[1]: Stopped Ceph cluster monitor daemon.
Aug 20 01:09:14 pve13.xxx systemd[1]: ceph-mon@pve13.service: Start request repeated too quickly.
Aug 20 01:09:14 pve13.xxx systemd[1]: ceph-mon@pve13.service: Failed with result 'exit-code'.
Aug 20 01:09:14 pve13.xxx systemd[1]: Failed to start Ceph cluster monitor daemon.
Aug 25 19:47:32 pve13.xxx systemd[1]: Started Ceph cluster monitor daemon.
Aug 25 19:47:32 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:32.543+0200 7f5b17e67700 -1 mon.pve13@-1(???) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Aug 25 19:47:32 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:32.743+0200 7f5b17e67700 -1 mon.pve13@2(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Aug 25 19:47:33 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:33.147+0200 7f5b17e67700 -1 mon.pve13@2(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Aug 25 19:47:33 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:33.947+0200 7f5b17e67700 -1 mon.pve13@2(synchronizing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
 
root@pve13:~# df -hT /var/log/ceph/
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/pve-root ext4 94G 9.7G 80G 11% /
root@pve13:~#
 
PVE updated and restarted, still the same issue with the MON on node pve13. The disk is almost empty, with only 11% used. Any ideas?
Update: Attached logs from journalctl --since '1 week ago' -u ceph-mon@pve13 >> ceph-mon_log.txt
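For what it's worth, error 28 (ENOSPC) with plenty of free blocks can also mean the filesystem ran out of inodes, or that the MON data directory sits on a different filesystem than the one you checked with df. A quick sketch to rule both out (paths are the Ceph/Proxmox defaults, the mon ID pve13 is from this thread; adjust as needed):

```shell
# error 28 (ENOSPC) can be inode exhaustion even when blocks are free
df -i /var/log/ceph 2>/dev/null || df -i /var/log
# confirm which filesystem actually holds the MON store, not just /var/log
df -hT /var/lib/ceph/mon 2>/dev/null || df -hT /var/lib
# size of the monitor's data store itself (the RocksDB store can grow large)
du -sh /var/lib/ceph/mon/ceph-pve13 2>/dev/null || true
```

If inodes show 100% used while blocks are mostly free, something (often runaway small files in a log or spool directory) has exhausted them.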
 

Attachments

  • Aug 28 004543 pve13.xxx ceph-mon[17.txt (59.7 KB)
  • ceph-mon_log.txt (249.1 KB)
The "out of space" message is confusing.

But before that, the log shows that it cannot authenticate with the other MONs.

Is the cluster healthy except for this one MON?

I would then just throw it away and create a new MON instance.
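Before recreating, it may be worth checking whether the local MON keyring on pve13 still matches the cluster's mon. key, since the "Permission denied" auth errors point that way. A rough sketch of the check and the destroy/recreate, assuming default Proxmox paths and that pve13 is the affected node; the destroy step is destructive and should only be run while the other two MONs are in quorum:

```shell
# run only on the affected node; the guard keeps this from firing elsewhere
if [ "$(hostname -s)" = "pve13" ]; then
    # compare the local MON keyring with the cluster-wide mon. key
    cat /var/lib/ceph/mon/ceph-pve13/keyring
    ceph auth get mon.
    # recreate the monitor (destructive; other MONs must be in quorum)
    systemctl stop ceph-mon@pve13.service
    pveceph mon destroy pve13
    pveceph mon create
fi
```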
When the MON service is active, the cluster is healthy. The problem started after updating Ceph to Quincy (via the Proxmox upgrade procedure, before upgrading PVE from version 7 to 8).
And the ceph -s output:
  cluster:
    id:     bf79845c-f78b-4b28-8bf9-85fb8d320a38
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve11,pve12,pve13 (age 10h)
    mgr: pve11(active, since 5w), standbys: pve12, pve13
    osd: 18 osds: 18 up (since 4d), 18 in (since 13M)

  data:
    pools:   3 pools, 545 pgs
    objects: 598.68k objects, 2.3 TiB
    usage:   6.7 TiB used, 17 TiB / 24 TiB avail
    pgs:     545 active+clean

  io:
    client: 1.2 MiB/s rd, 2.6 MiB/s wr, 32 op/s rd, 121 op/s wr
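One more thing that can quietly consume monitor disk space over time is an uncompacted MON store. If the store directory has grown large, an online compaction sometimes reclaims the space; a hedged sketch (ceph tell ... compact is a standard Ceph command, the mon ID pve13 is from this cluster):

```shell
# check the MON store size first
du -sh /var/lib/ceph/mon/ceph-pve13 2>/dev/null || true
# then ask the daemon to compact its store (only if ceph is installed)
if command -v ceph >/dev/null 2>&1; then
    ceph tell mon.pve13 compact
fi
```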
 
