Hi, I have an issue with the Ceph monitor (ceph-mon) on one node, which started a few weeks after updating to version 17.2.7. The monitor's syslog output is below.
Once every few days the service stops and I have to start it manually. It then works fine for a few days before these errors come back.
The system disk has plenty of free space. It is on hardware RAID 1, and SMART reports no errors on the disks.
Logs
Aug 20 01:09:03 pve13.xxx systemd[1]: Started Ceph cluster monitor daemon.
Aug 20 01:09:03 pve13.xxx ceph-mon[594532]: problem writing to /var/log/ceph/ceph-mon.pve13.log: (28) No space left on device
Aug 20 01:09:03 pve13.xxx ceph-mon[594532]: 2024-08-20T01:09:03.966+0200 7f1c89e2da00 -1 error: monitor data filesystem reached concerning levels of available storage space (available: 0% 0 B)
Aug 20 01:09:03 pve13.xxx ceph-mon[594532]: you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%)
Aug 20 01:09:03 pve13.xxx systemd[1]: ceph-mon@pve13.service: Main process exited, code=exited, status=28/n/a
Aug 20 01:09:03 pve13.xxx systemd[1]: ceph-mon@pve13.service: Failed with result 'exit-code'.
Aug 20 01:09:14 pve13.xxx systemd[1]: ceph-mon@pve13.service: Scheduled restart job, restart counter is at 6.
Aug 20 01:09:14 pve13.xxx systemd[1]: Stopped Ceph cluster monitor daemon.
Aug 20 01:09:14 pve13.xxx systemd[1]: ceph-mon@pve13.service: Start request repeated too quickly.
Aug 20 01:09:14 pve13.xxx systemd[1]: ceph-mon@pve13.service: Failed with result 'exit-code'.
Aug 20 01:09:14 pve13.xxx systemd[1]: Failed to start Ceph cluster monitor daemon.
Aug 25 19:47:32 pve13.xxx systemd[1]: Started Ceph cluster monitor daemon.
Aug 25 19:47:32 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:32.543+0200 7f5b17e67700 -1 mon.pve13@-1(???) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Aug 25 19:47:32 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:32.743+0200 7f5b17e67700 -1 mon.pve13@2(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Aug 25 19:47:33 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:33.147+0200 7f5b17e67700 -1 mon.pve13@2(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Aug 25 19:47:33 pve13.xxx ceph-mon[2859316]: 2024-08-25T19:47:33.947+0200 7f5b17e67700 -1 mon.pve13@2(synchronizing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
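
Since the monitor reports 0% available while the system disk looks mostly free, it may help to check free space on the exact filesystems behind the paths the daemon uses. Below is a minimal sketch, not a definitive diagnostic. `/var/log/ceph` comes from the log above; `/var/lib/ceph/mon/ceph-pve13` is only an assumption based on the usual default monitor data location and may differ on this node. The helper name `free_space_report` is made up for illustration. It also prints free inodes, because "No space left on device" can be caused by inode exhaustion as well as by full blocks.

```python
import os
import shutil


def free_space_report(paths):
    """Return {path: (free_bytes, free_percent, free_inodes)} for each path that exists."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            # Skip paths that are not present on this node.
            continue
        usage = shutil.disk_usage(path)   # total/used/free bytes of the filesystem holding path
        stats = os.statvfs(path)          # f_favail: inodes available to unprivileged users
        free_percent = 100.0 * usage.free / usage.total if usage.total else 0.0
        report[path] = (usage.free, free_percent, stats.f_favail)
    return report


if __name__ == "__main__":
    # The second path is an assumed default, not taken from the log.
    candidates = ["/var/log/ceph", "/var/lib/ceph/mon/ceph-pve13", "/"]
    for path, (free, percent, inodes) in free_space_report(candidates).items():
        print(f"{path}: {free} bytes free ({percent:.1f}%), {inodes} inodes free")
```

If one of these paths sits on a separate or much smaller filesystem than the one checked so far, that would be consistent with the `available: 0% 0 B` message even though the system disk appears to have space.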