[SOLVED] Ceph HEALTH_WARN, 1 mons down

samontetro

Hi,
I've been running a Proxmox 4.4-12 cluster with 3 nodes for a while, each node hosting 2 OSDs and running a monitor. For a few days now one monitor has been down on one node and I don't understand how to track down the problem. Nothing obvious for me in /var/log/ceph: there is no recent information in ceph-mon.log (the file's timestamp is from 2017), and my ceph.log is also old (2 days ago).
Rebooting the node does not solve the problem.

Code:
# ceph -s
    cluster 602c5599-4cb8-4f19-8c46-44bea575d6e0
     health HEALTH_WARN
            1 mons down, quorum 0,1 0,1
     monmap e3: 3 mons at {0=192.168.20.5:6789/0,1=192.168.20.6:6789/0,2=192.168.20.7:6789/0}
            election epoch 332, quorum 0,1 0,1
     osdmap e1385: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds
      pgmap v55074580: 64 pgs, 1 pools, 476 GB data, 119 kobjects
            1429 GB used, 43240 GB / 44670 GB avail
                  64 active+clean
  client io 9974 B/s wr, 0 op/s rd, 0 op/s wr
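
For reference, which monitor is missing from the quorum can be narrowed down with the standard ceph commands, roughly like this (a generic sketch, not specific to this cluster; output omitted):
Code:
# detailed health report, including which mon is out of quorum
ceph health detail
# monmap summary and the current quorum members
ceph mon stat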


On this node the monitor's unit is ceph-mon@2.service (2 is the ID of the failing monitor):
Code:
# systemctl status ceph-mon@2.service
● ceph-mon@2.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: start-limit) since Mon 2019-11-18 09:47:59 CET; 1min 6s ago
  Process: 3415 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=28)
 Main PID: 3415 (code=exited, status=28)

Nov 18 09:47:49 proxmost3 systemd[1]: Unit ceph-mon@2.service entered failed state.
Nov 18 09:47:59 proxmost3 systemd[1]: ceph-mon@2.service holdoff time over, scheduling restart.
Nov 18 09:47:59 proxmost3 systemd[1]: Stopping Ceph cluster monitor daemon...
Nov 18 09:47:59 proxmost3 systemd[1]: Starting Ceph cluster monitor daemon...
Nov 18 09:47:59 proxmost3 systemd[1]: ceph-mon@2.service start request repeated too quickly, refusing to start.
Nov 18 09:47:59 proxmost3 systemd[1]: Failed to start Ceph cluster monitor daemon.
Nov 18 09:47:59 proxmost3 systemd[1]: Unit ceph-mon@2.service entered failed state.
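
To see why the daemon itself keeps exiting (status=28 above, which makes systemd hit its start limit), the unit's journal and the free space under the monitor data directory are worth checking, for example (a rough sketch; /var/lib/ceph/mon/ceph-2 is the default data path for monitor id 2 on a cluster named "ceph", adjust if yours differs):
Code:
# recent error messages from the failing unit
journalctl -u ceph-mon@2.service -n 50
# free space on the filesystem holding the monitor data
df -h /var/lib/ceph/mon/ceph-2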

Thanks for your advice.
 
Thanks Alwin for this wise advice. It solved my problem.

  1. On one of the 3 cluster nodes, / was nearly full because of some local VM dumps in /var/lib/vz/dump (same physical partition). I had already identified this (it was the main difference between the 3 nodes), but the partition was not completely full (96% used, almost 2 GB still available). Removing backup files to get back to 40% free space (16 GB) was not, on its own, sufficient to solve the problem, even with a reboot.
  2. Looking in the syslog on my node proxmost3 was a good idea; there was this message:
    Code:
    Nov 18 09:47:49 proxmost3 ceph-mon[3415]: error: monitor data filesystem reached concerning levels of available storage space (available: 4% 1830 MB
    Nov 18 09:47:49 proxmost3 ceph-mon[3415]: you may adjust 'mon data avail crit' to a lower value to make this go away (default: 5%)
    Nov 18 09:47:49 proxmost3 systemd[1]: ceph-mon@2.service: main process exited, code=exited, status=28/n/a
    Nov 18 09:47:49 proxmost3 systemd[1]: Unit ceph-mon@2.service entered failed state.
    This confirmed that my problem was caused by a lack of available storage on the monitor's data filesystem.
  3. After restoring available storage in /var, systemctl reset-failed ceph-mon@2.service is necessary to allow the monitor to start again (the unit had hit systemd's start limit). I was not aware of this command. The full sequence is sketched below.
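
Roughly, the recovery sequence on the affected node was the following (a sketch of the steps above; the monitor id 2 and the dump directory are specific to my setup, and the rm target is only a placeholder):
Code:
# 1. free space on the partition holding the monitor data (here: /)
du -sh /var/lib/vz/dump            # local VM dumps were eating the space
rm /var/lib/vz/dump/<old-dumps>    # placeholder: remove whatever is safe to delete
# (lowering 'mon data avail crit' as the log suggests would only hide the warning)
# 2. clear systemd's start-limit counter, otherwise the unit refuses to start
systemctl reset-failed ceph-mon@2.service
systemctl start ceph-mon@2.service
# 3. check that the monitor rejoined the quorum
ceph -s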
 
