Hi,
I have been running a Proxmox 4.4-12 cluster for a while with 3 nodes, each with 2 OSDs and a monitor. For the past few days, one monitor has been down on one node, and I don't understand how to track down the problem. I see nothing obvious in /var/log/ceph. There are no recent entries in ceph-mon.log (the file's timestamp is from 2017), and my ceph.log file is also stale (last written 2 days ago).
Rebooting the node does not solve the problem.
Code:
# ceph -s
cluster 602c5599-4cb8-4f19-8c46-44bea575d6e0
health HEALTH_WARN
1 mons down, quorum 0,1 0,1
monmap e3: 3 mons at {0=192.168.20.5:6789/0,1=192.168.20.6:6789/0,2=192.168.20.7:6789/0}
election epoch 332, quorum 0,1 0,1
osdmap e1385: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v55074580: 64 pgs, 1 pools, 476 GB data, 119 kobjects
1429 GB used, 43240 GB / 44670 GB avail
64 active+clean
client io 9974 B/s wr, 0 op/s rd, 0 op/s wr
On this node I have a ceph-mon@2.service unit (2 is the ID of the failing monitor), and it fails to start:
Code:
# systemctl status ceph-mon@2.service
● ceph-mon@2.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled)
Drop-In: /lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: start-limit) since Mon 2019-11-18 09:47:59 CET; 1min 6s ago
Process: 3415 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=28)
Main PID: 3415 (code=exited, status=28)
Nov 18 09:47:49 proxmost3 systemd[1]: Unit ceph-mon@2.service entered failed state.
Nov 18 09:47:59 proxmost3 systemd[1]: ceph-mon@2.service holdoff time over, scheduling restart.
Nov 18 09:47:59 proxmost3 systemd[1]: Stopping Ceph cluster monitor daemon...
Nov 18 09:47:59 proxmost3 systemd[1]: Starting Ceph cluster monitor daemon...
Nov 18 09:47:59 proxmost3 systemd[1]: ceph-mon@2.service start request repeated too quickly, refusing to start.
Nov 18 09:47:59 proxmost3 systemd[1]: Failed to start Ceph cluster monitor daemon.
Nov 18 09:47:59 proxmost3 systemd[1]: Unit ceph-mon@2.service entered failed state.
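One possible lead in the output above: on Linux, exit status 28 matches the errno value ENOSPC ("No space left on device"). Assuming ceph-mon is passing that errno through as its exit code, which this output alone does not confirm, free space on the monitor's data directory would be worth checking. The following is a minimal read-only sketch; MON_DIR assumes the default monitor data path for a cluster named "ceph" and may differ on this node.

Code:
```shell
#!/bin/sh
# Sketch only: read-only checks for a ceph-mon unit that exits with status 28.
# MON_ID and MON_DIR are assumptions (monitor id 2, default cluster name "ceph");
# adjust them if the monitor data lives elsewhere.
MON_ID="${MON_ID:-2}"
MON_DIR="${MON_DIR:-/var/lib/ceph/mon/ceph-$MON_ID}"
UNIT="ceph-mon@$MON_ID.service"

# systemd keeps the daemon's stderr in the journal even when the files
# under /var/log/ceph are stale.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u "$UNIT" -n 50 --no-pager || true
fi

# Free space on the filesystems holding the monitor store and the Ceph logs.
for dir in "$MON_DIR" /var/log/ceph; do
    if [ -d "$dir" ]; then
        df -h "$dir"
    else
        echo "skip: $dir not found"
    fi
done

echo "checked $UNIT"
```

If a filesystem does turn out to be full, the unit will probably still refuse to start after space is freed until the start-limit state is cleared, for example with systemctl reset-failed ceph-mon@2.service followed by systemctl start ceph-mon@2.service.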
Thanks for your advice.