[SOLVED] ceph status times out

mistay

Apr 7, 2020
hi there,

my proxmox cluster recently stopped working. looking at proxmox's web interface, i'm not 100% sure why: a vm's console view just tries endlessly (at least 10 minutes) to connect to the vm, and ceph status cannot be reported, so i assume there's something wrong with the ceph setup that is used for storing the vms.

ceph cluster is built on top of three physical proxmox servers: pve1, pve2 and pve3

running "ceph status" just hangs on all three nodes; it never returns to the shell, but it can be cancelled with ctrl+c.
in the proxmox web interface, ceph status times out after about 20 seconds with "got timeout (500)".
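
a small sketch for probing the cluster without getting stuck in a hanging command. it assumes GNU coreutils `timeout` is installed; `run_bounded` is a made-up helper name, and the 15 and 10 second limits are arbitrary.

```shell
#!/bin/sh
# run a command with a hard time limit so an unreachable monitor
# cannot block the shell forever.
# timeout(1) exits with status 124 when the limit is hit.
run_bounded() {
    limit="$1"
    shift
    timeout "$limit" "$@"
}

# usage on a cluster node (not executed by this snippet):
#   run_bounded 15 ceph status || echo "ceph status failed or timed out (rc=$?)"
#
# the ceph cli also has its own option for giving up on the connection:
#   ceph --connect-timeout 10 status
```

either way the command returns after a bounded time, which makes it easier to tell "no monitor reachable" apart from "slow but alive".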

so, vms are currently down, ceph status is unclear.

there's only one monitor set up (on pve1), and it seems to be down. stopping and starting it again results in the same behaviour, please see:

---snip---
ceph-mon@pve1.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Tue 2020-04-07 11:16:10 CEST; 48min ago
Process: 1487 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id pve1 --setuser ceph --setgroup ceph (code=exited, status=28)
Main PID: 1487 (code=exited, status=28)

Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Apr 07 11:16:10 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Apr 07 11:16:10 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.
---snip---

is it a good idea to set up another monitor on another node, and is that even possible in this state? do you have any idea how to get the monitor back up and running?

the ceph-mon logs are empty or old (today is april 2020):
root@pve1:/var/log/ceph# ls -alh ceph-mo*
-rw-r--r-- 1 root ceph 0 Dec 12 00:00 ceph-mon.pve1.log
-rw-r--r-- 1 root ceph 3.8K Dec 10 00:24 ceph-mon.pve1.log.1.gz
root@pve1:/var/log/ceph#
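
when the files under /var/log/ceph are empty, the systemd journal may still hold what the daemon printed before it exited. a minimal sketch, assuming the unit name from the status output above; `describe_exit_status` and `show_mon_journal` are hypothetical helper names. reading the exit status as a linux errno value is only a guess (a daemon is free to use other exit codes), but status=28 would match ENOSPC.

```shell
#!/bin/sh
# map a few exit statuses to linux errno names.
# assumption: the daemon exited with an errno value, which is not guaranteed.
describe_exit_status() {
    case "$1" in
        0)  echo "success" ;;
        2)  echo "ENOENT (no such file or directory)" ;;
        13) echo "EACCES (permission denied)" ;;
        28) echo "ENOSPC (no space left on device)" ;;
        *)  echo "unknown, check the journal" ;;
    esac
}

# print the last exit status of a monitor unit and its recent journal lines.
# $1 is the monitor id, e.g. pve1.
show_mon_journal() {
    unit="ceph-mon@$1.service"
    status="$(systemctl show -p ExecMainStatus --value "$unit")"
    echo "$unit exit status $status: $(describe_exit_status "$status")"
    journalctl -u "$unit" -b --no-pager -n 50
}

# usage on the monitor node (not executed by this snippet):
#   show_mon_journal pve1
```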

any help would be great!
 
fixed. free space was below 1G, and ceph-mon needed more to start. freed up some disk space by removing images and rebooted the machine.
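
for anyone hitting the same thing, a rough sketch of a free space check on the monitor's data directory. assumptions: the monitor store lives under the default /var/lib/ceph/mon, and the 1 GiB threshold is just the number from this thread, not ceph's own limit (as far as i know ceph uses a percentage setting, mon_data_avail_crit). `avail_kib` and `has_free_space` are made-up helper names.

```shell
#!/bin/sh
# print the available space, in KiB, of the filesystem holding a path.
# df -P forces the portable one-line-per-filesystem layout, -k reports KiB.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# succeed if the filesystem holding $1 has at least $2 KiB available.
has_free_space() {
    avail="$(avail_kib "$1")"
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

MON_DIR="/var/lib/ceph/mon"      # assumed default monitor data location
MIN_KIB=$((1024 * 1024))         # 1 GiB, illustrative threshold only

if [ -d "$MON_DIR" ]; then
    if has_free_space "$MON_DIR" "$MIN_KIB"; then
        echo "ok: $(avail_kib "$MON_DIR") KiB available for $MON_DIR"
    else
        echo "low: $(avail_kib "$MON_DIR") KiB available for $MON_DIR"
    fi
fi
```

after freeing space, the unit may still be stuck in the "start request repeated too quickly" state shown above; `systemctl reset-failed ceph-mon@pve1.service` followed by a start should clear that without a reboot.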