[SOLVED] ceph status times out

mistay

Apr 7, 2020
hi there,

my proxmox cluster recently stopped working. in proxmox's web interface, the vm console view just keeps trying to connect to the vm (for at least 10 minutes) and never succeeds. it's not 100% clear why, but the ceph status cannot be reported either, so i assume there's something wrong with the ceph setup that is used for storing the vms.

ceph cluster is built on top of three physical proxmox servers: pve1, pve2 and pve3

running "ceph status" just hangs on all three nodes; it never returns to the shell, but it can be cancelled with ctrl+c.
in the proxmox web interface, the ceph status times out after about 20 seconds with "got timeout (500)".
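
a hang like this can be bounded so the shell always comes back. the following is only a minimal sketch, assuming GNU coreutils `timeout` is installed; `run_with_timeout` is a made-up helper name, not part of ceph or proxmox.

```shell
# run_with_timeout is a hypothetical helper, not a ceph or proxmox command.
# it runs a command and gives up after the given number of seconds.
run_with_timeout() {
    local secs="$1" rc=0
    shift
    timeout "$secs" "$@" || rc=$?
    # GNU timeout exits with 124 when it had to kill the command
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}

# usage: run_with_timeout 10 ceph status
```

an exit status of 124 then means the command was still waiting (for example for a monitor) when the limit was reached.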

so, vms are currently down, ceph status is unclear.

there's only one monitor set up (on pve1), and it seems to be down. stopping and starting it again results in the same behaviour, please see:

---snip---
ceph-mon@pve1.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Tue 2020-04-07 11:16:10 CEST; 48min ago
Process: 1487 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id pve1 --setuser ceph --setgroup ceph (code=exited, status=28)
Main PID: 1487 (code=exited, status=28)

Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Apr 07 11:16:10 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Apr 07 11:16:10 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.
---snip---
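
two things in that output are worth a note. "Start request repeated too quickly" means systemd has hit its start rate limit, so the failed state has to be cleared before another start attempt is made. also, status=28 matches linux errno 28 (ENOSPC, "no space left on device"), which is probably what ceph-mon is reporting here, though that reading is an assumption. a minimal sketch of the retry, with `mon_unit` and `restart_mon` as made-up helper names:

```shell
# mon_unit and restart_mon are hypothetical helpers for illustration only.

# build the systemd unit name for a monitor id, e.g. pve1
mon_unit() {
    printf 'ceph-mon@%s.service\n' "$1"
}

# clear the failed state (and with it the start rate limit), then retry
restart_mon() {
    local unit
    unit="$(mon_unit "$1")"
    systemctl reset-failed "$unit"
    systemctl start "$unit"
    systemctl --no-pager status "$unit"
}

# usage (as root on pve1): restart_mon pve1
```

this does not fix the underlying cause; if the monitor exits again with the same status, the start will fail in the same way.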

is it a good idea to set up another monitor on another node, and is that even possible in this state? do you have any idea how to get the monitor back up and running?

the ceph-mon logs are empty or old (today is april 2020):
root@pve1:/var/log/ceph# ls -alh ceph-mo*
-rw-r--r-- 1 root ceph 0 Dec 12 00:00 ceph-mon.pve1.log
-rw-r--r-- 1 root ceph 3.8K Dec 10 00:24 ceph-mon.pve1.log.1.gz
root@pve1:/var/log/ceph#
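
when the files under /var/log/ceph are empty, the systemd journal may still hold whatever the daemon printed before it exited. a minimal sketch, assuming the journal for the current boot is available; `show_mon_journal` is a made-up helper name.

```shell
# show_mon_journal is a hypothetical helper for illustration only.
# $1 = monitor id (e.g. pve1), $2 = number of lines to show (default 50)
show_mon_journal() {
    journalctl --no-pager -b -n "${2:-50}" -u "ceph-mon@$1.service"
}

# usage (as root on pve1): show_mon_journal pve1 100
```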

any help would be great!
 
fixed. free space was below 1G, and ceph-mon needed more than that to start. freed up some disk space by removing images and rebooted the machine.
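
for anyone hitting the same thing: as far as i know, ceph-mon refuses to start when the available space on the filesystem holding its data is at or below `mon_data_avail_crit` (5 percent by default). a quick check could look like the sketch below. the helper names are made up, /var/lib/ceph/mon/ceph-pve1 is only the default mon data path and an assumption for this cluster, and the percentage from df is an approximation of what ceph computes itself.

```shell
# avail_pct and check_mon_space are hypothetical helpers for illustration.

# print the approximate percentage of space still available on the
# filesystem that holds path $1 (100 minus the "Capacity" column of df -P)
avail_pct() {
    df -P "$1" | awk 'NR==2 { sub("%", "", $5); print 100 - $5 }'
}

# return 0 if more than $2 percent (default 5) is available, else 1
check_mon_space() {
    local path="$1" crit="${2:-5}" pct
    pct="$(avail_pct "$path")"
    if [ "$pct" -le "$crit" ]; then
        echo "only ${pct}% available on $path (threshold ${crit}%)" >&2
        return 1
    fi
    echo "${pct}% available on $path"
}

# usage: check_mon_space /var/lib/ceph/mon/ceph-pve1
```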
 
