hi there,
my proxmox cluster recently stopped working. after a look at proxmox's web interface i'm not 100% sure why: the vm console view just endlessly (at least 10 minutes) tries to connect to the vm, and ceph status cannot be reported, so i assume there's something wrong w/ the ceph setup that is used for storing the vms.
the ceph cluster is built on top of three physical proxmox servers: pve1, pve2 and pve3.
running "ceph status" just hangs on all three nodes; it never returns to the shell and can only be cancelled with ctrl+c.
in proxmox' web interface, ceph status times out after about 20 seconds w/ "got timeout (500)".
so, vms are currently down and the ceph status is unclear.
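to keep commands from hanging indefinitely while i debug this, i've started wrapping them in GNU coreutils `timeout` (the 10s limit is an arbitrary choice on my part) -- here with `sleep` as a stand-in for the stuck command:

```shell
# wrap slow/stuck commands in `timeout` so they fail fast instead of hanging;
# exit code 124 means the time limit fired before the command finished.
# in practice i run: timeout 10 ceph status
timeout 2 sleep 10   # stand-in for a hanging ceph command
echo "exit code: $?" # → exit code: 124
```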
there's only one monitor set up (on pve1) and it seems to be down. stopping / starting it again results in the same behaviour, please see:
---snip---
ceph-mon@pve1.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Tue 2020-04-07 11:16:10 CEST; 48min ago
Process: 1487 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id pve1 --setuser ceph --setgroup ceph (code=exited, status=28)
Main PID: 1487 (code=exited, status=28)
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Apr 07 11:16:10 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Apr 07 11:16:10 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Apr 07 11:16:10 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.
---snip---
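the "status=28" caught my eye: errno 28 on linux is ENOSPC (no space left on device), and as far as i know ceph-mon refuses to start when the filesystem holding its store drops below mon_data_avail_crit (5% free by default). so my next step is checking free space under the mon data path (assuming the default /var/lib/ceph/mon location):

```shell
# errno 28 = ENOSPC; ceph-mon aborts at startup if its store's filesystem
# is too full (mon_data_avail_crit, default 5% free).
# default mon store path on proxmox: /var/lib/ceph/mon/ceph-pve1
df -h /var/lib/ceph/mon 2>/dev/null || df -h /
# show how big the mon store itself has grown
du -sh /var/lib/ceph/mon/* 2>/dev/null || true
```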
is it a good idea to set up another monitor on another node, and is that even possible in this state? do you have any idea how to get the monitor back up and running?
the ceph-mon log files are empty or stale (today is april 2020) ...
root@pve1:/var/log/ceph# ls -alh ceph-mo*
-rw-r--r-- 1 root ceph 0 Dec 12 00:00 ceph-mon.pve1.log
-rw-r--r-- 1 root ceph 3.8K Dec 10 00:24 ceph-mon.pve1.log.1.gz
root@pve1:/var/log/ceph#
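since the on-disk log files are stale, the systemd journal should still have the stderr from today's failed start attempts (assuming default journald logging on proxmox); i'll pull those next:

```shell
# the mon's log files haven't been written since december, but systemd's
# journal captures output from each failed start of the unit
journalctl -u ceph-mon@pve1 --since today --no-pager | tail -n 50
```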
any help would be great!