Hello all,
I walked in this morning to find problems.
My setup is a 4-node PVE cluster, 3 nodes of which provide Ceph storage.
Ceph is inaccessible from the web GUI.
pve01 has corosync errors.
The ceph status command will not run from the CLI on any of the 3 Ceph nodes.
I'm not sure if one issue caused the other or if it's just a coincidence.
Any help would be appreciated!
Code:
systemctl status corosync.service -l
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: resources)
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Aug 24 09:57:26 pve01 systemd[1]: Failed to start Corosync Cluster Engine.
Aug 24 10:14:12 pve01 systemd[1]: corosync.service: Failed to run 'start' task: No space left on device
Aug 24 10:14:12 pve01 systemd[1]: corosync.service: Failed with result 'resources'.
Aug 24 10:14:12 pve01 systemd[1]: Failed to start Corosync Cluster Engine.
Aug 24 10:15:36 pve01 systemd[1]: corosync.service: Failed to run 'start' task: No space left on device
Aug 24 10:15:36 pve01 systemd[1]: corosync.service: Failed with result 'resources'.
Aug 24 10:15:36 pve01 systemd[1]: Failed to start Corosync Cluster Engine.
Aug 24 10:49:46 pve01 systemd[1]: corosync.service: Failed to run 'start' task: No space left on device
Aug 24 10:49:46 pve01 systemd[1]: corosync.service: Failed with result 'resources'.
Aug 24 10:49:46 pve01 systemd[1]: Failed to start Corosync Cluster Engine.
Code:
journalctl -xn
-- Logs begin at Mon 2020-08-24 09:57:20 CDT, end at Mon 2020-08-24 11:00:01 CDT. --
Aug 24 10:59:46 pve01 pvestatd[5779]: got timeout
Aug 24 10:59:46 pve01 pvestatd[5779]: status update time (5.373 seconds)
Aug 24 10:59:55 pve01 snmpd[1469]: error on subcontainer 'ia_addr' insert (-1)
Aug 24 10:59:56 pve01 pvestatd[5779]: got timeout
Aug 24 10:59:56 pve01 pvestatd[5779]: status update time (5.376 seconds)
Aug 24 11:00:00 pve01 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: A start job for unit pvesr.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has begun execution.
--
-- The job identifier is 6792.
Aug 24 11:00:01 pve01 pvesr[16026]: unable to open file '/var/lib/pve-manager/pve-replication-state.json.tmp.16026' - No space left
Aug 24 11:00:01 pve01 systemd[1]: pvesr.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pvesr.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 2.
Aug 24 11:00:01 pve01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvesr.service has entered the 'failed' state with result 'exit-code'.
Aug 24 11:00:01 pve01 systemd[1]: Failed to start Proxmox VE replication runner.
-- Subject: A start job for unit pvesr.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has finished with a failure.
--
-- The job identifier is 6792 and the job result is failed.
Code:
Aug 24 11:15:27 pve01 systemd[1]: ceph-mon@pve01.service: Start request repeated too quickly.
Aug 24 11:15:27 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'resources'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit ceph-mon@pve01.service has entered the 'failed' state with result 'resources'.
Aug 24 11:15:27 pve01 systemd[1]: Failed to start Ceph cluster monitor daemon.
-- Subject: A start job for unit ceph-mon@pve01.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit ceph-mon@pve01.service has finished with a failure.
--
-- The job identifier is 8382 and the job result is failed.
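Several of the messages above say "No space left on device", so a full filesystem (or exhausted inodes) on pve01 looks like a plausible common cause. A rough sketch of the checks that would confirm or rule this out, assuming the root filesystem is the one affected; fs_use_pct is only an illustrative helper name:

```shell
#!/bin/sh
# Sketch only: fs_use_pct is an illustrative helper name, not a Proxmox tool.

# Print the block usage percentage (digits only) of the filesystem holding path $1.
fs_use_pct() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Report usage for the root filesystem and, if present, the directory that
# pvesr failed to write to in the journal output above.
for path in / /var/lib/pve-manager; do
    [ -e "$path" ] || continue
    echo "$path: $(fs_use_pct "$path")% of blocks used"
done

# "No space left on device" is also reported when inodes run out,
# so the inode table is worth a look as well.
df -i /
```

If a filesystem does turn out to be full, something like `du -xh --max-depth=1 /var | sort -h | tail` should show which directories are taking the space.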