[SOLVED] pve-cluster.service fails after Corosync Update

DavidKahl

Hi All,

I hope you're all well and healthy.

I've been updating the IPs in the network and adjusted one node's Corosync file directly on that node:

nano /etc/pve/corosync.conf

I had stopped both pve-cluster and corosync beforehand. When I try to restart the pve-cluster service, I get the following error:

Job for pve-cluster.service failed because the control process exited with error code.
See "systemctl status pve-cluster.service" and "journalctl -xeu pve-cluster.service" for details.

Checking journalctl -xeu pve-cluster.service gives the following:

Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit pve-cluster.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 255.
Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.
Jul 08 11:24:44 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
░░ Subject: A start job for unit pve-cluster.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit pve-cluster.service has finished with a failure.
░░
░░ The job identifier is 1399 and the job result is failed.
Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ Automatic restarting of the unit pve-cluster.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Jul 08 11:24:44 Mars systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
░░ Subject: A stop job for unit pve-cluster.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit pve-cluster.service has finished.
░░
░░ The job identifier is 1497 and the job result is done.
Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.
Jul 08 11:24:44 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
░░ Subject: A start job for unit pve-cluster.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit pve-cluster.service has finished with a failure.
░░
░░ The job identifier is 1497 and the job result is failed.

For completeness, this is the corosync config:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: EarthSpacedock
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.5.101
  }
  node {
    name: Jupiter
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.5.100
  }
  node {
    name: Mars
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.20.100
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Federation
  config_version: 13
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Any ideas how to fix this?

BR

D
 
please post the journal while restarting the service:
* run `journalctl -f` in one shell
* run `systemctl restart corosync` in another shell
* post what gets printed while restarting in code tags

additionally please make sure that /etc/corosync/corosync.conf and /etc/pve/corosync.conf are identical
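
One way to check that the two files really are identical is a byte-for-byte comparison. The snippet below is only a sketch: the helper name conf_in_sync is made up for illustration, and the two paths in the usage comment are the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: report whether two config files are byte-identical.
conf_in_sync() {
    # cmp -s prints nothing and returns 0 only when both files match exactly.
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        # Show what differs; diff exits non-zero on differences, which is expected here.
        diff -u "$1" "$2" || true
        return 1
    fi
}

# On a node, the two paths from this thread could be compared like this:
# conf_in_sync /etc/corosync/corosync.conf /etc/pve/corosync.conf
```

If the function prints "different", the diff output shows which lines need to be aligned before restarting the services.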

Check the reference documentation on the topics for further hints that could resolve your issues:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html
(also on how to correctly adapt to new IPs)

I hope this helps!
 
I checked both Corosync files and can confirm they are identical. This is the error output:

Code:
Jul 18 16:28:29 Mars systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jul 18 16:28:29 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 18 16:28:29 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:29 Mars systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 4.
Jul 18 16:28:29 Mars systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:30 Mars systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: resolved node name 'Mars' to '192.168.20.100' for default node IP address
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: resolved node name 'Mars' to '192.168.20.100' for default node IP address
Jul 18 16:28:30 Mars pmxcfs[10392]: [dcdb] crit: local corosync.conf is newer
Jul 18 16:28:30 Mars pmxcfs[10392]: [dcdb] crit: local corosync.conf is newer
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: mountpoint is not empty
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: if you are sure this is safe, use the 'nonempty' mount option
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: exit proxmox configuration filesystem (-1)
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: exit proxmox configuration filesystem (-1)
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 18 16:28:30 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jul 18 16:28:30 Mars systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 18 16:28:30 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:33 Mars pveproxy[10381]: worker exit
Jul 18 16:28:33 Mars pveproxy[1428]: worker 10381 finished
Jul 18 16:28:33 Mars pveproxy[1428]: starting 2 worker(s)
Jul 18 16:28:33 Mars pveproxy[1428]: worker 10393 started
Jul 18 16:28:33 Mars pveproxy[1428]: worker 10394 started
 
Code:
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: mountpoint is not empty
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: if you are sure this is safe, use the 'nonempty' mount option
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: exit proxmox configuration filesystem (-1)
It seems the mount is refused because the mount point /etc/pve is not empty. Please check which files are there and whether something is already mounted there, using findmnt /etc/pve.
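
Both checks can be done in one go. This is a minimal sketch, assuming findmnt from util-linux is available on the node; the helper name inspect_mountpoint is hypothetical and not part of any Proxmox tooling.

```shell
#!/bin/sh
# Hypothetical helper: show whether a directory is a mount point and what it contains.
inspect_mountpoint() {
    dir=$1
    # findmnt exits non-zero when nothing is mounted at the given path.
    if findmnt "$dir" >/dev/null 2>&1; then
        echo "$dir: mounted"
    else
        echo "$dir: not mounted"
    fi
    # List every entry, including hidden ones, but without . and ..
    ls -A "$dir"
}

# Usage on the affected node:
# inspect_mountpoint /etc/pve
```

If the directory is reported as not mounted but still lists files, those files sit on the underlying disk and are what blocks the fuse mount.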
 
Hi David, I wish you would write how you solved the problem. I have the same problem. It will be useful for me too.
 
Hi,
did you already try following the steps in this thread (see below for a recap)? If yes, please give more details about the exact error you get and the output of pveversion -v.

Recap: the mount was refused because the mount point /etc/pve was not empty. The advice was to check which files are there and whether something is already mounted there, using findmnt /etc/pve.
The output of find /etc/pve/ was:
/etc/pve/
/etc/pve/corosync.conf
The suggested fix was to move that file somewhere else and restart pve-cluster.service.
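
The move-and-restart step could look roughly like the sketch below. The helper name move_stray_files and the backup directory in the usage comment are assumptions for illustration; the function refuses to act if something is mounted at the directory, so it only touches leftover files on the underlying disk.

```shell
#!/bin/sh
# Hypothetical helper: move leftover files out of an unmounted mount point.
move_stray_files() {
    dir=$1
    backup=$2
    # Refuse to touch anything if a filesystem is mounted there.
    if findmnt "$dir" >/dev/null 2>&1; then
        echo "$dir is mounted, not moving anything" >&2
        return 1
    fi
    mkdir -p "$backup"
    # Move every top-level entry (including dotfiles) into the backup directory.
    find "$dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup"/ \;
}

# Possible use on the affected node (the backup path is an assumption):
# systemctl stop pve-cluster
# move_stray_files /etc/pve /root/pve-stray-backup
# systemctl start pve-cluster
```

Moving rather than deleting keeps the stray corosync.conf around in case its contents are needed later.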
 
Hi All,
The approach Fiona suggested solved the problem immediately. My experience with Corosync is that if the ping latency is greater than 10 ms, it regularly loses quorum and fails.

BR

D
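
The latency remark above can be checked with ping between the nodes. The following is a rough sketch, not an official test: the helper names avg_rtt_ms and latency_ok are hypothetical, the parser assumes the iputils summary format ("rtt min/avg/max/mdev = ... ms"), and the 10 ms limit is only the poster's own observation.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the average round-trip time.
# Assumes the iputils summary line "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt_ms() {
    # Split on "/": the fifth field of the summary line is the average.
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Hypothetical helper: succeed if the average (ms) is at or below the limit (ms).
latency_ok() {
    awk -v avg="$1" -v limit="$2" 'BEGIN { exit !(avg <= limit) }'
}

# Usage with a node address from the config in this thread:
# avg=$(ping -c 5 -q 192.168.5.100 | avg_rtt_ms)
# latency_ok "$avg" 10 || echo "average latency ${avg} ms is above 10 ms"
```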
 
