[SOLVED] pve-cluster.service fails after Corosync Update

DavidKahl · Jul 8, 2023

Hi All,

I hope you're all well and healthy.

I've been updating the IPs in the network and adjusted one node's Corosync file on that node:

nano /etc/pve/corosync.conf

I had stopped both pve-cluster and corosync beforehand. When I try to restart the pve-cluster service, it gives me the following error:

Job for pve-cluster.service failed because the control process exited with error code.

See "systemctl status pve-cluster.service" and "journalctl -xeu pve-cluster.service" for details.

Checking journalctl -xeu pve-cluster.service gives the following:

Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ An ExecStart= process belonging to unit pve-cluster.service has exited.

░░

░░ The process' exit code is 'exited' and its exit status is 255.

Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.

░░ Subject: Unit failed

░░ Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.

Jul 08 11:24:44 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.

░░ Subject: A start job for unit pve-cluster.service has failed

░░ Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ A start job for unit pve-cluster.service has finished with a failure.

░░

░░ The job identifier is 1399 and the job result is failed.

Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.

░░ Subject: Automatic restarting of a unit has been scheduled

░░ Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ Automatic restarting of the unit pve-cluster.service has been scheduled, as the result for

░░ the configured Restart= setting for the unit.

Jul 08 11:24:44 Mars systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.

░░ Subject: A stop job for unit pve-cluster.service has finished

░░ Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ A stop job for unit pve-cluster.service has finished.

░░

░░ The job identifier is 1497 and the job result is done.

Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Start request repeated too quickly.

Jul 08 11:24:44 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.

░░ Subject: Unit failed

░░ Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.

Jul 08 11:24:44 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.

░░ Subject: A start job for unit pve-cluster.service has failed

░░ Defined-By: systemd

░░ Support: https://www.debian.org/support

░░

░░ A start job for unit pve-cluster.service has finished with a failure.

░░

░░ The job identifier is 1497 and the job result is failed.

For completeness, this is the Corosync config:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: EarthSpacedock
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.5.101
  }
  node {
    name: Jupiter
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.5.100
  }
  node {
    name: Mars
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.20.100
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Federation
  config_version: 13
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Any ideas how to fix this?

BR

D

DavidKahl · Jul 8, 2023

I tested reverting to the backup; that doesn't solve this.

DavidKahl · Jul 12, 2023

Hi,

any suggestions?

Stoiko Ivanov · Jul 12, 2023

Please post the journal while restarting the service:
* run `journalctl -f` in one shell
* run `systemctl restart corosync` in another shell
* post what gets printed while restarting in code tags

Additionally, please make sure that /etc/corosync/corosync.conf and /etc/pve/corosync.conf are identical.
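One way to check that the two files are identical is a byte-for-byte comparison with `cmp`. The sketch below is only an illustration: `same_config` is a hypothetical helper name, not a Proxmox VE tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox VE): report whether two files
# have identical contents.
same_config() {
    # cmp -s prints nothing and exits 0 only when both files are
    # byte-identical; a missing or unreadable file also counts as "different".
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example call on a cluster node (run as root):
#   same_config /etc/corosync/corosync.conf /etc/pve/corosync.conf
```

If the result is "different", `diff -u` on the same two paths shows which lines differ.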

Check the reference documentation for further hints that could resolve your issue (it also covers how to correctly adapt to new IPs):
https://pve.proxmox.com/pve-docs/chapter-pvecm.html

I hope this helps!

DavidKahl · Jul 18, 2023

I checked all the Corosync items and can confirm they are identical. This is the error output:

Code:

Jul 18 16:28:29 Mars systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jul 18 16:28:29 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 18 16:28:29 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:29 Mars systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 4.
Jul 18 16:28:29 Mars systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:30 Mars systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: resolved node name 'Mars' to '192.168.20.100' for default node IP address
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: resolved node name 'Mars' to '192.168.20.100' for default node IP address
Jul 18 16:28:30 Mars pmxcfs[10392]: [dcdb] crit: local corosync.conf is newer
Jul 18 16:28:30 Mars pmxcfs[10392]: [dcdb] crit: local corosync.conf is newer
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: mountpoint is not empty
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: if you are sure this is safe, use the 'nonempty' mount option
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: exit proxmox configuration filesystem (-1)
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: exit proxmox configuration filesystem (-1)
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 18 16:28:30 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jul 18 16:28:30 Mars systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jul 18 16:28:30 Mars systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jul 18 16:28:30 Mars systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Jul 18 16:28:33 Mars pveproxy[10381]: worker exit
Jul 18 16:28:33 Mars pveproxy[1428]: worker 10381 finished
Jul 18 16:28:33 Mars pveproxy[1428]: starting 2 worker(s)
Jul 18 16:28:33 Mars pveproxy[1428]: worker 10393 started
Jul 18 16:28:33 Mars pveproxy[1428]: worker 10394 started
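The `[dcdb] crit: local corosync.conf is newer` line may point to differing `config_version` values between copies of the config; that reading is an assumption about pmxcfs and is not confirmed in this thread. As a minimal sketch, a hypothetical helper `config_version` can print the value from any copy of the file so the numbers can be compared by eye:

```shell
#!/bin/sh
# Hypothetical helper: print the config_version value found in a
# corosync.conf-style file (first match only).
config_version() {
    # awk splits on whitespace, so indentation in front of the key is ignored
    awk '$1 == "config_version:" { print $2; exit }' "$1"
}

# Example calls on a cluster node:
#   config_version /etc/corosync/corosync.conf
#   config_version /etc/pve/corosync.conf
```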

fiona · Jul 19, 2023

DavidKahl said:

Code:

Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: mountpoint is not empty
Jul 18 16:28:30 Mars pmxcfs[10392]: fuse: if you are sure this is safe, use the 'nonempty' mount option
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] crit: fuse_mount error: File exists
Jul 18 16:28:30 Mars pmxcfs[10392]: [main] notice: exit proxmox configuration filesystem (-1)

Seems like the mount is refused because the mount point /etc/pve is not empty. Please check what files are there, and check whether something is already mounted there with findmnt /etc/pve.
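Both checks can be run from a shell. The sketch below assumes GNU `find` as shipped with Debian; `list_entries` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list whatever sits directly inside a directory.
# Prints nothing when the directory is empty.
list_entries() {
    find "$1" -mindepth 1 -maxdepth 1
}

# On the affected node:
#   findmnt /etc/pve        # prints the mount if one exists, nothing otherwise
#   list_entries /etc/pve   # with nothing mounted, any output is a leftover file
```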

DavidKahl · Jul 19, 2023

Hi Fiona,

The output of: find /mnt/pve/ was

/etc/pve/
/etc/pve/corosync.conf

The findmnt /etc/pve didn't work.

BR

D

fiona · Jul 19, 2023

DavidKahl said:
Hi Fiona,

The output of: find /mnt/pve/ was

Do you mean /etc/pve?

DavidKahl said:
/etc/pve/
/etc/pve/corosync.conf

So it is not empty as expected from the error message. Try moving that file somewhere else and restart the pve-cluster.service.

DavidKahl said:
The findmnt /etc/pve didn't work.

No output means nothing is mounted there. Or do you mean there was an error?
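Moving the stray file out of the way, as suggested above, could be sketched like this. `move_aside` and the backup directory `/root/pve-stray` are hypothetical names chosen for the example. It should only be done while findmnt /etc/pve prints nothing, since otherwise /etc/pve is the live cluster filesystem.

```shell
#!/bin/sh
# Hypothetical helper: move a file into a backup directory, keeping its
# name and appending a timestamp so an earlier backup is not overwritten.
# Prints the new location on success.
move_aside() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    mv -- "$src" "$dest" && echo "$dest"
}

# On the affected node (run as root, /etc/pve NOT mounted):
#   move_aside /etc/pve/corosync.conf /root/pve-stray
#   systemctl restart pve-cluster.service
```

Moving rather than deleting keeps the file available in case it needs to be compared with the other copy later.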

DavidKahl · Jul 20, 2023

Thanks, solved it

kesen · Dec 6, 2023

Hi David, I wish you would write how you solved the problem. I have the same problem. It will be useful for me too.

fiona · Dec 6, 2023

Hi,

kesen said:
Hi David, I wish you would write how you solved the problem. I have the same problem. It will be useful for me too.

Did you already try following the steps in this thread? See below for a recap. If yes, please give more details about the exact error you get and the output of pveversion -v.

fiona said:
Seems like the mount is refused because the mount point /etc/pve is not empty. Please check what files are there, and check whether something is already mounted there with findmnt /etc/pve.

DavidKahl said:
The output of: find /mnt/pve/ was
/etc/pve/
/etc/pve/corosync.conf

fiona said:
Try moving that file somewhere else and restart the pve-cluster.service.

DavidKahl · Dec 6, 2023

Hi All,
The approach suggested by Fiona resolved the problem immediately. My experience with Corosync is that if the ping is greater than 10, it regularly loses quorum and fails.

BR

D
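To keep an eye on the latency between nodes, the average round-trip time can be pulled from the summary line of `ping`. This is only a sketch: `avg_rtt` is a hypothetical helper, and it assumes the summary format printed by the common Linux `ping` (`rtt min/avg/max/mdev = ...`), which reports milliseconds.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the average
# round-trip time from its summary line, in the unit ping reports.
avg_rtt() {
    # Summary line looks like: rtt min/avg/max/mdev = a/b/c/d ms
    # Splitting on "/" puts the average in field 5.
    awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# Example, using a node address from the config earlier in the thread:
#   ping -c 5 192.168.5.100 | avg_rtt
```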
