pve-cluster won't start

NWTech

New Member
Jun 28, 2024
I have a node that was previously in a cluster. I had to remove it from the cluster, but the removal left some configuration in place. While cleaning up afterwards I ran all sorts of commands, among them a shutdown of pve-cluster. I must have done something wrong after that, because now when I try to start it again it fails with this:

Job for pve-cluster.service failed because the control process exited with error code.
See "systemctl status pve-cluster.service" and "journalctl -xeu pve-cluster.service" for details.

In the logs I find this:

Oct 30 09:12:44 <MACHINENAME> systemd[1]: pve-cluster.service: Start request repeated too quickly.
Oct 30 09:12:44 <MACHINENAME> systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Oct 30 09:12:44 <MACHINENAME> systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.

Needless to say, my web UI is not working, so I'm connected remotely over SSH. I'm kind of lost at this point and would really appreciate any suggestions for things I could try to fix this.

Not sure if these help, but here are some other errors I get in the logs when I try to restart the pve-cluster service:

[dcdb] crit: unable to parse cluster config_version
[main] crit: fuse_mount error: File exists
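
When the journal is noisy, it can help to pull out only the critical-level pmxcfs lines like the two above. The following is a minimal sketch, not an official Proxmox tool; the function name `filter_crit` is made up for illustration, and it assumes the messages keep the `[tag] crit: ` shape shown in this post.

```shell
# Keep only critical-level lines (e.g. "[dcdb] crit: ...") from log text on stdin.
# filter_crit is a hypothetical helper name, not part of Proxmox VE.
filter_crit() {
    # "|| true" keeps the exit status zero when nothing matches
    grep -E '\] crit: ' || true
}

# Possible usage on the affected node (not run here):
#   journalctl -u pve-cluster.service -b --no-pager | filter_crit
# systemd stops retrying after "Start request repeated too quickly";
# the failed state can be cleared before the next attempt with:
#   systemctl reset-failed pve-cluster.service
```

This only narrows the output; it does not explain which of the reported errors is the actual cause.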
 
I have tried:
1. Unmounting /etc/pve and remounting it
2. Uninstalling and reinstalling, but I don't think it is actually going through, because it ends with these errors:
E: Sub-process /usr/share/proxmox-ve/pve-apt-hook returned an error code (1)
E: Failure running script /usr/share/proxmox-ve/pve-apt-hook
3. A reboot, obviously
 
Never mind, everyone, I fixed it one way or another. I'm not sure which change fixed it, but here are some of the things I did, for anyone who hits this problem in the future:
1. Upgrades were not working properly, so I ran apt clean, which seemed to fix the upgrade problems.
2. I ran into lots of issues with pmxcfs, and I think the root issue was that the /etc/pve directory wasn't empty. Once I cleaned it out completely, those problems went away.
3. In the /etc/pve/corosync.conf file I changed the bindnetaddr value from 192.168.0.0 to my server's IP.

I think it was after all these changes that I rebooted and it came back up in the web UI. Obviously I had more than one problem. I'm not totally sure this is everything I did, so if anyone figures out something I forgot, or knows which of these actually solved the problem, that would be awesome. Thank you.
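
For point 2 in the list above, a quick way to see whether a directory is currently a mount point, empty, or holding leftover files is sketched below. This is only an illustration under assumptions: the function name `mount_dir_state` is hypothetical, and it relies on the `mountpoint` tool from util-linux being installed (if it is missing, the directory is simply treated as not mounted).

```shell
# Report the state of a directory that a filesystem is expected to mount over.
# Prints one of: missing, mounted, empty, stale-files.
# mount_dir_state is a hypothetical helper name, not part of Proxmox VE.
mount_dir_state() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo missing
    elif mountpoint -q "$dir" 2>/dev/null; then
        echo mounted
    elif [ -z "$(ls -A "$dir")" ]; then
        echo empty
    else
        # Not a mount point, yet files are present in the underlying directory
        echo stale-files
    fi
}

# Possible usage on the affected node (not run here):
#   mount_dir_state /etc/pve
```

A result of `stale-files` would be consistent with the non-empty /etc/pve described above, though it does not by itself prove that this is what blocks the service.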
 
Final fix:
After all this I ran into the same problem again just a few minutes later. This time I pinpointed the permanent fix (in my situation at least). This one simple command is the fix:
Code:
rm -r /etc/pve/nodes/*
It just clears the configuration for that node, which was my problem: I had leftover cluster configuration from when I attached it to and detached it from the cluster. While this is what finally fixed it for me, I suspect I would still have had to run at least some of those other commands to make sure all the cluster configuration was destroyed.
Maybe it's just me as a noob, but I feel like a cleaner, easier way to detach nodes from clusters would be a helpful feature in the future.
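
Since the per-node directories under /etc/pve/nodes also hold guest configuration files (the qemu-server and lxc subdirectories), it may be worth archiving the directory before running the rm -r command above. The following is a cautious sketch, not the method used in this thread: the function name `backup_then_clear` and the backup location in the usage comment are hypothetical, and unlike `rm -r dir/*` it also removes hidden entries.

```shell
# Archive a directory into a timestamped tarball, then remove its contents.
# Prints the archive path on success; removes nothing if the backup fails.
# backup_then_clear is a hypothetical helper name, not part of Proxmox VE.
backup_then_clear() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # Store paths relative to the parent so the archive unpacks as "<name>/..."
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Delete everything below src (including dotfiles) but keep src itself
    find "$src" -mindepth 1 -delete || return 1
    echo "$archive"
}

# Possible usage on the affected node (not run here; backup path is an assumption):
#   backup_then_clear /etc/pve/nodes /root/pve-nodes-backup
```

Keeping the archive outside /etc/pve means it survives whatever happens to the cluster filesystem afterwards.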