[SOLVED] Fixing a broken cluster node

mav

Member
Nov 3, 2022
Regarding the large cluster I spoke about in my previous thread.

I spent a solid day on site reinstalling every system and replumbing everything, and got it all running again along with separate individual networks for three clusters of 8-10 nodes each.

Yesterday, I started setting up the first cluster. The first two nodes clustered without issue. When I came to node 3, I realized too late that I'd made an error in my network configuration, and the third node could not communicate with the first two. I tried to fix this by manually taking everything offline, fixing corosync.conf, and bringing it all back up, but I could never get the nodes to communicate regardless of what I did.

I attempted to remove the third node and rejoin it via the process documented here. I was able to simply use pvecm delnode on nodes 1 and 2 to remove the system from the cluster, and on the errant node I removed /etc/pve/nodes/node3 along with the other files specified in the documentation. After a restart, node 3 came back up and worked OK as a standalone node.

When I tried to add node 3 back to the cluster, though, it did not work correctly, and the web UI threw a certificate error: Connection error 596 -tls_process_server_certificate: certificate verify failed. The logs make the reason pretty obvious:

Code:
Nov 22 11:09:09 pve01 pveproxy[759416]: '/etc/pve/nodes/pve03/pve-ssl.pem' does not exist!

Sure enough, /etc/pve/nodes/pve03 on the two working nodes does not include a certificate pair, and searching for pve-ssl.pem and pve-ssl.key on node 3 does not return any files. There is no /etc/pve/nodes directory on the third node at all, either.
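For anyone retracing this check, the lookup described above can be sketched as a small shell helper. This is only an illustration: check_node_cert is a hypothetical function name, not a Proxmox tool, and it assumes the certificate pair is named pve-ssl.pem and pve-ssl.key inside the node's directory, as in the log line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): report whether a node's
# certificate pair exists under a given nodes directory.
# Usage: check_node_cert <nodes_dir> <node_name>
check_node_cert() {
    nodes_dir=$1
    node=$2
    missing=0
    # Assumed file names, taken from the pveproxy log message.
    for f in pve-ssl.pem pve-ssl.key; do
        if [ -f "$nodes_dir/$node/$f" ]; then
            echo "ok: $nodes_dir/$node/$f"
        else
            echo "missing: $nodes_dir/$node/$f"
            missing=1
        fi
    done
    # Returns 0 only when both files are present.
    return "$missing"
}
```

On a cluster member this would be called as check_node_cert /etc/pve/nodes pve03. A non-zero return only confirms the symptom; it says nothing about why the files were never created.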

I'd very much like to be able to repair this without having to wipe and reinstall all these nodes again. If anyone has any suggestions I'd love to hear them.
 
This is solved.
Do you ever wonder if sometimes you're just cursed?
I looked at the network configuration on these systems probably a dozen times, and it was STILL WRONG.
Once I actually got the right IP address configured on node 3, I was able to follow the process in the documentation and start from scratch, and then join node 3 successfully.
I'm going to go take a walk now :(
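Since the root cause turned out to be a wrong IP address, a check like the one below before joining might catch it sooner. It is a rough sketch under stated assumptions: has_ipv4 is a hypothetical helper, and it expects the one-line output format of ip -o -4 addr show on stdin, where the fourth field is address/prefix.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the expected IPv4 address appears in
# `ip -o -4 addr show` output supplied on stdin.
# Usage: ip -o -4 addr show | has_ipv4 <expected_address>
has_ipv4() {
    awk -v want="$1" '
        # In one-line (-o) output, field 4 is "address/prefix".
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

On the node to be joined, this could be run as ip -o -4 addr show | has_ipv4 <expected-address>, and only if it succeeds would you go on to pvecm add with the address of an existing cluster node. The expected address is whatever your network plan says the node should have; it is a placeholder here.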
 
