[SOLVED] Partial new node join to existing cluster

hmcarthur

New Member
Jan 11, 2023
Hi Everyone,

I've had a cluster of 3 nodes, and I recently added a 4th that is on a separate subnet. Unfortunately, I forgot a firewall rule for corosync and rpcbind, so joining node 4 to the cluster seems to have only partially completed. I have since fixed the firewall issue and made sure all necessary ports are allowed, but the 4th node still isn't working properly.

I can see node4 (PVE4) on the list of nodes and it has a green tick. (screenshot1)
I can see that all the nodes in the Cluster section have green ticks, and my cluster apparently has quorum. (Quorate: Yes) (screenshot2)
Unfortunately, when I click on node 4, the historical data graph is there, but none of the live status information is displayed. (screenshot3)
I also cannot access the web interface on node 4.

If I click on the Cluster section under Datacenter, I get an error about a certificate not existing:
screenshot4.png

Any help or guidance is most appreciated. I've tried simply restarting node 4 hoping it would recover automatically, with no success.

I'm wondering whether I can just copy the missing certificates from node 4 to the location in the error above, or whether there is a bigger problem at hand.

Hopefully I'm not doomed to have to remove node 4 and reinstall Proxmox again from scratch and rejoin.

I'd rather not fiddle and try things without understanding the bigger picture, and then break my whole cluster.

Thanks in advance
 

Attachments

  • screenshot1.png
  • screenshot2.png
  • screenshot3.png

I think I managed to figure out the solution, and I'm hoping this will help anyone else who finds themselves in the same situation, as getting things back on track doesn't seem difficult.

Just noting that we don't use anything like HA or Ceph. Our cluster is basic and exists for management convenience. I also didn't have any VMs running on the new node yet.

1. If your new node is in a different subnet from the existing ones, triple-check your firewall configuration. This was my underlying issue: not all the ports were open in both directions. From my understanding, you need the below at a minimum:

Web interface: 8006 (TCP, HTTP/1.1 over TLS)
sshd (used for cluster actions): 22 (TCP)
rpcbind: 111 (UDP)
corosync cluster traffic: 5405-5412 UDP
live migration (VM memory and local-disk data): 60000-60050 (TCP)

2. Check that the nodes can access each other on the above ports.

3. Run pvecm updatecerts --force

I'm currently running a migration to test the operation. Fingers crossed.
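
For step 2, here is a rough sketch of how the TCP side of the check could be scripted, assuming Python 3 is available on the node you test from. The function names and the port table are my own, not anything Proxmox ships. It only covers the web interface and sshd ports: a plain TCP connect cannot confirm the UDP ports (rpcbind 111, corosync 5405-5412), and I'm assuming the migration range is only worth probing while a migration is actually running.

```python
import socket

# TCP ports taken from the list in step 1. UDP ports are deliberately
# left out because a TCP connect cannot verify them.
TCP_PORTS = {
    "web interface": 8006,
    "sshd": 22,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_node(host, ports=TCP_PORTS, timeout=3.0):
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

You would call check_node() with the address of each other node, from every node in turn, since the rules have to work in both directions. A False result only tells you the connection failed, not whether the firewall or the service itself is the cause.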
 
