[SOLVED] Node not joining Cluster after reboot.

fwinsnes

Jan 12, 2022
Hello Everyone,

Some weeks ago (20th of December), I built a cluster of 2 nodes on Proxmox VE 7.0-11. Everything had been running smoothly until today, when we had to shut down the master node (the one I originally created the cluster on) for maintenance.

After the reboot, the nodes could no longer see each other in the cluster and kept waiting for quorum. Both nodes seem to be running fine, and there seem to be no network issues at all (they can ping and SSH into each other). Rebooting both nodes made no difference.
To continue testing, we added a third node. The master can see it, while the second node can neither see nor be seen by any other node.
The entries in /etc/hosts match each node's hostname.
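For anyone wanting to double-check the same thing, here is a minimal sketch that verifies a hosts file maps each node name to the intended address. The hostnames pve01/pve05 and the 192.0.2.x addresses are hypothetical stand-ins for the redacted values, and the parser only handles the plain `address name [aliases...]` layout.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map every hostname and alias in /etc/hosts-style text to its address."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name; later duplicates are ignored.
            mapping.setdefault(name, address)
    return mapping


def resolves_to(text: str, hostname: str, expected: str) -> bool:
    """True if `hostname` maps to `expected` in the given hosts text."""
    return parse_hosts(text).get(hostname) == expected


# Hypothetical hosts file; pve01/pve05 and 192.0.2.x stand in for the redacted values.
HOSTS = """\
127.0.0.1   localhost
192.0.2.5   pve05.example.com pve05
192.0.2.11  pve01.example.com pve01   # second node
"""

print(resolves_to(HOSTS, "pve01", "192.0.2.11"))  # True
```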

According to the corosync logs, the second node (.11) cannot reach or be reached by anything, even though that is not the case at the network level: it can be pinged, accessed via SSH, reached on UDP port 5405, and so forth.

Jan 12 15:46:58 <REDACTED>05 corosync[1259]: [KNET ] udp: Received ICMP error from <REDACTED>.5: No route to host <REDACTED>.11
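If there are many of these lines, a small parser makes it easier to see which addresses keep being reported. This is only a sketch: the pattern is built from the single line quoted above, so other knet messages (or a different syslog timestamp format) will not match.

```python
import re
from typing import NamedTuple, Optional

# Pattern built from the single knet line quoted above; other knet messages may differ.
KNET_ICMP = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) corosync\[\d+\]: "
    r"\[KNET\s*\] udp: Received ICMP error from (?P<src>\S+): "
    r"(?P<reason>.+?) (?P<dst>\S+)$"
)


class IcmpError(NamedTuple):
    host: str    # node that logged the message
    src: str     # address printed after "from"
    reason: str  # e.g. "No route to host"
    dst: str     # address printed after the reason


def parse_knet_icmp(line: str) -> Optional[IcmpError]:
    """Extract the fields of a knet 'Received ICMP error' line, or None if it doesn't match."""
    m = KNET_ICMP.match(line.strip())
    if m is None:
        return None
    return IcmpError(m["host"], m["src"], m["reason"], m["dst"])


line = ("Jan 12 15:46:58 <REDACTED>05 corosync[1259]: [KNET ] udp: "
        "Received ICMP error from <REDACTED>.5: No route to host <REDACTED>.11")
print(parse_knet_icmp(line))
```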

The corosync configuration is identical on every node.
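One quick way to confirm the configs really match is to compare digests of /etc/corosync/corosync.conf collected from each node (Proxmox VE keeps the cluster-wide copy in /etc/pve/corosync.conf). The sketch below assumes the file contents have already been gathered; the node names and the config excerpt are hypothetical.

```python
import hashlib
from collections import Counter


def config_digest(text: str) -> str:
    """SHA-256 of a config file, ignoring trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in text.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def differing_nodes(configs: dict[str, str]) -> list[str]:
    """Return the nodes whose config differs from the most common version.

    On a tie, Counter.most_common() prefers the digest encountered first.
    """
    if not configs:
        return []
    digests = {node: config_digest(text) for node, text in configs.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, digest in digests.items() if digest != majority)


# Hypothetical excerpts of /etc/corosync/corosync.conf collected from each node.
CONF = "totem {\n  config_version: 3\n}\n"
configs = {
    "node01": CONF,
    "node04": CONF,
    "node05": CONF.replace("config_version: 3", "config_version: 2"),
}
print(differing_nodes(configs))  # ['node05']
```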

When accessing from the second node (01):

[Screenshot: 1642022152432.png]

When accessing from either of the other two nodes (the master and the node we added for testing, 04 and 05):

[Screenshot: 1642022253643.png]

Checking the node list from the master or the testing node:

Code:
root@<REDACTED>05:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 <REDACTED>05 (local)
         3          1 <REDACTED>04
root@<REDACTED>05:~#

Checking the node list from the second node:

Code:
root@<REDACTED>01:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         2          1 <REDACTED>01 (local)
root@<REDACTED>01:~#
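The two listings above show two different memberships, which is what a split cluster looks like. A minimal sketch for comparing `pvecm nodes` output gathered from every node, assuming the column layout shown above (Nodeid, Votes, Name, with an optional `(local)` marker):

```python
import re

# One membership row: Nodeid, Votes, Name, optionally followed by "(local)".
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)(?:\s+\(local\))?\s*$")


def parse_pvecm_nodes(output: str) -> dict[int, str]:
    """Map node ID to node name from the table printed by `pvecm nodes`."""
    members: dict[int, str] = {}
    for line in output.splitlines():
        m = ROW.match(line)
        if m:  # header, separator, and prompt lines don't match and are skipped
            members[int(m.group(1))] = m.group(3)
    return members


def distinct_memberships(outputs: list[str]) -> list[tuple[str, ...]]:
    """Collapse each node's view into a sorted tuple of names; more than one means a split."""
    return sorted({tuple(sorted(parse_pvecm_nodes(o).values())) for o in outputs})


FROM_05 = """\
Membership information
----------------------
    Nodeid      Votes Name
         1          1 <REDACTED>05 (local)
         3          1 <REDACTED>04
"""
FROM_01 = """\
Membership information
----------------------
    Nodeid      Votes Name
         2          1 <REDACTED>01 (local)
"""

print(distinct_memberships([FROM_05, FROM_01]))
```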

Finally, we tried to set up a QDevice to obtain an additional vote and thereby reach quorum. However, this was not possible, as one of the nodes always appeared offline.

[Screenshot: 1642031027774.png]
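The quorum arithmetic behind this, as a sketch: corosync's votequorum by default requires a majority of the expected votes, floor(n / 2) + 1. The example assumes one vote per node plus one for a QDevice, and ignores votequorum options such as two_node, wait_for_all, or last_man_standing, which change the rules.

```python
def quorum(expected_votes: int) -> int:
    """Votes needed for a simple majority: floor(expected / 2) + 1."""
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """True if the votes a partition can see reach the majority threshold."""
    return present_votes >= quorum(expected_votes)


# Scenarios from the thread, assuming one vote per node and one for a QDevice.
print(is_quorate(1, 2))  # two nodes, each alone after the split: False
print(is_quorate(2, 3))  # two of three nodes, or one node plus a QDevice vote: True
print(is_quorate(1, 3))  # the isolated node in the three-node setup: False
```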

Could anyone offer some guidance on this issue? Thank you!
 
We managed to fix this issue. The second node (01) was communicating with the other nodes through a masquerade (source NAT), so it presented itself to them with a different IP address.
Everything works well now.
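A masqueraded node shows up to its peers with a source address that doesn't match the address it is configured with in corosync.conf (its ring0_addr). Here is a hypothetical check, assuming you have already recorded which source address each node's corosync traffic arrives from on a peer, for example by capturing with something like `tcpdump -n udp port 5405`. Node names and addresses are placeholders.

```python
import ipaddress


def find_address_mismatches(
    configured: dict[str, str], observed: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return {node: (configured, observed)} where a peer sees a different source IP."""
    mismatches: dict[str, tuple[str, str]] = {}
    for node, addr in configured.items():
        seen = observed.get(node)
        if seen is None:
            continue  # no traffic observed from this node; nothing to compare
        if ipaddress.ip_address(seen) != ipaddress.ip_address(addr):
            mismatches[node] = (addr, seen)
    return mismatches


# Hypothetical values: ring0_addr entries vs. source addresses seen on a peer.
configured = {"node01": "192.0.2.11", "node05": "192.0.2.5"}
observed = {"node01": "198.51.100.1", "node05": "192.0.2.5"}
print(find_address_mismatches(configured, observed))
# {'node01': ('192.0.2.11', '198.51.100.1')}
```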
 