Hello Everyone,
Some weeks ago (on the 20th of December), I built a two-node cluster in Proxmox version 7.0-11. Everything ran smoothly until today, when we had to shut down the master node (the one on which I originally created the cluster) for maintenance.
After the reboot, the nodes could no longer see each other within the cluster and kept waiting for quorum. Both nodes seem to be running fine, and there appear to be no network issues at all (they can ping and ssh to each other). Rebooting both nodes made no difference.
In order to continue testing, we added a third node. The master of the cluster can see it, whereas the second node cannot see anything or be seen.
The entries in /etc/hosts are correct for each node's hostname.
According to the corosync logs, the second node (.11) cannot reach or be reached by anything, even though that is not true at the network level: it can be pinged, accessed via ssh, reached on UDP port 5405, and so forth.
Jan 12 15:46:58 <REDACTED>05 corosync[1259]: [KNET ] udp: Received ICMP error from <REDACTED>.5: No route to host <REDACTED>.11
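To see how often this error occurs and for which address, the log can be filtered with a small helper. This is only a sketch: `knet_unreachable` is a hypothetical name, and it assumes that the unreachable address is the last field of each "No route to host" line, as in the entry above.

```shell
# Hypothetical helper: summarise KNET "No route to host" errors read from stdin.
# Assumes the unreachable address is the last whitespace-separated field.
# Prints one "<count> <address>" line per distinct address.
knet_unreachable() {
    awk '/No route to host/ { n[$NF]++ }
         END { for (h in n) print n[h], h }' | sort -k2
}
```

It could be fed from the journal, for example with `journalctl -u corosync --since today | knet_unreachable`. Running `ip route get` against the reported address on the same node might then show which interface and source address the kernel picks for that destination.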
The corosync configuration is the same on every node.
When accessing from second node (01):
When accessing from any of the other two nodes (master and the one we added for testing (04,05)):
Checking number of nodes from master or testing node:
Code:
root@<REDACTED>05:~# pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 <REDACTED>05 (local)
3 1 <REDACTED>04
root@<REDACTED>05:~#
Checking number of nodes from second node:
Code:
root@<REDACTED>01:~# pvecm nodes
Membership information
----------------------
Nodeid Votes Name
2 1 <REDACTED>01 (local)
root@<REDACTED>01:~#
Finally, we attempted to set up a qdevice in order to obtain an additional vote and thereby reach quorum. However, this was not possible because one of the nodes always appeared offline.
May I obtain some guidance regarding this issue? Thank you!