Proxmox cluster health

3rKaN_BRATTE

Mar 5, 2025
Hi everyone, I'm pretty new to Proxmox, and so far, I really like it. Today, I attempted to connect my two OVH servers to run as a cluster over the private NIC. Everything seemed to work fine until I started experiencing timeouts on the second server through the UI. After some troubleshooting, I realized I had forgotten to open the Corosync ports in the firewall. Once I quickly deactivated the firewall, everything started working fine. My question is, could this have caused any issues with the cluster, considering the ports weren't open when joining, or am I just being paranoid?

Code:
root@core:~# pvecm status
Cluster information
-------------------
Name:             host-clu
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Mar 14 01:17:21 2025
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.12
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.100.10 (local)
0x00000002          1 192.168.100.11
 
If the cluster doesn’t have quorum (2 votes here), the node(s) would likely have rebooted, which was probably the cause of your timeouts. I don’t think it would cause a lasting problem, though.

 
If the cluster doesn’t have quorum (2 votes here), the node(s) would likely have rebooted, which was probably the cause of your timeouts.
I don’t fully understand what you mean by that. The GUI indicated that my second server in the cluster was reachable since port 22 wasn’t blocked by the firewall. However, the Corosync ports were blocked, which prevented me from receiving all the necessary data, resulting in timeouts in the GUI when trying to display certain information.
 
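If you have 2 nodes and one node cannot see the other, neither of them has quorum. A node that loses quorum may fence itself and restart to try to fix the problem (as far as I know, this only happens when HA is in use; otherwise the node stays up but /etc/pve becomes read-only). I would guess that is the reason for the timeouts. If a cluster does not have more than 50% of its votes online (1 of 2 is exactly 50%, not more than 50%), it does not have a valid quorum and neither node knows whether it has valid data. If 2 out of 3 nodes are online, those two keep quorum and carry on, while the third does not.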
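
To make the arithmetic concrete, here is a rough sketch of the majority rule in Python. This is not Corosync's actual code; the function names are made up for illustration, and it assumes every node has exactly one vote, as in the pvecm output above.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently visible reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # (expected votes, votes currently visible), one vote per node assumed
    for expected, present in [(2, 2), (2, 1), (3, 2), (3, 1)]:
        print(
            f"{present} of {expected} votes: "
            f"threshold={quorum_threshold(expected)}, "
            f"quorate={is_quorate(present, expected)}"
        )
```

With 2 expected votes the threshold is 2, which matches the "Quorum: 2" line in your output, so a two-node cluster loses quorum as soon as the nodes cannot see each other.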
 