We have a Proxmox cluster with 19 hosts. It worked until yesterday, when a problem appeared: the cluster web UI became unavailable.
The pvecm nodes (or pvecm list) command runs, but it takes a very long time and does not show all servers.
corosync-quorumtool shows that the cluster does not gain quorum, although corosync-cfgtool -n shows ALL 19 servers as "enabled connected".
The /etc/pve file system is unavailable (it takes a very long time to respond); I cannot enter the directory.
We found out:
8 servers are in DC1
11 servers are in DC2
Ping between the DCs is 8-9 ms.
If we run only the 11 servers in DC2, everything works; if we then add the DC1 servers, /etc/pve becomes unavailable after 2-3 nodes.
The Proxmox website says that latency should be no more than 5 ms for the cluster stack (pmxcfs) to work:
Network Requirements
The Proxmox VE cluster stack requires a reliable network with latencies under 5 milliseconds (LAN performance) between all nodes to operate stably. While on setups with a small node count a network with higher latencies may work, this is not guaranteed and gets rather unlikely with more than three nodes and latencies above around 10 ms.
I think we have problems with this.
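To put numbers on the observation that the 11 DC2 servers work on their own, here is a rough sketch of the majority-quorum arithmetic. It assumes every node has exactly one vote (I have not changed any vote settings, but I am labeling this as an assumption); the function name quorum_needed is just a helper made up for this example.

```shell
#!/bin/sh
# Majority quorum for a cluster where each node has one vote (assumption).
quorum_needed() {
    # $1 = total number of votes in the cluster
    echo $(( $1 / 2 + 1 ))
}

total=19   # nodes in the cluster (from this post)
dc1=8      # nodes in DC1
dc2=11     # nodes in DC2
need=$(quorum_needed "$total")

# Check whether each site could be quorate on its own.
for site in "DC1:$dc1" "DC2:$dc2"; do
    name=${site%%:*}
    votes=${site##*:}
    if [ "$votes" -ge "$need" ]; then
        echo "$name alone: $votes votes, quorate (need $need)"
    else
        echo "$name alone: $votes votes, NOT quorate (need $need)"
    fi
done
```

Under that assumption, DC2 by itself has a majority and DC1 does not. That matches what we see when only DC2 is running, but it does not explain why quorum is lost once DC1 nodes join.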
Questions:
Why did the cluster work without problems for many months?
Is it possible to change the corosync/pmxcfs timeouts (if the problem is due to the 8-9 ms ping)?
Why does /etc/pve become unavailable?