Hi!
I am running a cluster with multiple PVE nodes (all on version 7.4), but I am currently facing an issue with one of them. I would like to ask for any hints; maybe I simply overlooked something silly.
I recently joined two more nodes to the cluster, which worked without any problems. But for some reason that I cannot figure out, I keep getting timeout messages as soon as I try to access one of the two new nodes over the web GUI (for example, when checking the summary or the log files). The other new node works without any problems.
I tried a few things here and there and noticed the following:
- The cluster node that has the problem (let's call it "node 6" because it is the 6th of what are now 7 servers in total) is accessible via the web GUI without any issues if I log into it directly, but from there I cannot view any of the other cluster nodes; I run into a timeout. All nodes are shown as online and healthy.
- When checking the syslog of node 6, I sometimes see an error from pveproxy: "proxy detected vanished client connection", but I am not able to figure out whether that is cause or symptom, or how to solve it. Restarting pveproxy and even rebooting the whole node did not change anything. I cannot see any errors regarding corosync or other PVE services.
- Any other node in the cluster, for example "node 1" (the one I used to join node 6 to the cluster) or "node 7" (the second server I added along with the troublesome one), can access every other server through the web GUI without timeouts, except node 6.
- I can ping any cluster node from any server without problems, by either IP or hostname. The /etc/hosts files on all servers are correct and identical.
- The cluster itself is healthy and does not show any errors at all (via "pvecm status" or in the log files). The cluster join also worked without any issues for both new servers.
- All servers run PVE 7.4.16, but one or two nodes might not currently be running the exact same kernel version. Node 6 and node 7, which I joined to the cluster, both run the same up-to-date kernel, but only node 6 is having issues.
I also have the datacenter-wide firewall active with some rules that mainly restrict SSH access from the outside, but I assume this should not interfere with the cluster communication itself? If it does, are there any specific rules that are required for the communication to work?
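Since ping only exercises ICMP, a TCP-level check between the nodes might help narrow this down. Below is a minimal sketch of such a check, assuming the web GUI/API listens on TCP port 8006 (the pveproxy default) and SSH on port 22; the hostnames in the example are placeholders, not the real node names, and the helper names are made up for illustration. It does not cover corosync, which uses UDP.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_nodes(hosts, ports=(8006, 22), timeout=3.0):
    """Check every host/port pair and return {(host, port): reachable}."""
    return {
        (host, port): tcp_port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }


if __name__ == "__main__":
    # Placeholder hostnames; replace with the real cluster node names.
    nodes = ["node1", "node7"]
    for (host, port), ok in check_nodes(nodes).items():
        print(f"{host}:{port} -> {'open' if ok else 'unreachable'}")
```

Running this on node 6 against the other nodes, and on another node against node 6, would show whether the timeouts correspond to a blocked or filtered TCP port rather than to a general connectivity problem.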
If anyone could give me a hint about what I might be missing, that would be really helpful! I think I'm overlooking something simple here, but I just cannot figure out what.
Thank you in advance!