[SOLVED] Timeout on Web GUI with clusternode

AraToken

Hi!
I am running a cluster with multiple PVE nodes (all on version 7.4), but I am currently facing an issue with one of them. I would like to ask for any hints; maybe I simply overlooked something silly.

I recently joined two more nodes to the cluster, which worked without any problems. But for some reason I cannot figure out, I keep getting timeout messages as soon as I try to access one of the two new nodes over the web GUI (for example, when checking the summary or the log files). The other new node works without any problems.

I tried a few things here and there and noticed the following:
- The cluster node that has the problem (let's call it "node 6" because it is the 6th of the now 7 servers) is accessible via the web GUI without any issues if I log into it directly, but from there I cannot view any of the other cluster nodes; I run into a timeout. All nodes are shown as online and healthy.
- When checking the syslog of node 6, I sometimes see an error from pveproxy: "proxy detected vanished client connection", but I cannot tell whether that is cause or symptom, or how to solve it. Restarting pveproxy and even rebooting the whole node did not change anything. I cannot see any errors regarding corosync or other PVE services.
- Any other node in the cluster, for example "node 1" (the one I used to join node 6 to the cluster) or "node 7" (the second server I added along with the troublesome one), can access every other server through the web GUI without timeouts, except node 6.
- I can ping any cluster node from any server without problems, by either IP or hostname. The /etc/hosts files on all servers are correct and identical.
- The cluster itself is healthy and does not show any errors at all (via "pvecm status" or in the log files). The cluster join also worked without any issues for both new servers.
- All servers run PVE 7.4.16, but one or two nodes might not currently run the exact same kernel version. Nodes 6 and 7, which I joined to the cluster, both run the same up-to-date kernel, but only node 6 is having issues.
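
A successful ping only shows that ICMP gets through; it does not show that the TCP ports used between nodes are open. The following is a minimal sketch of a per-port check, assuming SSH on its default port 22 and the web GUI/API on port 8006; the function name `check_port` and the node names in the example are placeholders, not something from the thread.

```shell
# Probe one TCP port; prints "<host>:<port> reachable" or "<host>:<port> unreachable".
# Uses bash's /dev/tcp pseudo-device, wrapped in `timeout` so that a silently
# dropped connection attempt does not hang the check.
check_port() {
    host="$1"
    port="$2"
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "${host}:${port} reachable"
    else
        echo "${host}:${port} unreachable"
        return 1
    fi
}

# Example, run on each node against the others (node names are placeholders):
#   for node in node1 node7; do
#       check_port "$node" 22      # SSH
#       check_port "$node" 8006    # web GUI / API
#   done
```

If a port is unreachable in only one direction, or only to or from a single node, a firewall rule is a likely candidate.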

I also have the datacenter-wide firewall active, with some rules that mainly restrict SSH access from the outside, but I assume this should not interfere with the cluster communication itself? If it does, are there any specific rules that are required for the communication to work?

If anyone could give me a hint about what I might be missing, that would be really helpful! I think I'm overlooking something simple here, but I just cannot figure out what.

Thank you in advance!
 
Hi,
please share the journal from around the time you try to connect from one host to the other: journalctl --since <DATETIME> --until <DATETIME> > journal.txt. SSH is required for communication between the cluster nodes, so test whether the issue persists if you deactivate the firewall rule you mentioned. By default, the PVE firewall sets exceptions for the required cluster communication channels, but your rule might override these.
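
For reference, the journal capture could be wrapped roughly like this. It is only a sketch: the function name `capture_journal` and the `JOURNALCTL` override are made up for illustration, and the relative timestamps in the example assume the timeout was reproduced within the last 15 minutes.

```shell
# Save the journal for a time window to a file (default: journal.txt).
# JOURNALCTL can be overridden for a dry run, e.g. JOURNALCTL=echo.
capture_journal() {
    since="$1"
    end="$2"
    outfile="${3:-journal.txt}"
    "${JOURNALCTL:-journalctl}" --since "$since" --until "$end" > "$outfile"
}

# Example: reproduce the timeout in the web GUI, then capture the last 15 minutes:
#   capture_journal "-15min" "now"
```

Running it on both the node you are logged into and the node you are trying to reach gives both sides of the failed request.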

Edit: You could also enable logging for these rules to identify the traffic, and set them to reject instead of drop so that the connecting client gets a response.
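
Once logging is enabled on the rules, the firewall log can be filtered for traffic from the affected node. The sketch below rests on two assumptions: that the log is at /var/log/pve-firewall.log, and that its lines carry netfilter-style `SRC=` and `DPT=` fields. The function name `logged_from` and the address in the example are placeholders.

```shell
# Path of the firewall log; override FW_LOG if it lives elsewhere (assumption).
FW_LOG="${FW_LOG:-/var/log/pve-firewall.log}"

# List logged packets from one source address that were aimed at
# SSH (port 22) or the web GUI/API (port 8006).
logged_from() {
    src="$1"
    grep -F "SRC=${src} " "$FW_LOG" | grep -E 'DPT=(22|8006)( |$)'
}

# Example (the address is a placeholder for the IP of node 6):
#   logged_from 192.0.2.16
```

Any hits for the problem node's address would point at the rule that needs an exception.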
 
Hi Chris!
Thank you so much for the hint about the firewall! I created a new allow rule and now everything works as it should.
So, as I initially thought, it was something simple in the end.

Thank you and best regards!
 