Problem with GUI HTTP connection timeout one way in a 2-node cluster

ABaum

Active Member
Nov 2, 2018
Hello, I need assistance from someone.
I can log into node 1 and see its GUI.
I can log into node 2 and see its GUI.

On node 1's GUI page I can do everything I expect.
On node 2's GUI page I can do all tasks for node 2.
But, and here is the problem, from node 2 I cannot see or work with node 1; I only get a timeout.
The pveproxy syslog on node2 shows "proxy detected vanished client connection" after trying to view node1 pages.

Both units are on 8.2.2, installed recently.
Where should I look for the one thing that is different between the two?

I also cannot ssh from node2 to node1; however, I can ssh from node1 to node2.

Code:
root@node2:~# ssh -vvv root@192.168.2.3
OpenSSH_9.2p1 Debian-2+deb12u3, OpenSSL 3.0.13 30 Jan 2024
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname 192.168.2.3 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug3: ssh_connect_direct: entering
debug1: Connecting to 192.168.2.3 [192.168.2.3] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: connect to address 192.168.2.3 port 22: Connection timed out
ssh: connect to host 192.168.2.3 port 22: Connection timed out
 
I have no problems with corosync.
I have been carefully checking for firewall rules that could block it,
but I still get a timeout on ssh.
 
I have no problems with corosync
I am not sure this helps. Maybe your corosync is on a different network? Maybe the corosync packets are small enough to fit through a broken MTU? You did not provide any details about your network setup or system state, just a user-level application error.
But I still get a timeout on ssh.
PVE is based on Debian with an Ubuntu-derived kernel, and SSH is a basic part of the Linux userland. Start checking ports with "nc", enable debug logging on the sshd side, add more verbosity on the "ssh" client side, and get some network captures.
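As a concrete starting point, a rough sketch of those checks; the peer address and interface name are taken from this thread and may differ on your system:

```shell
# Peer address from the thread; substitute your own.
TARGET=192.168.2.3

# 1. Is anything reachable on port 22 at all? (-z: scan only, -w 2: 2s timeout)
nc -zv -w 2 "$TARGET" 22

# 2. Client-side verbosity, with a short connect timeout so it fails fast:
ssh -vvv -o ConnectTimeout=5 root@"$TARGET"

# 3. On the server side, run a one-off debug sshd on a spare port so the
#    production daemon keeps running:
/usr/sbin/sshd -d -p 2222

# 4. Capture traffic on the tunnel interface on both ends to see where
#    the packets stop:
tcpdump -ni wg0 'tcp port 22'
```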

The MTU and/or a duplicate IP are the most likely culprits based on the limited amount of information you provided. But I could be completely wrong; it's just a guess.
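For completeness, a sketch of how one might test both guesses; the address and interface names (vmbr0, wg0) are assumptions based on this thread:

```shell
# Duplicate IP: ask who answers for the address on the local segment
# (arping is in the iputils-arping package on Debian):
arping -I vmbr0 -c 3 192.168.2.3

# Broken MTU: send a large packet with the Don't Fragment bit set.
# If small pings work but this fails, the path MTU is the problem.
# 1392 bytes of payload + 28 bytes of ICMP/IP headers = 1420, the
# usual WireGuard interface MTU.
ping -M do -s 1392 -W 1 -c 3 192.168.2.3
```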

IMHO, if you can't reliably ssh between the nodes, there is no point in troubleshooting anything above it.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
This setup is clustered across a WireGuard VPN, no high availability, and it was working.
When I upgraded the last node, I removed it from the cluster, then reinstalled and rejoined. That is when the trouble started.

I have wg configured like this and will play with the MTU for now.

Code:
root@pve24:~# wg
interface: wg0
  public key: PdxhRSbIVaqfGMzQz+mmhCOti+Le4kveZ1geE7yDUWw=
  private key: (hidden)
  listening port: 51826

peer: zr43aPHwf5HSML+wHjiuMMbdR935gkSoP3twPabiyXE=
  endpoint: 129.222.136.67:52175
  allowed ips: 192.168.2.4/32
  latest handshake: 16 seconds ago
  transfer: 1.19 GiB received, 32.07 GiB sent
  persistent keepalive: every 25 seconds

peer: 8WIcHZfCGbKvDS5dkxzZdleApW1i52se6NwzKPRmDx8=
  endpoint: 69.41.195.50:51824
  allowed ips: 192.168.2.3/32
  latest handshake: 1 minute, 33 seconds ago
  transfer: 1.65 GiB received, 2.33 GiB sent
  persistent keepalive: every 25 seconds

Code:
root@pve21:~# wg
interface: wg0
  public key: 8WIcHZfCGbKvDS5dkxzZdleApW1i52se6NwzKPRmDx8=
  private key: (hidden)
  listening port: 51824

peer: PdxhRSbIVaqfGMzQz+mmhCOti+Le4kveZ1geE7yDUWw=
  endpoint: 216.110.250.179:51826
  allowed ips: 192.168.2.5/32
  latest handshake: 6 seconds ago
  transfer: 7.51 GiB received, 5.07 GiB sent
  persistent keepalive: every 25 seconds

peer: zr43aPHwf5HSML+wHjiuMMbdR935gkSoP3twPabiyXE=
  endpoint: 129.222.136.67:52175
  allowed ips: 192.168.2.4/32
  latest handshake: 51 seconds ago
  transfer: 39.17 GiB received, 14.08 GiB sent
  persistent keepalive: every 25 seconds
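Since the plan is to play with the MTU: WireGuard's usual interface MTU is 1420, i.e. 1500 minus up to 80 bytes of worst-case outer IPv6 + UDP + WireGuard headers. A sketch of lowering it for testing, assuming wg0 as above; 1380 is an arbitrary trial value:

```shell
# Default: 1500 - 80 bytes worst-case tunnel overhead = 1420.
# If the WAN path itself is smaller (PPPoE, nested tunnels), go lower:
ip link set dev wg0 mtu 1380

# To make it persistent, set the MTU in the interface configuration
# instead (e.g. an "mtu 1380" line in /etc/network/interfaces, or
# "MTU = 1380" in the wg-quick config).
```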
 
This setup is clustered across a WireGuard VPN, no high availability, and it was working.
When I upgraded the last node, I removed it from the cluster, then reinstalled and rejoined. That is when the trouble started.
The complexity of the situation just increased 20x.

I can only recommend starting from the basics. SSH is a critical part of the PVE inner workings; getting it to work reliably is the first step.


 
The complexity of the situation just increased 20x.

I can only recommend starting from the basics. SSH is a critical part of the PVE inner workings; getting it to work reliably is the first step.

Understood.
Any basic things for me to check?
I just finished proving that ssh works from the WAN to my home desktop (also on the WAN), after I fixed the IP allow list to include my current address.
So the wg tunnel is the problem for now.
 
OK, this nmap finally shows something different: (filtered)

The ssh connections that work report "open" with this command, and the ones that don't all point to the one server I added last to the cluster.
I now need to find the difference in the setup that causes the ssh port to be filtered.

Code:
@pve25:~# nmap 192.168.2.3 -PN -p ssh
Starting Nmap 7.93 ( https://nmap.org ) at 2024-07-17 17:17 EDT
Nmap scan report for 192.168.2.3
Host is up.

PORT   STATE    SERVICE
22/tcp filtered ssh

Nmap done: 1 IP address (1 host up) scanned in 2.12 seconds
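"filtered" in nmap usually means a firewall is silently dropping the packets, rather than sshd being down (which would show "closed"). A rough sketch of where to look on the unreachable node, assuming the stock PVE firewall:

```shell
# Is the PVE firewall active on this node?
pve-firewall status

# Inspect the compiled iptables rules for anything touching port 22
# or the tunnel interface:
iptables -L INPUT -n -v | grep -E 'dpt:22|wg0'

# Blunt but quick confirmation: stop the firewall and retry the scan.
pve-firewall stop
```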
 
Well, it is working now.
I am not certain exactly what I did to make it work.
The last step before ssh suddenly connected was restarting pve-firewall.
Earlier I fixed up the hosts file, because it did not resolve the names for the wg0 connection IPs,
then ran the commands in cant-connect-to-destination-address-using-public-key-task-error-migration-aborted

I don't recall whether I restarted the firewall after doing that, before I restarted it again while trying to enable logging to show where the packets were dropped.
I did use tcpdump to see the ssh traffic arriving at the wg interface.
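For anyone landing here later, the verification steps described above look roughly like this; wg0 and the restart command are as used in this thread:

```shell
# Watch for SSH SYNs arriving on the tunnel while connecting from the
# other node; SYNs with no replies point at a drop on this host:
tcpdump -ni wg0 'tcp port 22 and tcp[tcpflags] & tcp-syn != 0'

# Recompile and reload the firewall rules after fixing /etc/hosts:
pve-firewall restart
```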
 
