Got timeout (500) - Communication failure (0) - Connection timed out (595)

maverickws

Hi all,

Two nodes connected by a private LAN.
Three networks and three corosync links (link0, link1, link2), each on an independent network on a private vSwitch.

When accessing one node through the GUI of the other:

[Screenshot: error messages shown in the GUI when accessing the remote node]

Code:
# ping -c100 -f 10.11.49.2
PING 10.11.49.2 (10.11.49.2) 56(84) bytes of data.

--- 10.11.49.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 258ms
rtt min/avg/max/mdev = 2.525/2.568/2.605/0.034 ms, ipg/ewma 2.606/2.574 ms

Does this look like a link with communication failures, connectivity issues, or timeouts?

[Screenshot: further errors shown in the GUI]

Code:
# ping -c1000 -f 10.11.49.2
PING 10.11.49.2 (10.11.49.2) 56(84) bytes of data.

--- 10.11.49.2 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 609ms
rtt min/avg/max/mdev = 2.537/2.571/2.613/0.057 ms, ipg/ewma 2.609/2.574 ms

The link clearly isn't having any; only Proxmox is. Why?
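
For anyone debugging something similar: a quick way to see what corosync itself thinks of the links is shown below. This is just a sketch assuming the stock corosync/Proxmox VE tools; adjust the journal time window as needed.

Code:
# knet link status for every configured corosync link, as seen by this node
corosync-cfgtool -s

# cluster membership and quorum state
pvecm status

# recent link down/up events logged by corosync
journalctl -u corosync --since "1 hour ago"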
 
Can you please post the /etc/network/interfaces and /etc/pve/corosync.conf files? If you use public IPs, redact them but keep them uniquely identifiable.
 
Hi @aaron, thanks for the reply.

Here goes as requested:
/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface lo inet6 loopback

auto enp2s0
iface enp2s0 inet manual

auto enp2s1
iface enp2s1 inet manual

auto enp2s2
iface enp2s2 inet manual

auto enp2s1.4001
iface enp2s1.4001 inet manual
    mtu 1400
#Public

auto enp2s1.4005
iface enp2s1.4005 inet manual
    mtu 1400
#Private

auto enp2s2.4010
iface enp2s2.4010 inet manual
    mtu 1400
#Data

auto enp2s2.4011
iface enp2s2.4011 inet manual
    mtu 1400
#ClusterSync

auto enp2s2.4013
iface enp2s2.4013 inet manual
    mtu 1400
#Management

auto vmbr0
iface vmbr0 inet static
    address public_ipv4/27
    gateway ipv4_gw
    bridge-ports enp2s0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

iface vmbr0 inet6 static
    address ipv6_public::2/64
    gateway gw_v6

    up route add -net public_v4 netmask slash_27_netmask gw ipv4_gw dev vmbr0
    up route add -net public_v4 netmask slash_29_netmask gw ipv4_gw dev vmbr0
    up route -6 add ipv6_public::/64 dev vmbr0

auto vmbr1
iface vmbr1 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr01
iface vmbr01 inet manual
    address 10.1.49.1/24
    gateway 10.1.49.254
    bridge-ports enp2s1.4001
    bridge-stp off
    bridge-fd 0
#Public

auto vmbr05
iface vmbr05 inet manual
    address 10.5.49.1/24
    gateway 10.5.49.254
    bridge-ports enp2s1.4005
    bridge-stp off
    bridge-fd 0
#Private

auto vmbr10
iface vmbr10 inet manual
    address 10.10.49.1/24
    gateway 10.10.49.254
    bridge-ports enp2s2.4010
    bridge-stp off
    bridge-fd 0
#Data

auto vmbr11
iface vmbr11 inet manual
    address 10.11.49.1/24
    gateway 10.10.49.254
    bridge-ports enp2s2.4011
    bridge-stp off
    bridge-fd 0
#ClusterSync

auto vmbr13
iface vmbr13 inet manual
    address 10.13.49.1/24
    gateway 10.13.49.254
    bridge-ports enp2s2.4013
    bridge-stp off
    bridge-fd 0
#Management

/etc/pve/corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.11.49.1
    ring1_addr: 10.5.49.1
    ring2_addr: 10.1.49.1
  }
  node {
    name: proxmox-02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.11.49.2
    ring1_addr: 10.5.49.2
    ring2_addr: 10.1.49.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: MyCluster
  config_version: 2
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  interface {
    linknumber: 2
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
Can the nodes reach each other on the public address?

My guess is that this is not possible, and that in /etc/hosts the public IP is set for the node. The proxy traffic to other nodes is usually sent to the IP configured there.

If my guess is correct, a possible workaround would be to change the IP set in /etc/hosts for the node to one of the private ones that you want to use for the proxy traffic.
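
A rough sketch of that workaround (the node names are taken from the corosync.conf above; the public addresses and the domain are placeholders, and 10.13.49.x is just the "Management" network from the interfaces file, picked as an example):

Code:
# /etc/hosts on proxmox-01 (mirror the same idea on proxmox-02)
# before: node names resolve to the public addresses
#203.0.113.11   proxmox-01.example.net proxmox-01
#203.0.113.12   proxmox-02.example.net proxmox-02

# after: resolve them via a private network instead
10.13.49.1   proxmox-01.example.net proxmox-01
10.13.49.2   proxmox-02.example.net proxmox-02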
IMPORTANT: if you use the firewall and access the nodes from outside, you will have to manually add rules allowing SSH and GUI access on the external IP, because the automatic rules are generated for the IP set in /etc/hosts, IIRC.

The following hint from the documentation helps to simplify that:
To simplify that task, you can instead create an IPSet called “management”, and add all remote IPs there. This creates all required firewall rules to access the GUI from remote.
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_configuration_files
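
A minimal sketch of what that could look like in /etc/pve/firewall/cluster.fw (the addresses are placeholders for wherever you administer the cluster from; the same set can also be created in the GUI under Datacenter -> Firewall -> IPSet):

Code:
[IPSET management]
192.0.2.50 # admin workstation (placeholder)
198.51.100.0/24 # office network (placeholder)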
 
Oh shit, ok.
I get it, it's the same thing @t.lamprecht described on my other post. I'm having some IPv6 connectivity issues, so for now I'll assume that's what's causing this; it would also explain why changing link0, link1 and link2 resolved nothing.
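
A quick way to see what the proxy will actually connect to is to check what the node names resolve to on each host (a sketch; the node names are the ones from the corosync.conf above):

Code:
# every address (IPv4 and IPv6) the other node's name resolves to
getent ahosts proxmox-02

# and what this node's own name resolves to
getent ahosts proxmox-01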

Honestly, I think this implementation is wrong. The Proxmox cluster should have a feature to select which links the nodes use to communicate with each other via the API across the cluster; it shouldn't have to be a public IP, nor whatever happens to be in /etc/hosts.

When I created the cluster I chose three networks for Proxmox to use. Using a link other than the ones the administrator selected is wrong, imo.

Thanks for your replies.
 
Using the corosync links for proxying API traffic (and migration/storage replication traffic) by default would be wrong and dangerous: it could overload the corosync links, which might lead to a total cluster failure.

Making it easier to specify which network to use for proxying API traffic probably makes sense, but the current approach of resolving the other node's hostname works quite well in practice. You can already override the network used for migration traffic, and external access uses whatever address the client connects to anyway.
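
For completeness, the migration-network override mentioned above lives in /etc/pve/datacenter.cfg. A minimal sketch, picking the "Data" network from the interfaces file above purely as an example:

Code:
# /etc/pve/datacenter.cfg
# send migration traffic over 10.10.49.0/24 instead of the default network
migration: secure,network=10.10.49.0/24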
 
@fabian

I never meant that, nor should what I said be interpreted that way; if that's what you read, that's on you. Should corosync traffic have its own dedicated network? Yes, we're on the same page there. When I said "select which links to use for the nodes to communicate via API among each other", I meant selecting a link purely for that purpose: API traffic.

Making it easier to specify which network to use for proxying API traffic would make sense, yes.
I understand such a feature doesn't exist now, but I do hope you implement it in the future; it would be very good to have that control, and I imagine plenty of other people would benefit from it.

Thanks for the feedback.
 
