[SOLVED] Proxmox hosts do not see each other in the network

Feb 27, 2020
Hi,
I have just installed two Proxmox nodes, applied the license, and configured the network as nested bonding in my production environment. The network is up, and from either node I can ping the internet and other hosts in the network. I can also ping either Proxmox node from other hosts in the network.

However, the two Proxmox nodes cannot ping each other, so obviously I cannot set up a cluster :-(.

I have attached the nested bonding config. Since I can send and receive pings to and from other hosts in the network (L3), and I can set bond1 or bond2 down and failover works as expected (L2), I think there is nothing wrong with the config itself and that I am missing something else entirely.

Any ideas? I cannot move forward deploying VMs into production without setting up a cluster.

(The network is a /24 and both nodes have the same configuration, hardware and software, except for the IP address. No firewall is active or set up; iptables -L comes back empty.)

auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
iface eno4 inet manual
iface enp7s0f0 inet manual
iface enp7s0f1 inet manual
iface enp8s0f0 inet manual
iface enp8s0f1 inet manual


auto bond1
iface bond1 inet manual
    slaves eno1 eno2
    bond_mode 802.3ad
    bond_miimon 100
    bond-xmit-hash-policy layer2+3
    bond-lacp-rate 1

auto bond2
iface bond2 inet manual
    slaves enp7s0f0 enp7s0f1
    bond_mode 802.3ad
    bond_miimon 100
    bond-xmit-hash-policy layer2+3
    bond-lacp-rate 1

auto bond0
iface bond0 inet manual
    slaves bond1 bond2
    bond_mode active-backup
    bond_miimon 200

auto vmbr0
iface vmbr0 inet static
    address 10.12.64.41
    netmask 255.255.255.0
    gateway 10.12.64.2
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
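
For readers following along, the nesting in the config above can be made explicit with a small script. This is only a sketch: `parse_members` and `leaves` are hypothetical helpers written for this illustration, and the embedded text is a trimmed copy of the posted stanzas (only the lines that define nesting and bond mode).

```python
from collections import defaultdict

# Trimmed copy of the stanzas posted above (nesting-related lines only).
CONFIG = """\
auto bond1
iface bond1 inet manual
slaves eno1 eno2
bond_mode 802.3ad

auto bond2
iface bond2 inet manual
slaves enp7s0f0 enp7s0f1
bond_mode 802.3ad

auto bond0
iface bond0 inet manual
slaves bond1 bond2
bond_mode active-backup

auto vmbr0
iface vmbr0 inet static
bridge_ports bond0
"""


def parse_members(text):
    """Map each interface to the interfaces it is built on, plus bond modes."""
    members = defaultdict(list)
    modes = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        # The posted config mixes bond_mode and bond-xmit-hash-policy styles,
        # so normalise hyphens to underscores before comparing keywords.
        key = words[0].replace("-", "_")
        if key == "iface":
            current = words[1]
        elif current and key in ("slaves", "bond_slaves", "bridge_ports"):
            members[current].extend(words[1:])
        elif current and key == "bond_mode":
            modes[current] = words[1]
    return dict(members), modes


def leaves(iface, members):
    """Return the physical NICs underneath an interface, in config order."""
    if iface not in members:
        return [iface]
    result = []
    for child in members[iface]:
        result.extend(leaves(child, members))
    return result


members, modes = parse_members(CONFIG)
for name in ("vmbr0", "bond0", "bond1", "bond2"):
    print(name, modes.get(name, "-"), "->", members[name])
print("physical NICs under vmbr0:", leaves("vmbr0", members))
```

The result is the chain vmbr0 -> bond0 (active-backup) -> bond1 and bond2 (802.3ad) -> two NICs each, which matches the topology described later in the thread.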
 
Are you sure that the two hosts have their active interface (from the active-backup bond) on the same switch? The active one is not necessarily the first in the order defined in the config.
You can verify with cat /proc/net/bonding/bond0

(You can use the "bond-primary bond1" option to be sure.)
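
If you want to compare both nodes quickly, a minimal sketch like the one below pulls the active slave out of the status file. The sample text is hypothetical and heavily shortened, not output from the poster's nodes; `active_slave` and `read_bond_status` are names made up for this example.

```python
import re

# Hypothetical, shortened sample of /proc/net/bonding/bond0 (assumption:
# bond0 is the active-backup bond from the config in the first post).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: bond2
MII Status: up
MII Polling Interval (ms): 200

Slave Interface: bond1
MII Status: up

Slave Interface: bond2
MII Status: up
"""


def active_slave(status_text):
    """Return the name after 'Currently Active Slave:', or None if absent."""
    match = re.search(
        r"^Currently Active Slave:\s*(\S+)", status_text, re.MULTILINE
    )
    return match.group(1) if match else None


def read_bond_status(bond="bond0"):
    """Read the live status file; only works on a host with that bond."""
    with open(f"/proc/net/bonding/{bond}") as handle:
        return handle.read()


print("active slave in sample:", active_slave(SAMPLE))
```

On a real node you would call `active_slave(read_bond_status("bond0"))` on each host and compare the two answers against the cabling.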
 
Both switches are directly connected (not stacked), so it should not make a difference.
To be clear, the topology is as follows:

node1-bond1-switch1 (bond1 = LACP between eno1, eno2)
node1-bond2-switch2 (bond2 = LACP between enp7s0f0, enp7s0f1)
bond0 is HA bond1/bond2

The Cisco switch detects LACP properly and the port channel is shown as up.

node2-bond1-switch2 (bond1 = LACP between eno1, eno2)
node2-bond2-switch1 (bond2 = LACP between enp7s0f0, enp7s0f1)
bond0 is HA bond1/bond2

The Cisco switch detects LACP properly and the port channel is shown as up.

switch1-switch2 are directly connected via port channel

In any case, I have set bondXX down on both servers to make sure the nodes are directly connected to the same switch, but with no luck. In fact, I have tested all four possible combinations, with no luck.
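
The four combinations can be enumerated from the cabling described above. This is just an illustrative model of the topology as stated in this post (the `UPLINKS` table and function names are made up for the sketch); it says nothing about what the switches actually forward.

```python
from itertools import product

# Cabling as described in this post: which switch each LACP bond lands on.
UPLINKS = {
    "node1": {"bond1": "switch1", "bond2": "switch2"},
    "node2": {"bond1": "switch2", "bond2": "switch1"},
}


def switch_for(node, active_bond):
    """Switch that carries a node's traffic when the given bond is active."""
    return UPLINKS[node][active_bond]


def same_switch(active_on_node1, active_on_node2):
    """True when both nodes' active bonds terminate on the same switch."""
    return switch_for("node1", active_on_node1) == switch_for(
        "node2", active_on_node2
    )


results = {
    (a1, a2): same_switch(a1, a2)
    for a1, a2 in product(("bond1", "bond2"), repeat=2)
}

for (a1, a2), shared in results.items():
    path = "same switch" if shared else "via the inter-switch port channel"
    print(f"node1 active={a1}, node2 active={a2}: {path}")
```

With this cabling, the nodes share a switch only when they have different bonds active; if both run on bond1 (or both on bond2), traffic between them has to cross the link between the switches.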
 
Actually, you are right. I have done a more detailed check and, taking one node as an example, it seems that no matter which bond becomes active, certain IP addresses in the network are not reachable. The unreachable addresses are probably located on the non-primary switch. (I say probably because those servers also use active/backup mode, so I cannot know for sure which link is active where without accessing them.)

I have shut down bond1 first and then bond2, but none of the unreachable IP addresses answered pings.
 
First of all, apologies for the mess. After some serious digging and going through the logs, I found that the problem was LACP not being properly configured on the switches: the Cisco switches were seeing LACP as active with 2 ports per group, while the Proxmox node was only seeing 1 port per group.
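
For anyone hitting the same symptom, the Linux side of such a mismatch is usually visible in /proc/net/bonding/bond1 (or bond2): the "Number of ports" under "Active Aggregator Info" is lower than the number of slaves, and the slaves carry different aggregator IDs. The sketch below parses a hypothetical, shortened sample that I wrote to illustrate the case; it is not the output from my nodes, and `lacp_summary` is a made-up helper name.

```python
import re

# Hypothetical, shortened excerpt of /proc/net/bonding/bond1 for a bond where
# only one of the two slaves has joined the active aggregator.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

802.3ad info
LACP rate: fast
Active Aggregator Info:
\tAggregator ID: 1
\tNumber of ports: 1

Slave Interface: eno1
MII Status: up
Aggregator ID: 1

Slave Interface: eno2
MII Status: up
Aggregator ID: 2
"""


def lacp_summary(status_text):
    """Summarise which slaves sit in the active 802.3ad aggregator."""
    ports = re.search(r"Number of ports:\s*(\d+)", status_text)
    active = re.search(
        r"Active Aggregator Info:.*?Aggregator ID:\s*(\d+)",
        status_text,
        re.DOTALL,
    )
    active_id = int(active.group(1)) if active else None

    # Everything after each "Slave Interface:" marker belongs to that slave.
    slaves = {}
    for block in status_text.split("Slave Interface:")[1:]:
        name = block.split()[0]
        agg = re.search(r"Aggregator ID:\s*(\d+)", block)
        slaves[name] = int(agg.group(1)) if agg else None

    return {
        "ports_in_active_aggregator": int(ports.group(1)) if ports else None,
        "active_aggregator_id": active_id,
        "slaves_outside_active_aggregator": [
            name for name, agg_id in slaves.items() if agg_id != active_id
        ],
    }


summary = lacp_summary(SAMPLE)
print(summary)
```

A healthy two-port LACP bond should report two ports in the active aggregator and an empty list of slaves outside it.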


After reconfiguring the switches, everything is working as expected: every server has an active/passive bond using two LACP bonds as slaves, each connected to a different switch, without issues.


@spirit the two switches are connected to each other, and each of them is connected upstream in an H/A mode, i.e. only one switch is connected upstream in active mode at any given time.
 
