Incorrect routes continue to be added for 1GbE/40GbE 3-node cluster

Metabolomics_guy

New Member
Sep 6, 2024
I recently built a three-node cluster that uses Mellanox ConnectX-3 cards for the cluster network and a separate 1GbE card for the LAN and management network. I've tracked the problem down (I think) to the routing table. Whenever I create or start a CT (currently testing with only one CT running on all three nodes), the veth interface picks up a scope-link default route that blocks traffic to the LAN. Traffic from the container itself appears to work, since I can reach the internet from inside it, but I can't have working access from the host and the container at the same time.

An example is here:

Code:
0.0.0.0 dev veth100i0 scope link
0.0.0.0 dev fwln100i0 scope link
0.0.0.0 dev fwpr100p0 scope link
0.0.0.0 dev enp0s31f6 scope link
0.0.0.0 dev bond0 scope link
default dev veth100i0 scope link
default dev fwln100i0 scope link
default dev fwpr100p0 scope link
default dev enp0s31f6 scope link
default via 192.168.1.1 dev vmbr0 proto kernel onlink
10.15.15.0/24 dev bond0 proto kernel scope link src 10.15.15.50
169.254.0.0/16 dev enp0s31f6 proto kernel scope link src 169.254.227.93
169.254.0.0/16 dev bond0 proto kernel scope link src 169.254.108.23
169.254.0.0/16 dev fwpr100p0 proto kernel scope link src 169.254.172.218
169.254.0.0/16 dev fwln100i0 proto kernel scope link src 169.254.140.160
169.254.0.0/16 dev veth100i0 proto kernel scope link src 169.254.190.153
192.168.1.0/24 dev vmbr0 proto kernel scope link src 192.168.1.100
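
In case it helps, this is how I've been checking which of those routes actually wins (plain iproute2 commands; 192.168.1.1 is my LAN gateway and 1.1.1.1 is just an arbitrary public address):

Code:
# which route would be used to reach the LAN gateway vs. the internet?
ip route get 192.168.1.1
ip route get 1.1.1.1
# list only the default routes and the devices they point at
ip route show default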

with the interfaces defined as follows:

Code:
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual

auto enp1s0
iface enp1s0 inet manual
    mtu 9000

auto enp1s0d1
iface enp1s0d1 inet manual
    mtu 9000

iface bond0 inet static
    address 10.15.15.50/24
    bond-slaves enp1s0 enp1s0d1
    bond-miimon 100
    bond-mode broadcast
    mtu 9000
    metric 200

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.100/24
    gateway 192.168.1.1
    bridge-ports enp0s31f6
    bridge-stp off
    bridge-fd 0
    metric 0

source /etc/network/interfaces.d/*
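
A quick way to re-apply that file and sanity-check the resulting table (this assumes ifupdown2, which I believe is the default on current Proxmox VE):

Code:
# reload /etc/network/interfaces without a full networking restart
# (ifreload is part of ifupdown2)
ifreload -a
# confirm the default route(s) and the cluster route on bond0
ip route show default
ip route show dev bond0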

I've tried using metrics to avoid this issue, but with no luck. A
Code:
systemctl restart networking
followed by
Code:
ifup bond0
brings the network back up on both the LAN and the cluster, but the moment a veth is created the routes revert and LAN access is blocked again.
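
To catch exactly what gets added the moment a veth appears, I can watch the routing table live while starting the test container (100 is the CT from the route output above):

Code:
# terminal 1: stream every routing-table change as it happens
ip monitor route

# terminal 2: start the test CT and watch which scope-link defaults
# show up on the veth/fwln/fwpr devices
pct start 100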

Any help would be greatly appreciated.
 
I realized the following may also be helpful:

An example /etc/hosts from the master node of the cluster:

Code:
root@master:~# more /etc/hosts
10.15.15.50 master
10.15.15.60 worker1
10.15.15.70 worker2

127.0.0.1 localhost.localdomain localhost
192.168.1.100 master.localdomain master

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

If relevant, my setup is currently behind a Netgear Orbi. I'll be moving to OPNsense once the cluster is working properly, since the cluster will host it with HA.
 
