VMs/CTs cannot reach the network over vmbr0 but the host is fine

Madhatter

Renowned Member
Apr 8, 2012
I'm desperate...
I have built a 3-node PVE 8.4.1 cluster (purely to have low-, medium-, and high-power compute and to load VMs/CTs where needed).

node 1 (low power, bond0 over 2 interfaces) has been working without issues from the beginning.
node 2 (medium power, bond0 over 2 interfaces) worked once I set up the bond, but after a reboot the VMs/CTs running on it could no longer reach the network: "ping 192.168.1.1 - ping: connect: Network is unreachable".
In a video I heard that reconfiguring bonding somehow fails, and that you should unconfigure vmbr0 and bond0 and then reconfigure them step by step, because Proxmox otherwise seems to ignore the bond settings. (That eventually worked.) See https://youtu.be/zx5LFqyMPMU?t=659 for where I heard it.
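For reference, the step-by-step teardown/rebuild I did on node 2 roughly looked like this (a sketch of my own procedure, not an official recipe; run it on the host console, not over SSH, since the network goes down in between):

```shell
# 1. In /etc/network/interfaces, comment out the vmbr0 and bond0 stanzas,
#    leaving only the physical NIC stanzas, then apply so the kernel
#    forgets the old bond state (ifreload is from ifupdown2, the PVE 8 default):
ifreload -a
# 2. Uncomment bond0 alone, apply, and verify the bond actually forms:
ifreload -a
cat /proc/net/bonding/bond0
# 3. Uncomment vmbr0 (bridge-ports bond0), apply again, and check the bridge:
ifreload -a
ip a show vmbr0
```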

node 3 refuses anything like that. I'm unable to get any VM/CT onto the network, although the host itself is up and running.
A CT with DHCP reports "ping 192.168.1.1 - ping: connect: Network is unreachable".
A static IP doesn't connect either.

my "ip a" looks like this
Code:
2: ens1f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 10:60:4b:72:da:84 brd ff:ff:ff:ff:ff:ff permaddr 90:e2:ba:6a:81:96
    altname enp7s0f0
3: ens1f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 10:60:4b:72:da:84 brd ff:ff:ff:ff:ff:ff permaddr 90:e2:ba:6a:81:97
    altname enp7s0f1
4: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 10:60:4b:72:da:84 brd ff:ff:ff:ff:ff:ff
    altname enp0s25
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 10:60:4b:72:da:84 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:60:4b:72:da:84 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::1260:4bff:fe72:da84/64 scope link
       valid_lft forever preferred_lft forever
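To rule out the guest side, it's worth checking that a running VM/CT actually has its port attached to the bridge (a diagnostic sketch; the `command -v` guards just skip the check if iproute2's `bridge` tool is missing):

```shell
# List the ports currently enslaved to vmbr0 -- a started VM should show a
# tap device here and a CT a veth device, next to bond0.
command -v bridge >/dev/null && bridge link show | grep vmbr0 || true
# Show the bridge's forwarding table to see which MACs it has learned:
command -v bridge >/dev/null && bridge fdb show br vmbr0 | head || true
```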

My /etc/network/interfaces
Code:
auto ens1f1
iface ens1f1 inet manual

auto ens1f0
iface ens1f0 inet manual

auto eno1
iface eno1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno1 ens1f0 ens1f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#3 way

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#bond0
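One thing that helps compare what's configured against what's actually running is ifupdown2's checker (assuming ifupdown2, which is the PVE 8 default):

```shell
# Compare /etc/network/interfaces against the live kernel state; entries
# flagged by the checker differ from the config (ifupdown2 only):
ifquery --check -a
# Dump the resolved running config for the bond and the bridge:
ifquery --running bond0 vmbr0
```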

my /proc/net/bonding/bond0

Code:
Ethernet Channel Bonding Driver: v6.11.11-2-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 10:60:4b:72:da:84
Active Aggregator Info:
    Aggregator ID: 1
    Number of ports: 3
    Actor Key: 9
    Partner Key: 1001
    Partner Mac Address: 50:e0:39:f5:c6:fd

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 10:60:4b:72:da:84
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 10:60:4b:72:da:84
    port key: 9
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 50:e0:39:f5:c6:fd
    oper key: 1001
    port priority: 1
    port number: 5
    port state: 63

Slave Interface: ens1f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 90:e2:ba:6a:81:96
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 10:60:4b:72:da:84
    port key: 9
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 50:e0:39:f5:c6:fd
    oper key: 1001
    port priority: 1
    port number: 6
    port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 90:e2:ba:6a:81:97
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: 10:60:4b:72:da:84
    port key: 9
    port priority: 255
    port number: 3
    port state: 61
details partner lacp pdu:
    system priority: 65535
    system mac address: 50:e0:39:f5:c6:fd
    oper key: 1001
    port priority: 1
    port number: 7
    port state: 63
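As far as I can tell the LACP side is healthy: port state 61/63 decodes to collecting+distributing on every slave (63 just adds the fast-timeout bit on the switch side). A small helper I used to decode the bitmask from the kernel bonding driver's output (my own sketch, not a Proxmox tool):

```shell
# Decode the numeric "port state" bitmask shown in /proc/net/bonding/bond0.
# Bit values follow the kernel bonding driver / 802.3ad actor state flags.
decode_lacp_state() {
  s=$1
  [ $((s & 1))   -ne 0 ] && echo "lacp_activity"
  [ $((s & 2))   -ne 0 ] && echo "lacp_timeout_fast"
  [ $((s & 4))   -ne 0 ] && echo "aggregation"
  [ $((s & 8))   -ne 0 ] && echo "synchronization"
  [ $((s & 16))  -ne 0 ] && echo "collecting"
  [ $((s & 32))  -ne 0 ] && echo "distributing"
  [ $((s & 64))  -ne 0 ] && echo "defaulted"
  [ $((s & 128)) -ne 0 ] && echo "expired"
  return 0
}

decode_lacp_state 61   # actor: activity+aggregation+sync+collecting+distributing
```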

I've checked the routes: the VMs/CTs have vmbr0 as their device and the right gateway. Everything should work as designed, and there is nothing in the logs.
The host has the right routes too:
Code:
# routel
Dst        Gateway        Prefsrc   Protocol   Scope   Dev     Table
default    192.168.1.1              kernel             vmbr0
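The next thing I plan to try is watching whether the guest's ARP requests even make it across the bridge onto the bond (a sketch; run each capture while pinging the gateway from inside the guest):

```shell
# If ARP who-has requests show up here but get no reply, the bridge side
# is fine and the problem is on the bond/switch side; if they never appear,
# the guest's port isn't really attached to the bridge.
tcpdump -eni vmbr0 arp    # guest-facing side of the bridge
tcpdump -eni bond0 arp    # uplink side towards the switch
```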

I'm totally out of ideas. Anyone with ideas, please?
I've reinstalled node 3; it worked for a day, but after a reboot it went dead again...