[SOLVED] So is OpenVSwitch bonding just broken on PVE 6? What's going on?

reckless

Well-Known Member
Feb 5, 2019
I had no problem using OVS to bond 2 Mellanox interfaces on PVE 5. Now I'm on 6 and I'm having a lot of issues, and I'm not the only one. Other recent threads are here:

https://forum.proxmox.com/threads/proxmox-6-network-wont-start.56362/
https://forum.proxmox.com/threads/pve-6-and-mellanox-4-x-drivers.56553/


I followed the tutorial but it's just not working for me. I have the latest Proxmox version as of today, with everything updated. I have 2 Mellanox interfaces I want to bond, on the same NIC: the Mellanox ConnectX-3 (MCX312A-XCBT, 2x SFP+ ports). Using ip a I get this:

Code:
4: enp132s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:02:c9:3b:61:10 brd ff:ff:ff:ff:ff:ff
5: enp132s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:02:c9:3b:61:11 brd ff:ff:ff:ff:ff:ff

These are the two interfaces I want to bond together: enp132s0 + enp132s0d1. This is currently my /etc/network/interfaces (I followed the proxmox OVS manual on the website):

Code:
allow-vmbr2 bond0
iface bond0 inet manual
    ovs_bonds enp132s0 enp132s0d1
    ovs_type OVSBond
    ovs_bridge vmbr2
    mtu 9000
    ovs_options bond_mode=balance-tcp other_config:lacp-time=fast lacp=active
    pre-up ( ifconfig enp132s0 mtu 9000 && ifconfig enp132s0d1 mtu 9000 )
# Force the MTU of the physical interfaces to be jumbo-frame capable.
# This doesn't mean that any OVSIntPorts must be jumbo-capable.
# We cannot, however set up definitions for eth0 and eth1 directly due
# to what appear to be bugs in the initialization process.

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface enp132s0 inet manual

iface enp132s0d1 inet manual

auto vmbr0
iface vmbr0 inet static
    address  192.168.1.23
    netmask  255.255.255.0
    gateway  192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

allow-ovs vmbr2

auto vmbr2
iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_ports bond0
    mtu 9000
# NOTE: we MUST mention bond0, vlan50, and vlan55 even though each
#       of them lists ovs_bridge vmbr0!  Not sure why it needs this
#       kind of cross-referencing but it won't work without it!

Note that ifconfig is used here even though it's deprecated. Surely this can't be correct? What should I replace it with?

The above simply doesn't work. When I type dmesg | grep -i enp132s0 I get this output:
Code:
root@proxmox:~# dmesg | grep -i enp132s0
[   12.432376] mlx4_core 0000:84:00.0 enp132s0: renamed from eth0
[   12.473761] mlx4_core 0000:84:00.0 enp132s0d1: renamed from eth0
[   15.459747] mlx4_en: enp132s0: Link Up
[   15.564489] mlx4_en: enp132s0d1: Link Up
[ 1073.864057] mlx4_en: enp132s0: Link Down
[ 1082.575190] mlx4_en: enp132s0d1: Link Down
[ 1095.603476] mlx4_en: enp132s0: Link Up
[ 1099.024206] mlx4_en: enp132s0d1: Link Up
[ 1113.176462] mlx4_en: enp132s0: Link Down
[ 1116.346555] mlx4_en: enp132s0d1: Link Down
[ 1121.184069] mlx4_en: enp132s0: Link Up
[ 1122.601733] mlx4_en: enp132s0d1: Link Up


And from the syslog:

Code:
Sep 14 14:19:48 proxmox systemd[1]: Starting Open vSwitch...
Sep 14 14:19:48 proxmox zed[2215]: eid=2 class=config_sync pool_guid=0x713F0C2B1A688975
Sep 14 14:19:48 proxmox openvswitch-switch[2209]: ovsdb-server is already running.
Sep 14 14:19:48 proxmox openvswitch-switch[2209]: ovs-vswitchd is already running.
Sep 14 14:19:48 proxmox ovs-vsctl[2261]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=proxmox.vice
Sep 14 14:19:48 proxmox openvswitch-switch[2209]: Enabling remote OVSDB managers.
Sep 14 14:19:48 proxmox zed[2272]: eid=3 class=pool_import pool_guid=0x713F0C2B1A688975
Sep 14 14:19:48 proxmox ovs-vsctl[2277]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --may-exist add-br vmbr2 --
Sep 14 14:19:48 proxmox systemd-udevd[920]: Using default interface naming scheme 'v240'.
Sep 14 14:19:48 proxmox systemd-udevd[920]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 14 14:19:48 proxmox systemd-udevd[920]: Could not generate persistent MAC address for ovs-system: No such file or directory
Sep 14 14:19:48 proxmox kernel: device ovs-system entered promiscuous mode
Sep 14 14:19:48 proxmox kernel: netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
Sep 14 14:19:48 proxmox zed[2320]: eid=4 class=history_event pool_guid=0x713F0C2B1A688975
Sep 14 14:19:48 proxmox zed[2343]: eid=5 class=config_sync pool_guid=0x713F0C2B1A688975
Sep 14 14:19:48 proxmox systemd-udevd[920]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 14 14:19:48 proxmox systemd-udevd[920]: Could not generate persistent MAC address for vmbr2: No such file or directory
Sep 14 14:19:48 proxmox kernel: netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
Sep 14 14:19:48 proxmox kernel: device vmbr2 entered promiscuous mode
Sep 14 14:19:48 proxmox openvswitch-switch[2209]: /bin/sh: 1: ifconfig: not found
Sep 14 14:19:48 proxmox openvswitch-switch[2209]: ifup: failed to bring up bond0
Sep 14 14:19:48 proxmox systemd[1]: Started Open vSwitch.
Sep 14 14:19:48 proxmox systemd[1]: Starting Raise network interfaces...
Sep 14 14:19:48 proxmox systemd-udevd[920]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 14 14:19:48 proxmox systemd-udevd[920]: Could not generate persistent MAC address for vmbr0: No such file or directory


How do I get this to work? And why is ifconfig still used when it's deprecated? Shouldn't the tutorial be updated?
 
Hi spirit, so like this?

Code:
allow-vmbr2 bond0
iface bond0 inet manual
    ovs_bonds enp132s0 enp132s0d1
    ovs_type OVSBond
    ovs_bridge vmbr2
    mtu 9000
    ovs_options bond_mode=balance-tcp other_config:lacp-time=fast lacp=active
    pre-up ( ifconfig enp132s0 mtu 9000 && ifconfig enp132s0d1 mtu 9000 )
# Force the MTU of the physical interfaces to be jumbo-frame capable.
# This doesn't mean that any OVSIntPorts must be jumbo-capable.
# We cannot, however set up definitions for eth0 and eth1 directly due
# to what appear to be bugs in the initialization process.

auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface enp132s0 inet manual

iface enp132s0d1 inet manual

auto vmbr0
iface vmbr0 inet static
    address  192.168.1.23
    netmask  255.255.255.0
    gateway  192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

allow-ovs vmbr2

iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_ports bond0
    mtu 9000
# NOTE: we MUST mention bond0, vlan50, and vlan55 even though each
#       of them lists ovs_bridge vmbr0!  Not sure why it needs this
#       kind of cross-referencing but it won't work without it!
 
@spirit I tried the above and it still doesn't work. This is the output of dmesg | grep -i mlx
Code:
root@machine:~# dmesg | grep -i mlx
[    4.953091] mlx4_core: Mellanox ConnectX core driver v4.0-0
[    4.953109] mlx4_core: Initializing 0000:84:00.0
[   12.173721] mlx4_core 0000:84:00.0: DMFS high rate steer mode is: disabled performance optimized steering
[   12.174069] mlx4_core 0000:84:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
[   12.422921] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[   12.423137] mlx4_en 0000:84:00.0: Activating port:1
[   12.426692] mlx4_en: 0000:84:00.0: Port 1: Using 32 TX rings
[   12.426693] mlx4_en: 0000:84:00.0: Port 1: Using 16 RX rings
[   12.426990] mlx4_en: 0000:84:00.0: Port 1: Initializing port
[   12.427441] mlx4_en 0000:84:00.0: registered PHC clock
[   12.427907] mlx4_en 0000:84:00.0: Activating port:2
[   12.428508] mlx4_core 0000:84:00.0 enp132s0: renamed from eth0
[   12.429085] mlx4_en: 0000:84:00.0: Port 2: Using 32 TX rings
[   12.429086] mlx4_en: 0000:84:00.0: Port 2: Using 16 RX rings
[   12.429307] mlx4_en: 0000:84:00.0: Port 2: Initializing port
[   12.454384] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[   12.456181] mlx4_core 0000:84:00.0 enp132s0d1: renamed from eth0
[   12.457955] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
[   12.457956] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
[   15.412408] mlx4_en: enp132s0: Link Up
[   15.567100] mlx4_en: enp132s0d1: Link Up

That looks fine to me, but now from the syslog I see these entries:

Code:
Sep 15 13:26:42 proxmox ovs-ctl[2865]: Starting ovsdb-server.
Sep 15 13:26:42 proxmox ovs-vsctl[2920]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.16.1
Sep 15 13:26:42 proxmox ovs-vsctl[2926]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.10.1 "external-ids:system-id=\"28943f0e-ee58-4523-9f0a-9077beb02629\"" "external-ids:rundir=\"/var/run/openvswitch\"" "system-type=\"debian\"" "system-version=\"10\""
Sep 15 13:26:42 proxmox ovs-ctl[2865]: Configuring Open vSwitch system IDs.
Sep 15 13:26:42 proxmox ovs-ctl[2865]: Inserting openvswitch module.
Sep 15 13:26:42 proxmox kernel: openvswitch: Open vSwitch switching datapath
Sep 15 13:26:42 proxmox ovs-ctl[2865]: Starting ovs-vswitchd.
Sep 15 13:26:42 proxmox ovs-vsctl[2944]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=proxmox.machine
Sep 15 13:26:42 proxmox ovs-ctl[2865]: Enabling remote OVSDB managers.
Sep 15 13:26:42 proxmox systemd[1]: Started Open vSwitch Internal Unit.
Sep 15 13:26:42 proxmox systemd[1]: Reached target Network (Pre).
Sep 15 13:26:42 proxmox systemd[1]: Starting Open vSwitch...
Sep 15 13:26:42 proxmox systemd[1]: Started Proxmox VE Login Banner.
Sep 15 13:26:42 proxmox openvswitch-switch[2948]: ovsdb-server is already running.
Sep 15 13:26:42 proxmox openvswitch-switch[2948]: ovs-vswitchd is already running.
Sep 15 13:26:42 proxmox ovs-vsctl[2999]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=proxmox.machine
Sep 15 13:26:42 proxmox openvswitch-switch[2948]: Enabling remote OVSDB managers.
Sep 15 13:26:42 proxmox ovs-vsctl[3015]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --may-exist add-br vmbr2 --
Sep 15 13:26:42 proxmox systemd-udevd[889]: Using default interface naming scheme 'v240'.
Sep 15 13:26:42 proxmox systemd-udevd[889]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 15 13:26:42 proxmox systemd-udevd[889]: Could not generate persistent MAC address for ovs-system: No such file or directory
Sep 15 13:26:42 proxmox kernel: device ovs-system entered promiscuous mode
Sep 15 13:26:42 proxmox kernel: netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
Sep 15 13:26:42 proxmox systemd-udevd[889]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 15 13:26:42 proxmox systemd-udevd[889]: Could not generate persistent MAC address for vmbr2: No such file or directory
Sep 15 13:26:42 proxmox kernel: netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
Sep 15 13:26:42 proxmox kernel: device vmbr2 entered promiscuous mode
Sep 15 13:26:43 proxmox openvswitch-switch[2948]: /bin/sh: 1: ifconfig: not found
Sep 15 13:26:43 proxmox openvswitch-switch[2948]: ifup: failed to bring up bond0
Sep 15 13:26:43 proxmox systemd[1]: Started Open vSwitch.
Sep 15 13:26:43 proxmox systemd[1]: Starting Raise network interfaces...
Sep 15 13:26:43 proxmox systemd-udevd[889]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 15 13:26:43 proxmox systemd-udevd[889]: Could not generate persistent MAC address for vmbr0: No such file or directory
Sep 15 13:26:43 proxmox kernel: vmbr0: port 1(eno1) entered blocking state
Sep 15 13:26:43 proxmox kernel: vmbr0: port 1(eno1) entered disabled state
Sep 15 13:26:43 proxmox kernel: device eno1 entered promiscuous mode
Sep 15 13:26:43 proxmox ifup[3179]: Waiting for vmbr0 to get ready (MAXWAIT is 2 seconds).
Sep 15 13:26:43 proxmox systemd-udevd[889]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 15 13:26:43 proxmox systemd-udevd[889]: Could not generate persistent MAC address for vmbr1: No such file or directory
Sep 15 13:26:43 proxmox kernel: vmbr1: port 1(eno2) entered blocking state
Sep 15 13:26:43 proxmox kernel: vmbr1: port 1(eno2) entered disabled state
Sep 15 13:26:43 proxmox kernel: device eno2 entered promiscuous mode
Sep 15 13:26:43 proxmox ifup[3179]: Waiting for vmbr1 to get ready (MAXWAIT is 2 seconds).
Sep 15 13:26:43 proxmox systemd[1]: Started Raise network interfaces.
Sep 15 13:26:43 proxmox systemd[1]: Reached target Network.
Sep 15 13:26:43 proxmox systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Sep 15 13:26:43 proxmox systemd[1]: Started LXC Container Monitoring Daemon.
Sep 15 13:26:43 proxmox systemd[1]: Starting OpenBSD Secure Shell server...
Sep 15 13:26:43 proxmox systemd[1]: Reached target Network is Online.
Sep 15 13:26:43 proxmox systemd[1]: Starting LXC network bridge setup...

Note this one in particular:

Code:
Sep 15 13:26:42 proxmox kernel: device vmbr2 entered promiscuous mode
Sep 15 13:26:43 proxmox openvswitch-switch[2948]: /bin/sh: 1: ifconfig: not found
Sep 15 13:26:43 proxmox openvswitch-switch[2948]: ifup: failed to bring up bond0

The ifconfig command is no longer installed by default on modern Debian-based systems, including Proxmox. So why is it still used in the Proxmox tutorial?

How do I replace these lines (taken straight from the official Proxmox OVS documentation) with a working equivalent:

ifconfig enp132s0 mtu 9000 && ifconfig enp132s0d1 mtu 9000

Any ideas? Perhaps the documentation needs an update...
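For what it's worth, the deprecated calls map one-for-one onto iproute2's ip link set. A minimal sketch of the translation, using the interface names from this thread (the sed one-liner is only an illustrative text transform, not something the tutorial itself uses):

```shell
# The iproute2 equivalent of the deprecated ifconfig MTU calls is:
#   ifconfig enp132s0 mtu 9000  ->  ip link set dev enp132s0 mtu 9000
# Shown here as a pure text transform over the line quoted above:
echo 'ifconfig enp132s0 mtu 9000 && ifconfig enp132s0d1 mtu 9000' \
  | sed -E 's/ifconfig ([^ ]+) mtu ([0-9]+)/ip link set dev \1 mtu \2/g'
# → ip link set dev enp132s0 mtu 9000 && ip link set dev enp132s0d1 mtu 9000
```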




 
Making progress now! I replaced this: ifconfig enp132s0 mtu 9000 && ifconfig enp132s0d1 mtu 9000
With this: ip link set mtu 9000 dev enp132s0 && ip link set mtu 9000 dev enp132s0d1

And that seems to have done the trick, together with spirit's edit of removing the auto vmbr2 line. This is how it looks currently:

Code:
4: enp132s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:02:c9:3b:61:10 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::202:c9ff:fe3b:6110/64 scope link
       valid_lft forever preferred_lft forever
5: enp132s0d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 00:02:c9:3b:61:11 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::202:c9ff:fe3b:6111/64 scope link
       valid_lft forever preferred_lft forever
6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether e6:5a:2e:18:cd:ed brd ff:ff:ff:ff:ff:ff
7: vmbr2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 00:02:c9:3b:61:10 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::202:c9ff:fe3b:6110/64 scope link
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:08:50:1b:99:95 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8:50ff:fe1b:9995/64 scope link
       valid_lft forever preferred_lft forever

I'm not sure why vmbr2 and bond0 both show state UNKNOWN, and I also don't know why ovs-system has its MTU set at 1500. Anybody know?

It does work currently, but I need to try and see if it's stable in the upcoming week.
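For what it's worth, the UNKNOWN state is usually harmless: OVS internal devices don't report carrier state to the kernel the way physical NICs do, and ovs-system is only the datapath's placeholder device, so its MTU of 1500 shouldn't matter. The bond itself can be inspected from the OVS side (bond name taken from this thread):

```shell
# Show bond state, active slaves, and hashing from OVS's point of view:
ovs-appctl bond/show bond0
# Show the negotiated LACP status per slave:
ovs-appctl lacp/show bond0
```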
 
Can you look inside

/etc/network/if-pre-up.d/openvswitch
and
/etc/network/if-post-down.d/openvswitch

and check whether they contain any ifconfig references?
Normally they should not (checked against a clean install of the openvswitch package 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12 on Proxmox 6).
If they do contain ifconfig references, something may have gone wrong during the upgrade.
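To check, something like this should do (paths from the post above; on a clean install, grep should find nothing):

```shell
# Look for leftover ifconfig calls in the OVS ifupdown hook scripts:
grep -n 'ifconfig' \
    /etc/network/if-pre-up.d/openvswitch \
    /etc/network/if-post-down.d/openvswitch \
  || echo "no ifconfig references found"
```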
 
I think I do have it working now; it's been stable for the last few days. The key was to replace the ifconfig command with the new ip-based command, as I listed above. Additionally, I removed the "auto ..." line for the Open vSwitch interface (remove "auto vmbr2", keep "allow-ovs vmbr2"), just as spirit suggested.

Proxmox docs should be updated to reflect this.
 
The official docs still have outdated information listed: https://pve.proxmox.com/wiki/Open_vSwitch

It should really be updated to reflect the above. It has been working flawlessly for months now.

I have fixed the wiki.

Note that the wiki is not the official documentation, but the official docs don't have any openvswitch info yet...
I'm planning to write a lot of network documentation in the coming weeks.
 
That would be helpful. I still see ifconfig being used in the wiki though, instead of ip, which is what should be used (and works for me).
 
Thanks Spirit. I only did it this way because the wiki more or less said this was the best way to do it. In fact, I still see the outdated info on the Proxmox wiki as of today. https://pve.proxmox.com/wiki/Open_vSwitch

You seem to be very knowledgeable about OVS in general, and I think it would be of great help to the community if you could add your wisdom to the wiki document and correct some of the outdated info on there, so others don't walk into the same issues I did.
 

Hi, thanks for the report.
I just updated the missing parts about bond interfaces and MTU.

Note that I recently sent a patch to add MTU support in the GUI, so it should be easier now to get a clean config.
 
Hi @spirit, I'm once again having issues with OVS, and I'm hoping you could help again. I added an IP address to the OVS bridge so that I could access Proxmox SSH through the bonded 10G Mellanox NIC as well, and hit 'Apply Configuration'. After that, all containers on this vmbr bridge were unreachable, so I set out to reverse what I did: I rebooted with the previous /etc/network/interfaces file (I had made a backup, so I just reverted my changes and rebooted). However, even after the reboot I cannot get the OVS bond to work, no matter what I try. It's quite puzzling, because I'm using the exact same interfaces file that was working before, but now it suddenly doesn't work anymore. I can't even ping my LXC containers from the Proxmox host; it gives the error message "Destination host unreachable".

So I followed your updated wiki guide to re-write the /etc/network/interfaces file based on what you wrote in the docs. Below is what I have. Do you see anything wrong? Because it still doesn't work after rebooting a bunch of times and I still can't even ping the LXC containers from the Proxmox host (always get "Destination host unreachable" errors).
Note that I have 2x regular 1GB ethernet ports (eno1 and eno2) and the other 2 ports are Mellanox ConnectX-3 10G SFP+ ports, that I have bonded.

/etc/network/interfaces:
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!


auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

allow-vmbr2 enp132s0
iface enp132s0 inet manual
    ovs_mtu 9000

allow-vmbr2 enp132s0d1
iface enp132s0d1 inet manual
    ovs_mtu 9000

#auto bond0 # I have commented this out, even with it enabled it didn't work at boot.
allow-vmbr2 bond0
iface bond0 inet manual
    ovs_bridge vmbr2
    ovs_type OVSBond
    ovs_bonds enp132s0 enp132s0d1
    ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
    ovs_mtu 9000


auto vmbr0
iface vmbr0 inet static
    address  192.168.1.2
    netmask  255.255.255.0
    gateway  192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

allow-ovs vmbr2
iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_ports bond0
    ovs_mtu 9000
# NOTE: we MUST mention bond0, vlan50, and vlan55 even though each
#       of them lists ovs_bridge vmbr0!  Not sure why it needs this
#       kind of cross-referencing but it won't work without it!

auto vmbr1
iface vmbr1 inet manual
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0


Here is the syslog on a fresh boot: https://pastebin.com/RAuwdeGQ

Here is the output of dmesg | grep mlx:

Code:
root@proxmox:~# dmesg | grep mlx
[    4.541611] mlx4_core: Mellanox ConnectX core driver v4.0-0
[    4.541630] mlx4_core: Initializing 0000:84:00.0
[   11.778316] mlx4_core 0000:84:00.0: DMFS high rate steer mode is: disabled performance optimized steering
[   11.778651] mlx4_core 0000:84:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
[   12.022635] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[   12.022857] mlx4_en 0000:84:00.0: Activating port:1
[   12.026413] mlx4_en: 0000:84:00.0: Port 1: Using 32 TX rings
[   12.026414] mlx4_en: 0000:84:00.0: Port 1: Using 16 RX rings
[   12.026704] mlx4_en: 0000:84:00.0: Port 1: Initializing port
[   12.027162] mlx4_en 0000:84:00.0: registered PHC clock
[   12.027610] mlx4_en 0000:84:00.0: Activating port:2
[   12.028208] mlx4_core 0000:84:00.0 enp132s0: renamed from eth0
[   12.029041] mlx4_en: 0000:84:00.0: Port 2: Using 32 TX rings
[   12.029042] mlx4_en: 0000:84:00.0: Port 2: Using 16 RX rings
[   12.029260] mlx4_en: 0000:84:00.0: Port 2: Initializing port
[   12.069311] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[   12.070481] mlx4_core 0000:84:00.0 enp132s0d1: renamed from eth0
[   12.072683] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
[   12.072684] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
[   15.046908] mlx4_en: enp132s0: Link Up
[   15.156736] mlx4_en: enp132s0d1: Link Up

Here is the output of dmesg | grep vmbr:

Code:
root@proxmox:~# dmesg | grep vmbr
[   39.532577] device vmbr2 entered promiscuous mode
[   39.847372] vmbr0: port 1(eno1) entered blocking state
[   39.847373] vmbr0: port 1(eno1) entered disabled state
[   39.906386] vmbr1: port 1(eno2) entered blocking state
[   39.906387] vmbr1: port 1(eno2) entered disabled state
[   42.911695] vmbr0: port 1(eno1) entered blocking state
[   42.911697] vmbr0: port 1(eno1) entered forwarding state
[   42.911975] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready
[   43.043694] vmbr1: port 1(eno2) entered blocking state
[   43.043696] vmbr1: port 1(eno2) entered forwarding state
[   43.044019] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready
[   45.036285] vmbr0: port 2(veth100i0) entered blocking state
[   45.036286] vmbr0: port 2(veth100i0) entered disabled state
[   45.420279] vmbr0: port 2(veth100i0) entered blocking state
[   45.420280] vmbr0: port 2(veth100i0) entered forwarding state
[   49.037058] vmbr1: port 2(fwpr104p0) entered blocking state
[   49.037059] vmbr1: port 2(fwpr104p0) entered disabled state
[   49.037367] vmbr1: port 2(fwpr104p0) entered blocking state
[   49.037368] vmbr1: port 2(fwpr104p0) entered forwarding state
[   50.424690] vmbr0: port 3(fwpr105p0) entered blocking state
[   50.424693] vmbr0: port 3(fwpr105p0) entered disabled state
[   50.424944] vmbr0: port 3(fwpr105p0) entered blocking state
[   50.424945] vmbr0: port 3(fwpr105p0) entered forwarding state
[   59.665954] vmbr0: port 4(tap202i0) entered blocking state
[   59.665957] vmbr0: port 4(tap202i0) entered disabled state
[   59.666258] vmbr0: port 4(tap202i0) entered blocking state
[   59.666260] vmbr0: port 4(tap202i0) entered forwarding state

And here's a screenshot of what the network settings look like in the Proxmox GUI:
1591417900915.png

Output of ip a:
Code:
root@proxmox:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether ac:1f:6b:78:f7:8c brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr1 state UP group default qlen 1000
    link/ether ac:1f:6b:78:f7:8d brd ff:ff:ff:ff:ff:ff
4: enp132s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:02:c9:3b:61:10 brd ff:ff:ff:ff:ff:ff
5: enp132s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:02:c9:3b:61:11 brd ff:ff:ff:ff:ff:ff
6: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:4c:ee:a7:4a:c5 brd ff:ff:ff:ff:ff:ff
7: vmbr2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:10:10:ea:a6:46 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::10:10ff:feea:a646/64 scope link
       valid_lft forever preferred_lft forever
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:78:f7:8c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:fe78:f78c/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:78:f7:8d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ae1f:6bff:fe78:f78d/64 scope link
       valid_lft forever preferred_lft forever

Code:
root@proxmox:~# cat /proc/net/bonding/bond0
cat: /proc/net/bonding/bond0: No such file or directory




Curious to hear your thoughts on how to fix this.
 
root@proxmox:~# cat /proc/net/bonding/bond0
cat: /proc/net/bonding/bond0: No such file or directory
It's an OVS bond, so it's not a real kernel interface here; it only exists inside OVS.

you can verify with "ovs-vsctl show"

The config seems to be OK.


Maybe this:


I know that ConnectX-3 cards have problems with VLAN-aware bridges that have all VLANs enabled by default. (A Linux VLAN-aware bridge enables VLANs 2-4094 by default, but the ConnectX-3 only supports 128 VLANs. It's related to VLAN offloading in the ConnectX-3 driver, but I never found out how to disable it.)

I'm not sure if OVS has the same behaviour.
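For anyone wanting to dig into this, the offload behaviour can at least be inspected with ethtool (interface name from this thread; whether toggling rx-vlan-filter actually helps on mlx4 is not confirmed here):

```shell
# Show the VLAN-related offload features on the ConnectX-3 port:
ethtool -k enp132s0 | grep -i vlan
# rx-vlan-filter is the feature flag usually associated with hardware VLAN
# filtering; toggling it is worth an experiment, but is NOT a confirmed fix:
# ethtool -K enp132s0 rx-vlan-filter off
```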
 
Why wouldn't this configuration work then? The individual MTU values per SFP+ port aren't even being applied, as you can see (still at 1500). My previous interfaces config was working fine until today; I have no idea what is going wrong. I use VLANs on the overall network here, but not on Proxmox.

Any other things I can try?

Output for ovs-vsctl show:

Code:
root@proxmox:~# ovs-vsctl show
de085860-217e-4b70-9526-f5245a31d550
    Bridge "vmbr2"
        Port "fwln103o0"
            Interface "fwln103o0"
                type: internal
        Port "fwln111o0"
            Interface "fwln111o0"
                type: internal
    ovs_version: "2.12.0"

I now rebooted into a Linux bridge setup, where it ALSO doesn't work. This is the interfaces file I have right now:

Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!


auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface enp132s0 inet manual
    mtu 9000

iface enp132s0d1 inet manual
    mtu 9000

auto bond0
iface bond0 inet manual
    slaves enp132s0 enp132s0d1
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer2+3
    mtu 9000
  
auto vmbr0
iface vmbr0 inet static
    address  192.168.1.2
    netmask  255.255.255.0
    gateway  192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

auto vmbr2
iface vmbr2 inet static
    address 192.168.1.21
    netmask 255.255.255.0
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    mtu 9000


auto vmbr1
iface vmbr1 inet manual
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0

Same result, no communication to containers behind vmbr2. When I do ifreload -a I get this error: error: vmbr2: bridge port bond0 does not exist.
And again, the MTU values are 1500 for both the SFP+ ports on the ConnectX-3.

EDIT: there seems to be an issue with MTU somewhere in here. With the above config (not using OVS anymore) and all MTUs set back to 1500 instead of 9000, the connection works again. No idea why....
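A quick way to isolate whether the MTU itself is the problem is to set it by hand and test with non-fragmenting pings (interface name and gateway address taken from this thread):

```shell
# Bump one port by hand and confirm the driver accepted it:
ip link set dev enp132s0 mtu 9000
ip -br link show enp132s0
# Then test the path with the DF bit set; 8972 = 9000 minus 28 bytes of
# IP+ICMP headers. This only succeeds if every hop is jumbo-capable:
ping -M do -s 8972 -c 3 192.168.1.1
```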
 
Do I need to add auto enp132s0 and auto enp132s0d1 to both the Linux Bridge config AND the OVS config in order to get that MTU to properly stick?
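For the Linux-bridge variant, one pattern that is commonly reported to make the MTU stick is to mark the slave interfaces auto, so ifupdown configures them before the bond comes up. A sketch only, not verified on this exact hardware (the bond stanza would stay as posted above):

```
auto enp132s0
iface enp132s0 inet manual
    mtu 9000

auto enp132s0d1
iface enp132s0d1 inet manual
    mtu 9000
```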
 
